No Handler for command

Hi guys

I have a problem with the connection between Axon Server and one of our apps when running on Kubernetes. We have two Spring Boot applications that communicate with one common Axon Server. Communication between the first app (“A”) and Axon Server works as expected. The problem is with the second application (“B”). I can see in the logs that the application is subscribing to Axon; there are no exceptions or warnings. On the other side, Axon Server reports “Error on connection from subscriber - CANCELLED: cancelled before receiving half close”. The Axon dashboard shows only the first application.

On my local machine, using Docker, everything works as expected:

  • each app connects to Axon Server
  • the first app “A” (among other things) sends a “CopyFile” command using the CommandGateway
  • Axon delivers the command to only one instance of application “B”
  • “B” handles it correctly and, when the work is done, sends back a “FileCopied” event
  • “A” handles the event and the process continues (see the sketch right after this list)
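
For context, here is a simplified sketch of that flow (the command and event names are the ones from the list above; the fields, class names and wiring are illustrative, not our actual code):

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.eventhandling.gateway.EventGateway;

// Command and event payloads (fields are illustrative).
class CopyFile {
    private final String fileId;
    CopyFile(String fileId) { this.fileId = fileId; }
    String getFileId() { return fileId; }
}

class FileCopied {
    private final String fileId;
    FileCopied(String fileId) { this.fileId = fileId; }
    String getFileId() { return fileId; }
}

// In application "A": dispatch the command through the CommandGateway.
class FileCopyRequester {
    private final CommandGateway commandGateway;

    FileCopyRequester(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    void requestCopy(String fileId) {
        // Axon Server routes this to exactly one instance that registered a handler for CopyFile.
        commandGateway.send(new CopyFile(fileId));
    }
}

// In application "B": handle the command and publish the resulting event.
class CopyFileHandler {
    private final EventGateway eventGateway;

    CopyFileHandler(EventGateway eventGateway) {
        this.eventGateway = eventGateway;
    }

    @CommandHandler
    public void handle(CopyFile command) {
        // ... do the actual file copy ...
        eventGateway.publish(new FileCopied(command.getFileId()));
    }
}
```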

When using kubernetes it goes like this:

  • each app connects to Axon Server; the logs of the apps look good, but Axon Server reports “Error on connection from subscriber - CANCELLED: cancelled before receiving half close”
  • application “A” sends the “CopyFile” command
  • Axon Server reports “No Handler for command”
  • nothing happens in application “B”

Locally this architecture works fine. When “B” was deployed for the first time, it also worked fine; the problem appeared after the second deployment. I can log in to each app via SSH and ping the other machines, so it doesn’t look like a network issue.
What have I missed? Any suggestions on what I can check?

Thanks

Hi Rafal,

The message “Error on connection from subscriber - CANCELLED: cancelled before receiving half close” is logged when Axon Server loses the connection with a client application, usually when a client stops. How does the client connect to Axon Server? Do you have any load balancers or ingress controllers defined in Kubernetes?

Could you run Axon Server with the following logging setting (in the axonserver.properties file):
logging.level.io.axoniq=DEBUG
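
Since Axon Server runs in Kubernetes, the same setting can also be passed through the container environment (a sketch relying on Spring Boot’s relaxed binding; the surrounding deployment spec is assumed):

```yaml
# Equivalent of logging.level.io.axoniq=DEBUG, set on the Axon Server container.
env:
  - name: LOGGING_LEVEL_IO_AXONIQ
    value: DEBUG
```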

Which versions of Axon Server and Axon Framework are you using?

Marc

Hello,

I’m working on the same project as Rafal.

We upgraded everything to 4.2. As of now, we have two applications connected to the same Axon Server, and after running for some hours they disconnect.

It feels to me it’s something related to Axon Server, because after some time it seems to stop accepting connections from any application: as soon as an application disconnects (or I force a disconnect by killing the pod so Kubernetes creates a new one), it is no longer able to connect.

Logs from both applications (neither is visible in the Axon dashboard overview):

```

2019-10-23 22:15:50.887 WARN 8 — [end.listener]-0] o.a.e.TrackingEventProcessor : Error occurred. Starting retry mode.

org.axonframework.eventsourcing.eventstore.EventStoreException: Error while reading events from the server
at org.axonframework.axonserver.connector.event.axon.AxonServerEventStore$AxonIQEventStorageEngine$1.onCompleted(AxonServerEventStore.java:470) ~[axon-server-connector-4.2.jar!/:4.2]
at org.axonframework.axonserver.connector.event.AxonServerEventStoreClient$1.onCompleted(AxonServerEventStoreClient.java:172) ~[axon-server-connector-4.2.jar!/:4.2]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:440) ~[grpc-stub-1.23.0.jar!/:1.23.0]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-api-1.23.0.jar!/:1.23.0]
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:510) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:66) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:630) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:518) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:692) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:681) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.23.0.jar!/:1.23.0]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[grpc-core-1.23.0.jar!/:1.23.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.lang.RuntimeException: Connection closed by server
… 22 more

```

and the other service

```

2019-10-23 22:18:53.108 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND SETTINGS: ack=false settings={=0, =0, =1048576, =8192}
2019-10-23 22:18:53.145 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
2019-10-23 22:18:53.159 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND SETTINGS: ack=false settings={=2147483647, =1048576, =8192}
2019-10-23 22:18:53.161 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND SETTINGS: ack=true
2019-10-23 22:18:53.163 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
2019-10-23 22:18:53.163 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND SETTINGS: ack=true
2019-10-23 22:18:53.213 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[:authority: axonserver:8124, :path: /io.axoniq.axonserver.grpc.control.PlatformService/GetPlatformServer, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.22.1, axoniq-context: default, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2019-10-23 22:18:53.227 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=67 bytes=000000003e0a1d6c617965722d6261636b656e642d636f70792d67656e65726174696f6e121d4c61796572204261636b656e6420436f70792047656e65726174…
2019-10-23 22:18:53.235 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status: 200, content-type: application/grpc, grpc-encoding: identity, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2019-10-23 22:18:53.240 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND DATA: streamId=3 padding=0 endStream=false length=39 bytes=00000000220a1e0a0a61786f6e73657276657210bc3f18d83e2a0a61786f6e7365727665721001
2019-10-23 22:18:53.244 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] streamDependency=0 weight=16 exclusive=false padding=0 endStream=true
2019-10-23 22:18:53.251 DEBUG 8 — [ main] o.a.a.c.AxonServerConnectionManager : Received PlatformInfo suggesting [axonserver] (axonserver:8124), reusing existing connection
2019-10-23 22:18:53.251 INFO 8 — [ main] o.a.a.c.AxonServerConnectionManager : Reusing existing channel
2019-10-23 22:18:53.251 DEBUG 8 — [ main] o.a.a.c.AxonServerConnectionManager : Start instruction stream to node [axonserver] for context [default]
2019-10-23 22:18:53.257 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND HEADERS: streamId=5 headers=GrpcHttp2OutboundHeaders[:authority: axonserver:8124, :path: /io.axoniq.axonserver.grpc.control.PlatformService/OpenStream, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.22.1, axoniq-context: default, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2019-10-23 22:18:53.260 INFO 8 — [ main] o.a.a.c.AxonServerConnectionManager : Re-subscribing commands and queries
2019-10-23 22:18:53.260 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND DATA: streamId=5 padding=0 endStream=false length=69 bytes=00000000400a3e0a1d6c617965722d6261636b656e642d636f70792d67656e65726174696f6e121d4c61796572204261636b656e6420436f70792047656e6572…
2019-10-23 22:18:53.269 INFO 8 — [ main] o.a.a.c.c.AxonServerCommandBus : Creating new command stream subscriber
2019-10-23 22:18:53.269 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND HEADERS: streamId=7 headers=GrpcHttp2OutboundHeaders[:authority: axonserver:8124, :path: /io.axoniq.axonserver.grpc.command.CommandService/OpenStream, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.22.1, axoniq-context: default, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2019-10-23 22:18:53.278 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND DATA: streamId=7 padding=0 endStream=false length=41 bytes=00000000241a22121d6c617965722d6261636b656e642d636f70792d67656e65726174696f6e188827
2019-10-23 22:18:53.281 DEBUG 8 — [ main] o.a.a.c.u.FlowControllingStreamObserver : Sending response to AxonServer platform, remaining permits: 2500
2019-10-23 22:18:53.281 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND DATA: streamId=7 padding=0 endStream=false length=151 bytes=00000000920a8f010a2466333266633261302d363832392d346433312d393261312d3964303364633063633930641229636f6d2e6c617965722e6261636b656e…
2019-10-23 22:19:03.249 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=false bytes=1111
2019-10-23 22:19:03.251 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=true bytes=1111
2019-10-23 22:19:13.251 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=false bytes=57005
2019-10-23 22:19:13.251 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=true bytes=57005
2019-10-23 22:19:23.252 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=false bytes=57005
2019-10-23 22:19:23.253 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=true bytes=57005
2019-10-23 22:19:23.873 INFO 8 — [nio-8080-exec-1] o.a.c.c.C.[.[.[/] : Initializing Spring DispatcherServlet ‘dispatcherServlet’
2019-10-23 22:19:33.254 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=false bytes=57005
2019-10-23 22:19:33.255 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=true bytes=57005
2019-10-23 22:19:43.256 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=false bytes=57005
2019-10-23 22:19:43.256 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=true bytes=57005
2019-10-23 22:19:53.258 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] INBOUND PING: ack=false bytes=57005
2019-10-23 22:19:53.258 DEBUG 8 — [-worker-ELG-1-2] i.g.n.NettyClientHandler : [id: 0xd5c136e6, L:/10.12.8.67:33654 - R:axonserver/10.12.8.65:8124] OUTBOUND PING: ack=true bytes=57005

```

Hi Fábio,
Thanks for the logs. Since the connection problem only seems to occur when running on Kubernetes, it looks related to this setup. Could you share the Kubernetes configuration for Axon Server?

Marc

It’s basically the config available in the documentation:

```yaml
kind: Namespace
apiVersion: v1
metadata:
  name: production
  labels:
    name: production
```

Hi Fábio, Rafal,
On first inspection, I don’t see anything strange in the configuration. I know we have other users who are running Axon Server in a Kubernetes environment, and they have not reported issues like this before.

I will try to reproduce the issue locally and if it happens here I can further investigate it. I will let you know when I have more information.

Regards,
Marc

I really doubt it is Kubernetes-related.

Could you elaborate on that?


Hello Allard,

We have two applications connected to the same Axon Server. It seems that after some time they are not able to connect anymore. I just encountered the following situation:

  • I released a new version of one of the applications. It was no longer able to connect (the other one was still connected).
  • I deleted the pod of the application that was not connecting (so it gets rescheduled), in an effort to get it connecting again. It connects for a few seconds (it can be seen in the Axon dashboard overview), but then disconnects and never connects again.
  • I deleted the pod of the application that was still connected. It is also no longer able to connect.
  • I deleted all the pods (so Axon Server loses its data), and then the applications were able to connect again.

Based on that, it really feels like Axon Server entered a state in which it no longer accepts connections.


Thanks everyone for answering.
Our application changed a little bit: some heavier processing was added. Now I can reproduce the problem locally. When I start the app, everything works as expected at first. As soon as the app starts doing some heavy processing, it gets disconnected from Axon Server, and it’s not possible to reconnect the app without restarting it.
It’s not specific to application “A” or “B”: any app gets disconnected as soon as it starts doing heavy calculations.
It looks like some kind of timeout. Any suggestions?
These are the logs from Axon Server:

```

2019-10-29 11:23:57.907 WARN 7 — [pool-4-thread-1] i.axoniq.axonserver.grpc.CommandService : layer-backend.default: Error on connection from subscriber - CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.913 WARN 7 — [ool-4-thread-14] io.axoniq.axonserver.grpc.QueryService : layer-backend.default: Error on connection from subscriber - CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.927 WARN 7 — [pool-4-thread-9] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.928 WARN 7 — [pool-4-thread-9] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.943 WARN 7 — [ool-4-thread-15] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.944 WARN 7 — [ool-4-thread-11] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.947 WARN 7 — [ool-4-thread-11] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.945 WARN 7 — [ool-4-thread-12] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.952 WARN 7 — [ool-4-thread-12] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.946 WARN 7 — [pool-4-thread-2] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.948 WARN 7 — [pool-4-thread-8] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.957 WARN 7 — [pool-4-thread-8] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.950 WARN 7 — [ool-4-thread-16] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.959 WARN 7 — [ool-4-thread-16] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.957 WARN 7 — [pool-4-thread-6] i.a.a.message.event.EventDispatcher : Error on connection from client: CANCELLED: cancelled before receiving half close
2019-10-29 11:23:57.961 WARN 7 — [pool-4-thread-6] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.956 WARN 7 — [pool-4-thread-2] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader
2019-10-29 11:23:57.944 WARN 7 — [ool-4-thread-15] i.a.a.message.event.EventDispatcher : default: Tracking event processor closed by leader

```

On Friday, 25 October 2019 at 13:25:16 UTC+2, Fábio Carneiro wrote:

Hi Rafal,

We’ve seen cases where garbage collector pauses were long enough for the connections to miss heartbeats. In those cases, switching to the G1 garbage collector helped. If you’re on JRE 9+, that garbage collector is the default. We’re going to update the Docker image for Axon Server accordingly.
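
On JDK 8 you can force G1 explicitly, for example through the container environment (a sketch; JAVA_TOOL_OPTIONS is an assumption about how your image launches the JVM, but HotSpot-based JVMs pick it up automatically):

```yaml
# Hypothetical fragment of the application's container spec:
# pass -XX:+UseG1GC to the JVM via the environment.
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:+UseG1GC"
```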

I’m not sure why it wouldn’t reconnect, though. We’ll have a look at that.

Cheers,


Hello Allard,

Thanks for the support. We identified the problem: we were manually setting a fixed value for the Axon client-id in application.yml. Axon Server does not allow the clients to reconnect in that case and does not give any useful information that this is the reason they are not reconnecting. It would be really nice if that could be improved.
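
For anyone hitting the same issue, this is roughly what it looked like (a simplified sketch, assuming the standard axon.axonserver.* Spring Boot properties; the values are illustrative):

```yaml
axon:
  axonserver:
    servers: axonserver:8124
    component-name: layer-backend
    # Problematic: every instance of the service announced itself with the same client id,
    # so Axon Server treated a reconnecting pod as an already-known client.
    client-id: layer-backend
    # Fix: remove client-id and let Axon generate a unique id per instance.
```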

Regards,
Fábio Carneiro