(Axon Server 4.4.10, Axon Framework 4.4.5 for Java)
Our services sometimes have trouble establishing connections to Axon Server. Unfortunately, we have no reliable way to reproduce the error yet, but we noticed that the first service to connect to a context always succeeds (100% of the time), while the services that connect to the same context afterwards usually fail.
The service we have issues with has to connect to two contexts; I don’t know whether this has any impact. It is also a “whitelabelled” application, meaning that we deploy multiple (5+) instances of it.
Here’s how we found that only one instance of our service works at a time:
- We stopped all our dev instances
- We started our-app-label-aaa (it’s working)
- We started our-app-label-bbb (it’s not working)
- We stopped our-app-label-aaa and our-app-label-bbb
- We started our-app-label-bbb (it’s working)
- We started our-app-label-aaa (it’s not working)
We tried the same kind of combination with other labels (e.g. ccc, ddd, …) and noticed that the first app we started was always OK. The second one was unable to send a message to the context properly about 90% of the time.
So how should we debug this and find out what to fix? Is it a connection limit on the server side (is there some sort of connection pool)? Or is there something we could fix in the client? Which loggers should we enable at DEBUG or TRACE level to understand what’s going on?
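On the gRPC side, grpc-java logs through java.util.logging, and its connection-level messages are mostly below INFO (FINE/FINEST). A minimal sketch, assuming those records are not already bridged into SLF4J, is to lower the level of the `io.grpc` logger early at startup. Axon Framework and the Axon Server connector (`org.axonframework.axonserver.connector` and `io.axoniq.axonserver.connector` in the traces) log through SLF4J, so their DEBUG/TRACE levels belong in the application's SLF4J backend configuration instead (for example `logging.level.io.axoniq.axonserver.connector=DEBUG`, if this is a Spring Boot application). The class name `GrpcDebugLogging` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: raises java.util.logging verbosity for gRPC packages.
// Assumes gRPC records are handled by java.util.logging directly (no SLF4J bridge).
public final class GrpcDebugLogging {

    // java.util.logging keeps only weak references to named loggers. Holding
    // them here prevents the configured level from being lost on GC.
    private static final List<Logger> CONFIGURED = new ArrayList<>();

    private GrpcDebugLogging() {
    }

    /**
     * Lowers the level of each named logger (child loggers inherit it) and
     * prints its records to stderr. Intended to be called once at startup.
     */
    public static synchronized void enable(Level level, String... loggerNames) {
        ConsoleHandler handler = new ConsoleHandler(); // writes to System.err
        handler.setLevel(level); // ConsoleHandler filters at INFO by default
        for (String name : loggerNames) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(level);
            logger.addHandler(handler);
            // Stop propagation to the root handler so lines are not printed twice.
            logger.setUseParentHandlers(false);
            CONFIGURED.add(logger);
        }
    }

    public static void main(String[] args) {
        // "io.grpc" is the package prefix of the DEADLINE_EXCEEDED trace.
        enable(Level.FINE, "io.grpc");
        Logger.getLogger("io.grpc.internal").fine("gRPC FINE logging is active");
    }
}
```

Keeping the loggers in a static list matters: without a strong reference, `Logger.getLogger("io.grpc")` may later return a fresh logger with the default level.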
The full stack trace is below. We have Istio in our stack, which might help pinpoint the issue. The client side receives a “connection reset”:
org.axonframework.axonserver.connector.query.AxonServerQueryDispatchException: UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: local reset
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$16(ErrorCode.java:112)
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:182)
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:213)
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:202)
	at java.util.Optional.map(Unknown Source)
	at org.axonframework.axonserver.connector.query.AxonServerQueryBus$ResponseProcessingTask.run(AxonServerQueryBus.java:722)
	... 5 common frames omitted
Wrapped by: java.util.concurrent.CompletionException: org.axonframework.axonserver.connector.query.AxonServerQueryDispatchException: UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: local reset
	at java.util.concurrent.CompletableFuture.reportJoin(Unknown Source)
	at java.util.concurrent.CompletableFuture.join(Unknown Source)
	at com.our.co.our.app.controller.OurController.findByNumberAndClientNumber(OurController.java:35)
	... 19 frames excluded
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at or...
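The trace shows the controller blocking on `CompletableFuture.join()` (OurController.java:35), which wraps the dispatch failure in a `CompletionException`. As a hedged aside, a small helper along these lines could unwrap that exception and log the leading gRPC status (UNAVAILABLE here, DEADLINE_EXCEEDED in the other trace), which makes it easier to line up client-side failures with Istio/Envoy logs. The class and method names are hypothetical, and reading the status from the message text is an assumption based on the two traces in this post, not an Axon API.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper for logging the gRPC status behind a failed query.
public final class DispatchErrors {

    private DispatchErrors() {
    }

    /** Strips the CompletionException/ExecutionException wrappers that join()/get() add. */
    public static Throwable unwrap(Throwable t) {
        Throwable current = t;
        while ((current instanceof CompletionException || current instanceof ExecutionException)
                && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Returns the leading status name (e.g. "UNAVAILABLE") if the unwrapped
     * message starts with an upper-case token followed by ':'. This relies on
     * the message format seen in the traces, not on a documented contract.
     */
    public static Optional<String> statusPrefix(Throwable t) {
        String message = unwrap(t).getMessage();
        if (message == null) {
            return Optional.empty();
        }
        int colon = message.indexOf(':');
        if (colon <= 0) {
            return Optional.empty();
        }
        String candidate = message.substring(0, colon);
        return candidate.matches("[A-Z_]+") ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the real AxonServerQueryDispatchException (not on this classpath).
        CompletableFuture<String> result = new CompletableFuture<>();
        result.completeExceptionally(new IllegalStateException(
                "UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: local reset"));
        try {
            result.join();
        } catch (CompletionException e) {
            // prints "query failed with status UNAVAILABLE"
            System.out.println("query failed with status " + statusPrefix(e).orElse("unknown"));
        }
    }
}
```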
Note that we sometimes see this as well:
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3.401354560s. [buffered_nanos=3599511605, waiting_for_connection]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.axoniq.axonserver.grpc.control.PlatformServiceGrpc$PlatformServiceBlockingStub.getPlatformServer(PlatformServiceGrpc.java:250)
	at io.axoniq.axonserver.connector.impl.AxonServerManagedChannel.connectChannel(AxonServerManagedChannel.java:115)
	at io.axoniq.axonserver.connector.impl.AxonServerManagedChannel.createConnection(AxonServerManagedChannel.java:319)
	at io.axoniq.axonserver.connector.impl.AxonServerManagedChannel.ensureConnected(AxonServerManagedChannel.java:299)
	at io.axoniq.axonserver.connector.impl.AxonServerManagedChannel.lambda$new$0(AxonServerManagedChannel.java:100)
	...
	at java.lang.Thread.run(Unknown Source)