We are using a subscription query to push changes to our front-end. Sometimes we restart Axon Server while our service is still running. When our service reconnects to Axon Server, it tries to start the subscription query again, but then we start seeing AXONIQ-5000 errors. If we then restart the service, the error no longer appears and everything works fine again.
Is there a way to recover from this error, without restarting the service?
We are using Axon Framework 4.5.3 (the behaviour is the same with 4.5.4) and Axon Server 4.5.10.
Is this subscription query a long-running (hot) stream? Also, do you see any other errors in the log alongside AXONIQ-5000?
You could wrap the initialization of the subscription query in Mono.fromCallable and pair it with the retry() operator. When an error happens, it will retry with a brand new subscription query connection.
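To illustrate the idea behind that suggestion without pulling in Reactor, here is a minimal, dependency-free sketch. It is not the Mono.fromCallable/retry() code itself, just the same principle in plain Java: the subscription is re-created on every attempt instead of reusing the failed one. The class name ResubscribeOnError, the method openWithRetry and its parameters are hypothetical and not part of Axon Framework or Reactor.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not an Axon Framework or Reactor API.
final class ResubscribeOnError {

    private ResubscribeOnError() {
    }

    /**
     * Invokes openSubscription until it succeeds or maxAttempts is reached.
     * The supplier is called again on every attempt, so each retry works
     * with a freshly created subscription rather than the failed one.
     */
    static <T> T openWithRetry(Supplier<T> openSubscription, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return openSubscription.get();
            } catch (RuntimeException e) {
                // Remember the failure (e.g. an AXONIQ-5000 error) and try again.
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to resubscribe", ie);
                }
            }
        }
        // All attempts failed: surface the last error to the caller.
        throw last;
    }
}
```

In a real service the supplier would be whatever code currently opens the subscription query. The important part is that this code runs again on each attempt, which is also what wrapping it in Mono.fromCallable is meant to achieve.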
Thanks, I will try the Mono suggestion and report back.
It is indeed long-running. We have a front-end that calls the subscription query via a fetch request. The result of the subscription query is converted into a server-sent event stream. When Axon Server shuts down, the front-end stops the request. But when the page is reloaded, it makes the request again. As soon as that happens, the exception below shows up in a loop in the log. Nothing else.
2022-03-14 11:51:45.170 ERROR 18148 --- [ault-executor-6] s.a.p.b.t.s.TimelineEventListenerUseCase : Error while streaming timeline messages
org.axonframework.axonserver.connector.AxonServerException: INTERNAL: AXONIQ-5000
at org.axonframework.axonserver.connector.ErrorCode.lambda$static$24(ErrorCode.java:145) ~[axon-server-connector-4.5.8.jar:4.5.8]
at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:182) ~[axon-server-connector-4.5.8.jar:4.5.8]
at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:213) ~[axon-server-connector-4.5.8.jar:4.5.8]
at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:202) ~[axon-server-connector-4.5.8.jar:4.5.8]
at org.axonframework.axonserver.connector.event.util.GrpcExceptionParser.parse(GrpcExceptionParser.java:57) ~[axon-server-connector-4.5.8.jar:4.5.8]
at reactor.core.publisher.Mono.lambda$onErrorMap$31(Mono.java:3733) ~[reactor-core-3.4.13.jar:3.4.13]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) ~[reactor-core-3.4.13.jar:3.4.13]
at reactor.core.publisher.MonoCompletionStage.lambda$subscribe$0(MonoCompletionStage.java:80) ~[reactor-core-3.4.13.jar:3.4.13]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
at io.axoniq.axonserver.connector.query.impl.SubscriptionQueryStream.onError(SubscriptionQueryStream.java:119) ~[axonserver-connector-java-4.5.4.jar:4.5.4]
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479) ~[grpc-stub-1.43.0.jar:1.43.0]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562) ~[grpc-core-1.43.0.jar:1.43.0]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.43.0.jar:1.43.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743) ~[grpc-core-1.43.0.jar:1.43.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722) ~[grpc-core-1.43.0.jar:1.43.0]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.43.0.jar:1.43.0]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.43.0.jar:1.43.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
2022-03-14 11:51:45.172 ERROR 18148 --- [ault-executor-6] e.b.s.c.GlobalControllerExceptionHandler : org.axonframework.axonserver.connector.AxonServerException: INTERNAL: AXONIQ-5000
Adding the Mono.fromCallable doesn’t seem to change the behaviour. I tried to create a sample application to reproduce the issue. However, in the sample the issue doesn’t appear.
It’s a bit of a puzzle now what the real problem is.
I realized that we have a fetch running in the browser; if an error occurs during the fetch, it will do the request again. If the fetch happens while Axon Server is shut down, then you will be able to reproduce the error.
I have created an example application that can trigger the error:
These are the steps:
1. Start the application.
2. Do a request to localhost:8080. This will output a server-sent event stream, emitting 100 random UUIDs, one per second.
3. Shut down Axon Server. The application will log an error that it is unable to connect to Axon Server.
4. Do a request to localhost:8080. This will produce a 500 error.
5. Start Axon Server.
6. Do a request to localhost:8080. This will produce a concurrency exception (optional, depends on timing).
7. Do another request to localhost:8080. This will produce an AXONIQ-5000 exception.
Sounds like a bug, will investigate…
Can you try one more thing: on the Axon Server node that is connected to your client application, execute the requestReconnect API. This will force a reconnection and hopefully fix the problem.