Subscription query throws INTERNAL: AXONIQ-5000 exception after Axon Server restart

jaco · March 14, 2022, 10:57am

We are using a subscription query to push changes to our front-end. Sometimes we restart Axon Server while our service is still running. When our service reconnects to Axon Server, it tries to start the subscription query again, but then we start seeing AXONIQ-5000 errors. If we then restart the service, the error no longer appears and everything works fine again.

Is there a way to recover from this error, without restarting the service?

We are using Axon Framework 4.5.3 (the behaviour is the same with 4.5.4) and Axon Server 4.5.10.

stefand · March 14, 2022, 11:19am

Is this service a long-running (hot) stream? Also, do you see any error in the log because of getting AXONIQ-5000?

You could wrap the initialization of the subscription query in Mono.fromCallable and pair it with the retry() operator; when an error happens, it will retry with a brand-new subscription query connection.
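
To illustrate the idea behind this suggestion, here is a minimal plain-Java sketch (standard library only, no Reactor or Axon dependency) of "create a fresh subscription on every retry". The class name RetryingSubscription, the method withRetry, and the failing stand-in in main are hypothetical and not part of Axon Framework or Reactor. In Reactor, the roughly equivalent shape would be Mono.fromCallable(...) followed by retry(), with the callable being whatever code opens the subscription query.

```java
import java.util.concurrent.Callable;

// Plain-Java sketch: the factory is called again on every attempt,
// so a failed subscription is never reused. All names are hypothetical.
final class RetryingSubscription {

    // Invokes the factory until it succeeds or maxAttempts is exhausted.
    static <T> T withRetry(Callable<T> factory, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (Exception e) {
                last = e; // remember the failure, then build a new subscription
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for opening a subscription query: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("INTERNAL: AXONIQ-5000");
            }
            return "subscribed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The sketch retries immediately; a real service would probably want a delay or backoff between attempts, and the retry only helps if the underlying connection has actually recovered.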

jaco · March 14, 2022, 11:56am

Thanks, I will try the Mono suggestion and report back.

It is indeed long-running. We have a front-end that calls the subscription query via a fetch request. The result of the subscription query is converted into a server-sent event stream. When Axon Server shuts down, the front-end stops the request. But when the page is reloaded, it makes the request again. As soon as that happens, the exception below shows up in a loop in the log. Nothing else.

2022-03-14 11:51:45.170 ERROR 18148 --- [ault-executor-6] s.a.p.b.t.s.TimelineEventListenerUseCase : Error while streaming timeline messages

org.axonframework.axonserver.connector.AxonServerException: INTERNAL: AXONIQ-5000
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$24(ErrorCode.java:145) ~[axon-server-connector-4.5.8.jar:4.5.8]
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:182) ~[axon-server-connector-4.5.8.jar:4.5.8]
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:213) ~[axon-server-connector-4.5.8.jar:4.5.8]
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:202) ~[axon-server-connector-4.5.8.jar:4.5.8]
	at org.axonframework.axonserver.connector.event.util.GrpcExceptionParser.parse(GrpcExceptionParser.java:57) ~[axon-server-connector-4.5.8.jar:4.5.8]
	at reactor.core.publisher.Mono.lambda$onErrorMap$31(Mono.java:3733) ~[reactor-core-3.4.13.jar:3.4.13]
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) ~[reactor-core-3.4.13.jar:3.4.13]
	at reactor.core.publisher.MonoCompletionStage.lambda$subscribe$0(MonoCompletionStage.java:80) ~[reactor-core-3.4.13.jar:3.4.13]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[na:na]
	at io.axoniq.axonserver.connector.query.impl.SubscriptionQueryStream.onError(SubscriptionQueryStream.java:119) ~[axonserver-connector-java-4.5.4.jar:4.5.4]
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:479) ~[grpc-stub-1.43.0.jar:1.43.0]
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562) ~[grpc-core-1.43.0.jar:1.43.0]
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.43.0.jar:1.43.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743) ~[grpc-core-1.43.0.jar:1.43.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722) ~[grpc-core-1.43.0.jar:1.43.0]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.43.0.jar:1.43.0]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.43.0.jar:1.43.0]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2022-03-14 11:51:45.172 ERROR 18148 --- [ault-executor-6] e.b.s.c.GlobalControllerExceptionHandler : org.axonframework.axonserver.connector.AxonServerException: INTERNAL: AXONIQ-5000

stefand · March 14, 2022, 12:02pm

And when you restart Axon Server, do you see the query handler listed as registered in the dashboard?

You may also upgrade to Axon Framework 4.5.8, that’s the latest version.

jaco · March 14, 2022, 12:11pm

We have two query handlers, but after the restart they no longer show up.

stefand · March 15, 2022, 9:46am

Does the issue still persist?

jaco · March 15, 2022, 9:57am

It does indeed.

Adding the Mono.fromCallable doesn’t seem to change the behaviour. I tried to create a sample application to reproduce the issue. However, in the sample the issue doesn’t appear.

It’s a bit of a puzzle now what the real problem is.

jaco · March 15, 2022, 11:03am

I realized that we have a fetch running in the browser; if an error occurs during the fetch, it will make the request again. If the fetch happens while Axon Server is shut down, you will be able to reproduce the error.

I have created an example application that can trigger the error:

These are the steps:

1. Start the application.
2. Do a request to localhost:8080. This will output a server-sent event stream, emitting 100 random UUIDs, one per second.
3. Shut down Axon Server. The application will report an error that it is unable to connect to Axon Server.
4. Do a request to localhost:8080. This will produce a 500 error.
5. Start Axon Server.
6. Do a request to localhost:8080. This will produce a concurrency exception (optional, depends on timing).
7. Do another request to localhost:8080. This will produce an AXONIQ-5000 exception.

stefand · March 15, 2022, 11:08am

Sounds like a bug, will investigate…
Can you try one more thing: on the Axon Server node that is connected to your client application, execute the requestReconnect API. This will force a reconnection and hopefully fix the problem.

jaco · March 15, 2022, 11:22am

Thanks.

It seems that my Axon Server doesn’t have Swagger UI. I checked the docs, but those also refer to Swagger.

I’m running the standard Axon Server SE Docker container. Is there another way to access Swagger?

lfgcampos · March 18, 2022, 9:11pm

Hi @jaco,

I believe the Swagger URL changed in newer versions!
You might be able to find it under http://localhost:8024/swagger-ui/index.html#/

KR,

jaco · March 21, 2022, 1:19pm

Thank you. I have the Swagger UI working.

Executing that call resolves the problem, so it’s a good workaround for now.