CommandExecutionException: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7


Every now and then I'm getting the concurrency exception below. I'm using Axon Server EE 4.5.7. What could be the reason for this? I started noticing it after enabling snapshots with EventCountSnapshotTriggerDefinition. (Update: the issue is observed even after clearing events/snapshots and removing the snapshot trigger from the aggregate.)

org.axonframework.commandhandling.CommandExecutionException: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$11(
	at org.axonframework.axonserver.connector.ErrorCode.convert(
	at org.axonframework.axonserver.connector.command.CommandSerializer.deserialize(
	at org.axonframework.axonserver.connector.command.AxonServerCommandBus.lambda$doDispatch$2(
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(
	at java.base/java.util.concurrent.CompletableFuture.postComplete(
	at java.base/java.util.concurrent.CompletableFuture.complete(
	at io.axoniq.axonserver.connector.command.impl.CommandChannelImpl$CommandResponseHandler.onNext(
	at io.axoniq.axonserver.connector.command.impl.CommandChannelImpl$CommandResponseHandler.onNext(
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.base/java.util.concurrent.ThreadPoolExecutor$
	at java.base/
Caused by: AxonServerRemoteCommandHandlingException{message=An exception was thrown by the remote message handling component: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7, errorCode='AXONIQ-4002', server='8@server-677c89b899-vp667'}
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$11(
	... 16 more


Also, this error usually pops up after an event is applied from an aggregate's @DeadlineHandler:

    @DeadlineHandler(deadlineName = "deadLine")
    public void handle(DeadlinePayload payload) {
        apply(new XYZEvent("id"));
    }


Is Quartz running in cluster mode (isClustered: true)?

No, Quartz is not running in cluster mode. Below are the configs used for your reference:

    job-store-type: jdbc
        makeThreadsDaemons: true
        threadCount: 10
        batchTriggerAcquisitionFireAheadTimeWindow: 0
        instanceId: AUTO
        instanceName: deadline-instance
        batchTriggerAcquisitionMaxCount: 20
        idleWaitTime: 60000
        makeSchedulerThreadDaemon: true
        dataSource: xxxx
        acquireTriggersWithinLock: true
        tablePrefix: xxx_
        class: org.quartz.impl.jdbcjobstore.JobStoreCMT
        isClustered: false
        misfireThreshold: 120000
        #clusterCheckinInterval: 20000
        driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
        cleanShutdown: true

Hi @S.Roy

If you run multiple instances of your application, you should enable cluster mode (isClustered: true). This is important, but it will not solve the issue you are describing in the logs.
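For reference, enabling clustering in that same job-store section would look roughly like this (a sketch based on the property names already present in your config; clusterCheckinInterval is the value you currently have commented out, and instanceId: AUTO ensures each node gets a unique id, which clustering requires):

    isClustered: true
    clusterCheckinInterval: 20000
    instanceId: AUTO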

The deadline payload message (handled by the @DeadlineHandler) is not consistently routed to the application instance that previously handled this aggregate (aggregateIdentifier). As a result, two events can be applied to the same aggregate in parallel, triggering the optimistic locking mechanism at the event store level: one of the appends fails with the "invalid sequence number" message because it holds a stale sequence number.
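To make the failure mode concrete, here is a minimal, self-contained sketch of event-store optimistic locking (all names are made up for illustration; the real check happens inside Axon Server): two handlers load the aggregate at the same sequence number, the first append wins, and the second fails with a stale sequence number, just like the AXONIQ-2000 error above.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceConflictDemo {
    // Toy event store: an append succeeds only when the caller's expected
    // sequence number matches the store's next sequence (optimistic locking).
    static class EventStore {
        final List<String> events = new ArrayList<>();

        synchronized void append(String event, long expectedSequence) {
            long next = events.size();
            if (expectedSequence != next) {
                throw new IllegalStateException(
                    "Invalid sequence number " + expectedSequence + ", expected " + next);
            }
            events.add(event);
        }
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        for (int i = 0; i < 5; i++) {
            store.append("event-" + i, i); // aggregate is now at sequence 5
        }

        // Two handlers both loaded the aggregate at sequence 5.
        store.append("deadline-event-A", 5); // first append wins
        try {
            store.append("deadline-event-B", 5); // stale expected sequence -> conflict
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```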

One way of solving this is to send a command from your @DeadlineHandler instead of applying the event directly. Commands are routed consistently: commands with the same aggregate identifier are always routed to the same application instance, where they are handled sequentially.
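The routing guarantee can be sketched with a toy hash-based router (the names and the simple modulo scheme are illustrative only; Axon Server's actual routing uses consistent hashing). The point is that every command carrying the same aggregate identifier lands on the same instance, so events for that aggregate are never appended in parallel:

```java
public class ConsistentRoutingDemo {
    // Pick a target instance by hashing the routing key (the aggregate id).
    // Same id -> same hash -> same instance, every time.
    static int route(String aggregateId, int instanceCount) {
        return Math.floorMod(aggregateId.hashCode(), instanceCount);
    }

    public static void main(String[] args) {
        int instances = 3;
        int first = route("agg-42", instances);
        for (int i = 0; i < 3; i++) {
            int target = route("agg-42", instances);
            System.out.println("command " + i + " for agg-42 -> instance " + target);
            if (target != first) {
                throw new AssertionError("routing not consistent");
            }
        }
    }
}
```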


Thank you @Ivan_Dugalic, sending a command instead of applying the event does solve the issue. But will a fix to route deadline handlers consistently be available in upcoming releases?

Sure, thank you again for your feedback.


We have an idea to replace Quartz entirely and implement scheduling ourselves (in Axon Server). One part of this would be consistent routing of deadlines. I am not sure whether this idea has already been drafted into a concrete plan and included in the roadmap, to be honest.

For now, I would suggest sending a command. This feature will be discussed internally in the upcoming weeks and we might give it more priority. I will keep you informed.