CommandExecutionException: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7

Hi,

Every now and then I’m getting the concurrency exception below. I’m using Axon Server EE 4.5.7. What could be the reason for this? I started noticing it after enabling snapshots using EventCountSnapshotTriggerDefinition. (Update: the issue is observed even after clearing events/snapshots and removing the snapshot trigger from the aggregate.)

org.axonframework.commandhandling.CommandExecutionException: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$11(ErrorCode.java:88)
	at org.axonframework.axonserver.connector.ErrorCode.convert(ErrorCode.java:182)
	at org.axonframework.axonserver.connector.command.CommandSerializer.deserialize(CommandSerializer.java:164)
	at org.axonframework.axonserver.connector.command.AxonServerCommandBus.lambda$doDispatch$2(AxonServerCommandBus.java:167)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at io.axoniq.axonserver.connector.command.impl.CommandChannelImpl$CommandResponseHandler.onNext(CommandChannelImpl.java:370)
	at io.axoniq.axonserver.connector.command.impl.CommandChannelImpl$CommandResponseHandler.onNext(CommandChannelImpl.java:357)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:716)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:701)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: AxonServerRemoteCommandHandlingException{message=An exception was thrown by the remote message handling component: OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 5 for aggregate , expected 7, errorCode='AXONIQ-4002', server='8@server-677c89b899-vp667'}
	at org.axonframework.axonserver.connector.ErrorCode.lambda$static$11(ErrorCode.java:86)
	... 16 more

Regards,
Roy

Also, this error usually pops up after an event is applied from an aggregate's DeadlineHandler.

    @DeadlineHandler(deadlineName = "deadLine")
    public void handle(DeadlinePayload payload){
        apply(new XYZEvent("id"));
    }

Hi,

Is Quartz running in cluster mode (quartz.properties.org.quartz.jobStore.isClustered: true)?

No, Quartz is not running in cluster mode. Below are the configs used for your reference:

spring:
  quartz:
    job-store-type: jdbc
    properties:
      org.quartz.threadPool:
        makeThreadsDaemons: true
        threadCount: 10
      org.quartz.scheduler:
        batchTriggerAcquisitionFireAheadTimeWindow: 0
        instanceId: AUTO
        instanceName: deadline-instance
        batchTriggerAcquisitionMaxCount: 20
        idleWaitTime: 60000
        makeSchedulerThreadDaemon: true
      org.quartz.jobStore:
        dataSource: xxxx
        acquireTriggersWithinLock: true
        tablePrefix: xxx_
        class: org.quartz.impl.jdbcjobstore.JobStoreCMT
        isClustered: false
        misfireThreshold: 120000
        #clusterCheckinInterval: 20000
        driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
      org.quartz.plugin.shutdownhook:
        class: org.quartz.plugins.management.ShutdownHookPlugin
        cleanShutdown: true

Hi @S.Roy

If you have multiple instances of your application running, you should enable cluster mode (isClustered: true). This is very important, but it will not solve the issue you are describing in the logs.

The deadline message (handled by the @DeadlineHandler) is not consistently routed to the application instance that previously handled this aggregate (aggregateIdentifier). Because of this, two instances can apply events for the same aggregate in parallel, which triggers the optimistic locking mechanism at the event store level. One of the two fails with the invalid sequence number error because it carries a stale sequence number.
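To make the failure mode concrete, here is a minimal plain-Java sketch of the sequence check. It does not use Axon at all: `AggregateStream` is a hypothetical stand-in for the per-aggregate event stream in the event store, and the "instances" are just two local variables holding the sequence number each one saw when it loaded the aggregate.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceConflictSketch {

    /** Hypothetical stand-in for one aggregate's event stream in the event store. */
    static class AggregateStream {
        private final List<String> events = new ArrayList<>();

        /** The sequence number the next appended event must carry. */
        synchronized long nextSequence() {
            return events.size();
        }

        /** Rejects the append when the caller's sequence number is stale. */
        synchronized void append(long sequenceNumber, String event) {
            if (sequenceNumber != events.size()) {
                throw new IllegalStateException(
                        "Invalid sequence number " + sequenceNumber + ", expected " + events.size());
            }
            events.add(event);
        }
    }

    static String simulate() {
        AggregateStream stream = new AggregateStream();
        for (int i = 0; i < 5; i++) {
            stream.append(i, "event-" + i); // existing events 0..4
        }

        // Two application instances load the same aggregate at the same time.
        long seenByCommandInstance = stream.nextSequence();
        long seenByDeadlineInstance = stream.nextSequence();

        // The instance handling a command appends two events first.
        stream.append(seenByCommandInstance, "command-event-1");
        stream.append(seenByCommandInstance + 1, "command-event-2");

        // The instance handling the deadline still believes the next number is 5.
        try {
            stream.append(seenByDeadlineInstance, "XYZEvent");
            return "accepted";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

The second writer is rejected because the stream moved on while it was working from an older view of the aggregate.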

One way of solving this is to send a command from your DeadlineHandler instead of applying the event directly. Commands are consistently routed: commands with the same aggregate identifier are routed sequentially to the same application instance.
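A rough sketch of that workaround, assuming a Spring Boot setup with Axon Framework 4.x on the classpath so that the `CommandGateway` bean can be injected as a handler parameter. `HandleDeadlineCommand`, `XyzAggregate`, and the field names are hypothetical; `DeadlinePayload` and `XYZEvent` are minimal placeholders for the classes from the original post.

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.deadline.annotation.DeadlineHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.TargetAggregateIdentifier;
import org.axonframework.spring.stereotype.Aggregate;

import static org.axonframework.modelling.command.AggregateLifecycle.apply;

// Placeholder for the deadline payload used in the original post.
class DeadlinePayload {
}

// Placeholder for the event used in the original post.
class XYZEvent {
    private final String id;

    XYZEvent(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

// Hypothetical command that carries the deadline outcome back through the command bus.
class HandleDeadlineCommand {
    @TargetAggregateIdentifier
    private final String id;

    HandleDeadlineCommand(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

@Aggregate
public class XyzAggregate {

    @AggregateIdentifier
    private String id;

    protected XyzAggregate() {
        // Required by Axon to reconstruct the aggregate from its events.
    }

    // Creation command handler and deadline scheduling are omitted from this sketch.

    @DeadlineHandler(deadlineName = "deadLine")
    public void handle(DeadlinePayload payload, CommandGateway commandGateway) {
        // Do not apply the event here; dispatch a command so it is routed by aggregate identifier.
        commandGateway.send(new HandleDeadlineCommand(id));
    }

    @CommandHandler
    public void handle(HandleDeadlineCommand command) {
        // Runs on the instance that the command bus selects for this aggregate identifier.
        apply(new XYZEvent(command.getId()));
    }

    @EventSourcingHandler
    public void on(XYZEvent event) {
        this.id = event.getId();
    }
}
```

The sketch uses the non-blocking `send` on purpose: the deadline handler runs while the aggregate is loaded, so waiting inside it for a command that targets the same aggregate could stall until a timeout.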

Best,
Ivan

Thank you @Ivan_Dugalic. Sending a command instead of applying the event does solve the issue. Will a fix that routes deadline handlers consistently be available in upcoming releases?

Sure, thank you again for your feedback.

Regards,
Roy

We have an idea to replace Quartz and implement scheduling on our own (in Axon Server). One part of this would involve the consistent routing of deadlines. To be honest, I am not sure whether this idea has already been drafted as a concrete plan and included in the roadmap.

For now, I would suggest sending a command. This feature will be discussed internally in the upcoming weeks and we might give it more priority. I will keep you informed.

Best,
Ivan