Axon server / framework 4.5.1; AXONIQ-2000

In one of our environments, all commands are rejected by Axon Server, and each rejected command produces the following warning:

"Command '<COMMAND>' resulted in org.axonframework.commandhandling.CommandExecutionException(OUT_OF_RANGE: [AXONIQ-2000] Invalid sequence number 4 for aggregate <aggregate name>, expected 0)"

On the axon server side of things I do see a lot of warnings about snapshots:

2021-11-18 08:20:29.301  WARN 1 --- [  data-writer-7] i.a.a.message.event.EventDispatcher      : appendSnapshot:  Error on connection from event store: [AXONIQ-2000] Invalid sequence number while storing snapshot. Highest aggregate <aggregate-name> sequence number: -1, snapshot sequence 12270.

I am not sure these warnings are related: the snapshot warnings on the server side are not about the same aggregate as the error on the client side. Either way, the affected aggregate won’t produce any more events.

I’ve tried clearing the snapshots, but that did not help my situation. We have similar configurations on different environments, but only problems on one of them. The problem seems to be caused by a restart of axon server. Anyone got any ideas?

Update: I did some further analysis. Events directly pushed to the event bus are being produced. Only commands are not handled. So it feels like a problem with the aggregate / snapshotting.

Hi Maarten-Jan,

This error occurs when an event is appended with a sequence number that is either higher or lower than what was expected. In this case, it seems that the event store doesn’t have any event stored for that aggregate.

To verify that, you can execute the following query in the Axon Server dashboard: aggregateIdentifier = 'identifier_here'.
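As a rough illustration of the check behind AXONIQ-2000, here is a minimal sketch. It is not Axon Server's actual implementation: the class `SequenceCheckSketch`, its in-memory map, and the aggregate id `order-1` are all hypothetical, and the snapshot rule is only inferred from the log message above. It shows why a client that has replayed events 0 to 3 gets "expected 0" when the server-side index no longer knows about the aggregate.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceCheckSketch {
    // Hypothetical stand-in for the server's aggregate index:
    // aggregate identifier -> highest stored sequence number.
    private final Map<String, Long> lastSequence = new HashMap<>();

    /** Highest known sequence number, or -1 when no events are known for the aggregate. */
    public long highestSequence(String aggregateId) {
        return lastSequence.getOrDefault(aggregateId, -1L);
    }

    /** Accepts an event only if it carries the next expected sequence number. */
    public void appendEvent(String aggregateId, long sequenceNumber) {
        long expected = highestSequence(aggregateId) + 1;
        if (sequenceNumber != expected) {
            throw new IllegalStateException("Invalid sequence number " + sequenceNumber
                    + " for aggregate " + aggregateId + ", expected " + expected);
        }
        lastSequence.put(aggregateId, sequenceNumber);
    }

    /** Assumed rule: a snapshot may not run ahead of the events known for the aggregate. */
    public void appendSnapshot(String aggregateId, long snapshotSequence) {
        long highest = highestSequence(aggregateId);
        if (snapshotSequence > highest) {
            throw new IllegalStateException("Invalid sequence number while storing snapshot."
                    + " Highest aggregate " + aggregateId + " sequence number: " + highest
                    + ", snapshot sequence " + snapshotSequence + ".");
        }
    }

    public static void main(String[] args) {
        SequenceCheckSketch index = new SequenceCheckSketch();
        try {
            // The client replayed events 0..3 and now appends number 4,
            // but this empty index knows nothing about the aggregate.
            index.appendEvent("order-1", 4);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the query returns events while the server still reports "expected 0", the stored events and the lookup structure used for this check disagree with each other.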

You mentioned a restart. Was that a forced restart, or a graceful one?

I have 4 events for this aggregate in Axon Server. I’ve just verified this again by querying. The same goes for the other aggregates mentioned in the snapshot warnings.

Yeah, we had some internal discussion on that. The Kubernetes cluster was updated by Google Cloud during a maintenance period over the weekend. So it should have been a graceful shutdown, but I’m not sure it was. I’m going to look into that. I’ll post my findings back here later.

From what I can find, and what I see with other components, Google does do a graceful shutdown of components during a node upgrade. And given we’re using Axon Server 4.3+, I would expect Axon Server to have been gracefully shut down as well. However, I do not see any logs indicating a graceful shutdown (our other Spring Boot apps do produce these). I don’t know if there should be any?

I did find another log message, the last message logged (or at least stored in GCP) before shutdown in our situation, that may be of interest:

2021-11-06 23:27:11.250 WARN 1 --- [ault-trackers-1] i.a.a.message.event.EventDispatcher : listEvents: Error on connection from event store: [AXONIQ-9000] Failed to read event: 904611

Perhaps it is related to our problems?

Hi Maarten,

I suspect the restart wasn’t very graceful from the AxonServer perspective. That last log line indicates it was “happily” processing events as the machine was being restarted.

As a result of this restart, it’s possible that the indices and the data actually stored in the event store have gone out of sync. If that doesn’t resolve automatically after a restart (which seems to be the case here), the easiest way to recover is to stop Axon Server, remove all the index files from the storage, and start Axon Server again. It will then rebuild all the indices.
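The removal step could be sketched roughly as follows, to be run only while Axon Server is stopped. This is a hedged sketch, not an official tool: the class `IndexFileMover` is hypothetical, and the file suffixes are passed in as arguments because the exact index and bloom filter file names depend on your Axon Server version and should be checked in your own storage directory first. Instead of deleting, it moves the matching files to a backup directory so they can be put back if needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class IndexFileMover {

    /**
     * Moves every regular file directly inside storageDir whose name ends with one of
     * the given suffixes into backupDir, and returns the names of the moved files.
     */
    public static List<String> moveAside(Path storageDir, Path backupDir, List<String> suffixes)
            throws IOException {
        Files.createDirectories(backupDir);

        // Collect matches first, so the directory stream is closed before files are moved.
        List<Path> matches = new ArrayList<>();
        try (Stream<Path> files = Files.list(storageDir)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> suffixes.stream()
                                      .anyMatch(s -> p.getFileName().toString().endsWith(s)))
                 .forEach(matches::add);
        }

        List<String> moved = new ArrayList<>();
        for (Path file : matches) {
            Files.move(file, backupDir.resolve(file.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
            moved.add(file.getFileName().toString());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: IndexFileMover <storageDir> <backupDir> <suffix>...");
            return;
        }
        List<String> suffixes = Arrays.asList(args).subList(2, args.length);
        List<String> moved = moveAside(Paths.get(args[0]), Paths.get(args[1]), suffixes);
        System.out.println("Moved " + moved.size() + " file(s): " + moved);
    }
}
```

With Java 11 or later this can be run as a single source file, for example `java IndexFileMover.java <storageDir> <backupDir> .index .bloom`, where both directories and both suffixes are placeholders to replace with what your installation actually uses. The event data files themselves are left untouched.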

What command did you execute to run Axon Server? We sometimes notice that certain wrappers don’t always pass termination signals (such as SIGTERM) to the Java process properly.

Note that Axon Server Enterprise uses a write-ahead-log, which prevents these issues.

Hi Allard, Thank you for your response.

I’ve tried removing all the index and bloom files, but that did not resolve the issue. I ended up restoring the latest backup we had from before the crash. Would it have made a difference if I had not removed the bloom files?

In order not to conflate issues, I’ll create a new topic on the (not so) graceful shutdown issue.

Update: see here for the new topic.

No, removing the bloom filters was also required.
Do you see the index files being recreated after startup?

I see you’re using Axon Server SE 4.3. I would recommend updating to the latest version, which is 4.5.8.

We’re currently on 4.5.1. I don’t know if much in this regard has changed between 4.5.1 and 4.5.8?