Manual backup and recovery of event store for Axon SE mode

Shrirang_Khedekar · September 6, 2023, 9:36am

Hi Team,

We are using Axon Framework 4.7.3 and Axon Server 2023.1.0 in SE mode with Spring Boot 3.1.0.
We are using the default configuration of the event bus, command bus, and event store.
We are using PooledStreamingEventProcessor together with JpaTokenStore.

Considering we are using Axon Server in SE mode, we are doing manual backups of the event stream segments and the controlDB.

We are calling the APIs below periodically:
curl --request POST --url http://localhost:8024/v1/backup/createControlDbBackup
curl --request GET --url 'http://localhost:8024/v1/backup/eventstore?type=EVENT'

After every API execution, we copy the files below to a secure location:
00000000000000000000.events
controldb1693980970035.zip
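
The two backup calls above could also be issued from a small scheduled Java job instead of curl. The following is only a minimal sketch using the JDK's java.net.http API, assuming Axon Server is reachable at http://localhost:8024 as in the curl commands. The class and method names are hypothetical, and sending the requests and copying the resulting files is left to the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of Axon): builds the same two requests as the curl commands.
final class AxonServerBackupRequests {

    // POST /v1/backup/createControlDbBackup, as in the first curl command.
    static HttpRequest controlDbBackup(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/backup/createControlDbBackup"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // GET /v1/backup/eventstore?type=EVENT, as in the second curl command.
    static HttpRequest eventStoreFiles(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/backup/eventstore?type=EVENT"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:8024"; // assumption: default HTTP port from the thread
        HttpRequest controlDb = controlDbBackup(baseUrl);
        HttpRequest events = eventStoreFiles(baseUrl);
        // Only prints the requests; nothing is sent here.
        System.out.println(controlDb.method() + " " + controlDb.uri());
        System.out.println(events.method() + " " + events.uri());
    }
}
```

Each request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) from whatever scheduler is already in use.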

To recover from Axon event store failures, we intend to use the above backup files as follows:

Take a fresh Axon Server zip and create a data folder inside it.
Place the above 2 files under the data folder.
Start Axon Server and connect to it from the Spring Boot app.

This approach works for us as long as no new events were persisted to the event store in the old setup after calling the above backup APIs.
In this case, we can see that in the recovered setup the tokens are initialized to head, and any new events persisted to the event store from the recovered setup are processed fine.
We believe this is because the event store and the JpaTokenStore were in sync.

But if even one event was newly persisted in the old setup after the backup APIs were called, the event processor in the new setup does not process any new events persisted to the event store.

Will configuring the EventProcessor to use a head token as the initial token solve this problem?

Considering we are using PooledStreamingEventProcessor, is the snippet below the correct way to achieve this?

EventProcessingConfigurer.PooledStreamingProcessorConfiguration psepConfig =
        (config, builder) -> builder.initialToken(messageSource -> messageSource.createHeadToken());

// Pass the event store as the message source and psepConfig as the processor configuration;
// casting psepConfig to the message-source Function would fail at runtime.
configurer.registerPooledStreamingEventProcessor("ProjectionsGroup", Configuration::eventStore, psepConfig);

allardbz · September 11, 2023, 7:51am

Hi Shrirang,

are you also creating a backup of the (projection) database used by the applications? If so, make sure that backup is made first, before any backup of the event store. Otherwise, your projection will have projected events that aren’t part of the event store (anymore).

Remember that the projection database also contains the token pointing at the last processed event. In case the projection is ahead (because of an “old” backup), then the token points somewhere in the future and you won’t be receiving events for a while until that pointer is reached again.
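
The effect described above can be illustrated with a deliberately simplified model. This is not Axon code: it assumes events get sequential positions and that a token simply stores the position of the last handled event. All names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Axon code): the store assigns sequential positions to events,
// and the processor's token holds the position of the last event it handled.
final class TokenAheadOfStoreDemo {

    // Returns the positions of the newly appended events that the processor would handle.
    static List<Long> handledPositions(long storeHead, long tokenPosition, int newEvents) {
        List<Long> handled = new ArrayList<>();
        for (int i = 1; i <= newEvents; i++) {
            long position = storeHead + i;   // position assigned to the new event
            if (position > tokenPosition) {  // only events beyond the token are delivered
                handled.add(position);
                tokenPosition = position;    // token advances with each handled event
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        // Store restored from an older backup (head 100), token from a newer projection DB (103):
        // the first three new events fall at or below the token and are skipped.
        System.out.println(handledPositions(100, 103, 5)); // prints [104, 105]

        // Store and token in sync: every new event is handled.
        System.out.println(handledPositions(100, 100, 3)); // prints [101, 102, 103]
    }
}
```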

Your backup strategy for Axon Server seems fine, although if you’re creating a backup of SE, you might as well copy all the files in the data folder. No need to call the API for the event files. For the controlDB, you either use that API to create a file safe for copying, or shut down Axon Server, in which case it’s safe to simply copy all files directly.

Hope this helps.
Kind regards,

Allard

Shrirang_Khedekar · September 12, 2023, 8:37am

Thank you Allard for the reply.
While going through the documentation, we found that we have event replay functionality available for PooledStreamingEventProcessor.

In case the projection is ahead (because of an “old” backup) and the token points somewhere in the future, can we force it to the current head of the event store restored from the “old” backup?
This way we ensure that the moment we get new events on the recovered Axon Server setup, they are processed immediately rather than waiting for the pointer to be reached.

Shrirang_Khedekar · October 3, 2023, 11:55am

@allardbz Since we are running Axon Server in SE mode, backing up only the .events and .snapshots files was sufficient for us, as the controldb file in SE mode just contains information on the users allowed to access the server, and we had no users or other roles set up for the server. We restricted access to the server via AWS VPC and private subnet restrictions.

Also, with respect to the scenario where the projection database is ahead of the event store, we could not figure out how to repoint the event handler to a specific position in the stream.

So instead, we followed the approach of clearing the token table before starting the Spring Boot service after recovery. The token then pointed to the tail and all events got reprocessed. We ensured the event handling methods are idempotent, so the integrity of the projection DB data is maintained. We could do this because we were dealing with a very low volume of events.
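
To illustrate what idempotent means for the handlers here, below is a minimal plain-Java sketch, not our actual code. The projection, event type, and field names are hypothetical. It assumes each event carries the full new state, so the handler can upsert by key and a full replay leaves the projection unchanged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical projection (not from the real application): keeps the latest status per order.
final class OrderStatusProjection {

    // Hypothetical event type carrying the full new state rather than a delta.
    static final class OrderStatusChanged {
        final String orderId;
        final String status;

        OrderStatusChanged(String orderId, String status) {
            this.orderId = orderId;
            this.status = status;
        }
    }

    private final Map<String, String> statusByOrder = new HashMap<>();

    // Idempotent handler: an upsert keyed by orderId, so re-applying an event changes nothing.
    void on(OrderStatusChanged event) {
        statusByOrder.put(event.orderId, event.status);
    }

    // Immutable snapshot of the current projection state.
    Map<String, String> view() {
        return Map.copyOf(statusByOrder);
    }

    public static void main(String[] args) {
        List<OrderStatusChanged> stream = List.of(
                new OrderStatusChanged("o-1", "CREATED"),
                new OrderStatusChanged("o-2", "CREATED"),
                new OrderStatusChanged("o-1", "SHIPPED"));

        OrderStatusProjection projection = new OrderStatusProjection();
        stream.forEach(projection::on);          // original processing
        Map<String, String> before = projection.view();

        stream.forEach(projection::on);          // full replay from the tail
        System.out.println(before.equals(projection.view())); // prints true
    }
}
```

A handler that increments a counter or inserts a new row per event would not survive such a replay without extra deduplication.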