Axon server single node and transactions

stijnhaezebrouck · August 23, 2021, 11:05am

Hello, here at Renta Solutions, we will be using Axon Server as a single node in a project for event archiving. As the events to be archived are typically provided in batch jobs, there is no need for a clustered environment.

However, about transactions: is it correct that, in a single-node setup, transactions might commit before the data is actually written to disk by Axon Server? In a multi-node setup, this would be OK, as there is a low chance of all nodes dropping out at the same time, but in a single-node setup, can a crash of the (only) Axon Server node lead to data loss?

Sara_Torrey · August 23, 2021, 3:09pm

Can you tell me more about your setup? Are you using event sourcing? Are you also using a relational DB for storing your data as well? Additionally, are you trying to find out how to handle retry when a transaction is unsuccessful?

stijnhaezebrouck · August 23, 2021, 3:28pm

In the current setup, an archiving project, we are reading from a relational database and publishing events to Axon Server. However, after committing the events to Axon Server, the events in the relational database are immediately removed (as they are considered to be archived in Axon Server). The question is: if an event is committed to Axon Server and the server then crashes, is the committed event guaranteed to be stored on disk or not?

After committing, one might expect that it is. However, I remember from past experience with Axon Server (or AxonDB, earlier) that it would already commit the transaction prior to storing it. It did that after the transaction was accepted by multiple nodes, so that if one node failed to write the event to disk, it could retrieve it later from the other nodes. That was also the reason why 3 Axon Server nodes should not write to the same volume: if the volume crashes, ALL 3 nodes would crash at the same time (potentially losing data). This was by design, in favor of performance.

Hence, the recommended use of axon server in production was to have multiple nodes instead of just one.
I wonder whether this is still the case today.

Bert_Laverman · August 24, 2021, 7:51am

Stijn,
I hope you don’t mind if I correct you here slightly because the way you state your concern seems to indicate that Axon Server has not requested the OS to write the data to disk when it sends the confirmation back to the client. First of all, when Axon Server stores the data for itself, it does this using memory-mapped files. This means that it actually has transferred the responsibility for that data to the OS already and, barring catastrophic failure, nothing is left to be done.

In modern Operating Systems, there is virtually no difference between asking the OS to write a block of data to disk (or reading a block from disk) and using an explicitly memory-mapped file, apart from the extra copy operation to or from that buffer.
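To make the distinction concrete, here is a minimal Java sketch of writing through a memory-mapped file. It is not Axon Server's actual code; the class and method names (`MappedWriteSketch`, `writeMapped`) and the file name are made up for illustration. After `put`, the bytes are in the OS page cache and visible to any reader of the file, but only `force()` asks the OS to flush them to the storage device.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: a hypothetical helper, not part of Axon Server.
class MappedWriteSketch {

    // Copies the payload into a memory-mapped region at the start of the file.
    // When 'force' is true, the OS is additionally asked to flush the mapped
    // region to the storage device before the method returns.
    static void writeMapped(Path file, byte[] payload, boolean force) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to the mapped size if needed.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            buffer.put(payload);   // the data is now the OS's responsibility
            if (force) {
                buffer.force();    // explicit request to write the pages to disk
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".events");
        file.toFile().deleteOnExit();
        writeMapped(file, "event-1".getBytes(StandardCharsets.UTF_8), true);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

Without the `force()` call the data still survives a crash of the Java process, because the OS owns the dirty pages; what it may not survive is a crash of the OS or the machine before those pages are flushed.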

Axon Server has an additional function that helps prevent data loss through catastrophic loss of the server, by explicitly asking the OS to write any (memory-mapped) data to disk at a certain interval. This interval is controlled by three properties:

axoniq.axonserver.replication.force-interval controls the interval for the (Axon Server EE) Replication Logs.
axoniq.axonserver.event.force-interval controls the interval for the events in the Event Store.
axoniq.axonserver.snapshot.force-interval controls the interval for the snapshots in the Event Store.

All three have a default value of 1000, which translates to 1000 milliseconds or 1 second.

So, to get back to your original question: No, when the data is committed it has been written to the files, and (strictly speaking) only the OS needs to do some extra work, but that is no longer Axon Server’s responsibility. However, you are correct in the sense that the data may not have actually been transferred to disk, so a catastrophic failure at precisely that instant could cause the situation you describe. Forcing the synchronization at any change in the data or at every write call has a definite performance impact, so Axon Server uses a timed synchronization loop instead. This synchronization is also forced during a normal shutdown of the application. But the actual execution of the write is completely in the hands of the OS.
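The timed synchronization described above can be sketched roughly as follows. This is an illustration of the general technique, not Axon Server's implementation: the class `TimedForceSketch`, its methods, and the choice of a `ScheduledExecutorService` are all assumptions, and the interval is taken to be in milliseconds. Writes only mark the data as dirty; a background task forces it to disk at a fixed interval, and a final force happens on normal shutdown.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a hypothetical timed "force" loop, not Axon Server code.
class TimedForceSketch implements AutoCloseable {

    private final Runnable forceAction;  // e.g. a call to MappedByteBuffer.force()
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    TimedForceSketch(Runnable forceAction, long intervalMillis) {
        this.forceAction = forceAction;
        // Check for unsynchronized data once per interval.
        scheduler.scheduleWithFixedDelay(
                this::forceIfDirty, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called after every write; cheap, so it does not slow down the commit path.
    void markDirty() {
        dirty.set(true);
    }

    // Forces only when something was written since the last force.
    void forceIfDirty() {
        if (dirty.compareAndSet(true, false)) {
            forceAction.run();
        }
    }

    // Normal shutdown: stop the timer, then synchronize whatever is still pending.
    @Override
    public void close() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        forceIfDirty();
    }
}
```

In this sketch, a write confirmed between two timer ticks is exactly the window in which a catastrophic failure could lose data on a single node. A shorter interval narrows that window at the cost of more frequent flushes.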

I hope this answers your concerns.

Cheers,
Bert Laverman