Axon Server clustering

Hi All,
If Axon Server is used as the event store, which seems to be the one recommended by Axon ahead of Kafka, Cassandra, etc., how is replication managed upon write when, let's say, 3 nodes of Axon Server are used?

It seems the Axon cluster is a single-master topology like MongoDB, where writes always happen to the same node. Does data replication happen asynchronously or synchronously upon writing an event? If it is asynchronous, are there provisions to sync a node when it comes back up again?

Are Axon Server writes memory-first, or does a full disk write happen in a single atomic write?

Thanks,
Isuru

Hi Isuru,
please let me start the answer with a question: are you talking about Axon Server SE or EE? Just in case you're not aware: there are two editions. Standard Edition is what is publicly available on GitHub, while Enterprise Edition is a commercial product that adds clustering, more fine-grained security, and a lot more.

So, assuming you’re asking about the clustering of Axon Server EE; it is based on the Raft Consensus Algorithm, so data is replicated to all nodes in the cluster and reported as committed when more than half of the nodes have confirmed it. If a node has been down and comes back online, it will contact the cluster to find out if there is a current leader and request the current state of the log. The Raft algorithm ensures that the node with the most up-to-date (committed) log will become the leader and it will update any followers that are behind.
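
To make that majority rule concrete, here is a minimal sketch of the Raft commit decision (hypothetical Java, purely illustrative, not Axon Server internals):

```java
// Hypothetical illustration of the Raft commit rule, NOT Axon Server internals:
// the leader reports an entry as committed once a majority of the cluster
// (its own write included) has confirmed it.
public final class RaftCommitRule {

    /** Majority quorum for a cluster of the given size: 3 -> 2, 5 -> 3, 7 -> 4. */
    static int quorum(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /** Committed when confirmations (including the leader's own) reach quorum. */
    static boolean isCommitted(int confirmations, int clusterSize) {
        return confirmations >= quorum(clusterSize);
    }

    public static void main(String[] args) {
        System.out.println(isCommitted(2, 3)); // true: 2 of 3 is a majority
        System.out.println(isCommitted(3, 7)); // false: 3 of 7 is not
        System.out.println(isCommitted(4, 7)); // true: 4 of 7 is a majority
    }
}
```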

As for the memory vs disk bit: Axon Server (all editions) uses memory-mapped I/O for the Event Store so there is no separate memory cache in between.
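
If you want a feel for what memory-mapped writes look like, below is a minimal, self-contained Java sketch using the standard NIO API; the file name and record layout are made up for illustration and say nothing about Axon Server's actual segment format:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: appending a record to a memory-mapped segment file.
public class MappedAppendExample {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Path.of("segment.events"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map a fixed-size region of the file directly into memory.
            MappedByteBuffer segment = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            byte[] event = "event-payload".getBytes(StandardCharsets.UTF_8);
            segment.putInt(event.length);  // simple length-prefixed record
            segment.put(event);
            segment.force();               // flush the mapped pages to disk
        }
    }
}
```

Because the mapping is backed by the OS page cache, writes land in memory pages that the OS flushes to disk, which is what makes a separate application-level cache unnecessary.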

Cheers,
Bert Laverman

Hi Bert,
Thanks for the reply. I have a few doubts below; please clarify them.

  • If the cluster has 7 nodes, then a write is committed when 4 nodes have finished writing the same copy of the data. Isn't that a performance overhead? Is 3 the recommended node count, since the same copy of the data is available on all nodes? Why go with 5 or 7 nodes if it's the same copy?

  • Does this cluster support sharding?

Thanks,
Isuru

Hi Isuru,
Yes, these are good questions. As usual with these things, there are pluses and minuses to most choices. In this case, the questions are about your non-functional requirements.

In Axon Server EE clusters you assign nodes to Replication Groups and then create one or more contexts in each RG. We are now talking about the number of nodes in such an RG. A typical use of a larger replication group has to do with regional availability.

  • If I use a 3-node cluster deployed in e.g. Kubernetes, I can lose a single node and still have a working RG (2 out of 3 available). If, however, the k8s cluster is located in a single availability zone (AZ), that becomes my concern.
  • If I use a multi-AZ k8s cluster, Kubernetes will take care of single-AZ unavailability for me, but the risk is that more than a single Axon Server (AS) node was on the k8s Worker Node that we lost. Then the AS cluster is unavailable until k8s has managed to respawn a majority. If I am able to pin the node locations and I only lost a single node, I am fine again. If I use VMs to host AS, I can explicitly choose the AZ and gain predictability about the impact of losing an entire AZ.
  • If I use 5 or more nodes and use a multi-region deployment, where each node is additionally deployed in different AZs, I increase my availability again.
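
To attach some numbers to these scenarios, this illustrative snippet computes, per RG size, how many confirmations a commit needs and how many node failures the group survives (simple Raft arithmetic, not Axon Server code):

```java
// Illustrative Raft arithmetic: quorum needed to commit, and how many node
// failures a replication group of size n survives while keeping a majority.
public class ReplicationGroupMath {
    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5, 7}) {
            int quorum = n / 2 + 1;      // confirmations needed to commit
            int tolerated = (n - 1) / 2; // node failures the RG survives
            System.out.printf("%d node(s): commit needs %d, survives %d failure(s)%n",
                    n, quorum, tolerated);
        }
    }
}
```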

The downside of this exercise, as you rightly note, is performance, since the increased distance between nodes, as well as an increase in the number of nodes, increases the time needed to commit a transaction.

So, if your primary concern is performance, and you can survive unavailability for some amount of time, you could even make do with a single-node deployment. If your limits on allowable downtime are tight, you'll take the hit in performance and increase the (distributed) deployment of nodes. This is the discussion you need to have about the non-functional requirements.

A second “axis” you can look at has to do with the kind of workload for the AS cluster. If you have a high amount of traffic concerned with Commands and Queries rather than Events, you could benefit from adding Messaging-Only nodes. These increase the availability of these types of workloads while having little or no effect on the Event Store side.

If your Event Store has a strong separation favoring recent events, the cost vs performance balance can be influenced by employing multi-tiered storage. Nodes with the Primary role can use faster (and generally more expensive) storage for recent events, with Secondary nodes employing cheaper (and slower) storage for the full set.

Then we can also look at the Disaster Recovery side of the picture, starting with the question of your requirements concerning the cut-off point for acceptable loss of events. Using standard backup technology you can take regular snapshots or file-based copies, where the risk is determined by how often you perform a backup. You can tighten this limit significantly by adding a node to the RG in the Passive Backup role, which will receive all events asynchronously. Even if you have this node in a different region, it will normally be only milliseconds behind. (Ok, ok, depending on how far offsite it is located.) If you add a set of Active Backup nodes, you can get the assurance of always having up-to-date backups, at the cost of having to wait for at least one of them to acknowledge storage. Either form can again be backed up using conventional technology for further guarantees.

So, your choice in the number of nodes depends very much on your non-functional requirements and there is no single “best” solution.

Now, as for your last question: no, Axon Server does not support sharding, although you could argue that Primary/Secondary tiered storage provides a comparable solution. Although we are certainly looking at this subject, so far the use-cases for an Event Store do not particularly make this an easy feature to add, because we have essentially no natural keys that we could use to partition the data. Events are opaque to Axon Server, and if you define partitions based on event metadata or payload, you will always end up with (a sizable amount of) events that do not fit that model. You could use the Aggregate Id for this, but then you either increase the number of nodes you need significantly (because you still have to ensure enough copies of the events are stored), or else reduce the positive effects to event replay scenarios only, and we already balance the clients across nodes to achieve this. So for the time being we have not found a strong enough reason to do this.
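
To illustrate the Aggregate-Id option mentioned above, here is a purely hypothetical partitioning function; Axon Server does not implement this, and the numbers only show how the required node count multiplies:

```java
// Hypothetical sketch of aggregate-based partitioning, which Axon Server does
// NOT implement; it only illustrates the trade-off described above. Each shard
// would still need its own replicated majority, multiplying the node count.
public class HypotheticalSharding {
    static int shardFor(String aggregateId, int shardCount) {
        // Math.floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(aggregateId.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        int shards = 4;
        int nodesPerShard = 3; // each shard still needs a Raft majority of its own
        System.out.println("shard for 'order-42': " + shardFor("order-42", shards));
        System.out.println("total nodes needed: " + shards * nodesPerShard);
    }
}
```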

Hope this helps you further,
Bert

Thanks Bert, it's a crystal-clear explanation.

Thanks,
Isuru