High availability using Enterprise version of axon server

Hi Team,
We have an AWS EKS environment where we have deployed the open-source version of Axon Server alongside our application microservices, which execute saga transactions. I am using the open-source Axon Server jar. We have not been able to achieve high availability. We tried putting 3 instances of Axon Server behind a ClusterIP-type load balancer service and accessing them through the service URL, but during saga transactions commands and events appear to be unavailable, because each Axon Server instance has registered different events and commands instead of a single Axon Server holding all the information. If we use the enterprise version, how will high availability be resolved for us? I have a couple of questions regarding the use of the enterprise version of Axon Server:

  1. Do you provide a separate jar for the enterprise version of Axon Server? If it is the same jar as the open-source Axon Server (just supplied with an enterprise license), how do we achieve high availability of Axon Server in our environment?
  2. As I mentioned, we need to deploy Axon Server in our AWS EKS region, so we need to run it in our own cloud-specific environment. Is that okay for the enterprise version of Axon Server?

Hi,

As you correctly observed, high availability is next to impossible to achieve with the free version of Axon Server. We decided to offer this feature exclusively in the paid-for versions.

In the enterprise version, the Axon Server nodes run in a cluster that takes care of the high availability aspects for you automatically. It ensures that all healthy nodes in the cluster have the exact same events stored. It also ensures that commands and queries sent to one node are forwarded to the node that the receiver is connected to. Last but not least, the cluster is able to withstand the failure of a portion of its member nodes, so your applications will be able to (re)connect to a healthy node and stay available.
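To illustrate the forwarding point, here is a toy model in Python. It is not Axon Server code: the `Node` class, its methods, and the `ReserveCredit` command name are all hypothetical and exist only to show why three independent servers behind a load balancer lose commands, while nodes that know about each other do not.

```python
"""Toy model only, not Axon Server code. All names here are hypothetical."""


class Node:
    def __init__(self, name):
        self.name = name
        self.handlers = {}  # command name -> handler registered on THIS node
        self.peers = []     # other cluster members; empty for standalone nodes

    def register(self, command, handler):
        self.handlers[command] = handler

    def dispatch(self, command, payload):
        # A handler registered on this node is used directly.
        if command in self.handlers:
            return self.handlers[command](payload)
        # Clustered nodes forward to whichever peer holds the handler.
        for peer in self.peers:
            if command in peer.handlers:
                return peer.handlers[command](payload)
        raise LookupError(f"no handler for {command!r} on {self.name}")


def make_nodes(clustered):
    nodes = [Node(f"node-{i}") for i in range(3)]
    if clustered:
        for node in nodes:
            node.peers = [p for p in nodes if p is not node]
    return nodes


def run(clustered):
    nodes = make_nodes(clustered)
    # Assume the load balancer routed the handling application to node-0 ...
    nodes[0].register("ReserveCredit", lambda payload: f"reserved {payload}")
    # ... and the sending application to node-2.
    try:
        return nodes[2].dispatch("ReserveCredit", 100)
    except LookupError as error:
        return f"failed: {error}"


if __name__ == "__main__":
    print("standalone:", run(clustered=False))
    print("clustered: ", run(clustered=True))
```

In the standalone case the sender lands on a node that has never seen the handler, which matches the symptom described in the question. The real cluster does considerably more than this sketch (event replication, reconnects, load distribution between handlers), so treat it purely as a mental model.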

To answer your questions explicitly:

  1. We previously had two jars: the SE jar for the free version (Standard Edition) and the EE jar for the paid-for version (Enterprise Edition). However, we have unified these into a single jar that enables or disables features according to the license it is supplied with. Which version are you currently using? Maybe you are already using the unified version?
  2. Axon Server can be deployed in a wide range of environments. We have customers running it on EC2, ECS, EKS or even Fargate, each with its own benefits and drawbacks. So we are certain that the enterprise version will run on EKS. There are some caveats regarding k8s, so please check with us before deploying a multi-node cluster.

Kind regards,
Marco


Hi,

Great timing: likely tomorrow morning, we will start offering a free trial of AxonIQ Console. With AxonIQ Console, it’s possible to get a starter license, which allows the creation of a 3-node cluster and, thus, high availability.


Thanks so much @Marco_Amann for the quick reply. I appreciate it.

I am using the 2023.2.2 version of axon server.

Thanks @Gerard for letting me know. Looking forward to this new change. :slight_smile:

One quick question: how long can I use this free 3-node cluster version? Is it free for good, or only for a certain time period?

@Marco_Amann one more question please.
As the .events and .snapshots files are flat files, how are they safeguarded in the enterprise version? What is your backup strategy? We need to ensure that the .events and .snapshots files are never lost, even if the whole Axon Server cluster goes down. Or do we have to formulate our own backup policy/mechanism/design even if we are using the enterprise version of Axon Server?

Free for 30 days; paid plans for the server start at $100/month.


Thanks @Gerard. One more question: if we use the enterprise version, do you also provide some help or a manual on how to deploy it in AWS EKS with high availability?

There are installation instructions within the console; you can also always reach out here.

@deepraj
The .snapshots and .events files are replicated to all nodes using the Raft replication protocol inside Axon Server. If one node loses its data (let’s assume the volume in AWS is deleted by accident), the data is replicated from the remaining healthy nodes. This means that, to lose these files, all data on all nodes needs to be lost at the same time.

To rule out even this possibility, there are two options that complement each other:

  • You can (and should) take backups of your data at regular intervals. This is documented on this site: Backups | 4.9 | Axon Reference Guide.
  • Axon Server with enterprise features has the concept of active or passive backup nodes. These may be placed in other regions or AZs and will also receive replicated data. Should the whole cluster be wiped for some reason (let’s again assume someone deleted all the volumes in AWS by accident), you can restore the cluster from these backup nodes.
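For a rough sense of how much failure the replicated cluster itself tolerates before backups come into play, the usual Raft majority arithmetic can be sketched in Python. This assumes plain Raft majority rules; which Axon Server nodes actually count towards the majority depends on their roles (backup nodes, for example, may not vote), so take the numbers as an approximation rather than a statement about a specific setup. The function names are made up for this example.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can be down while a majority is still reachable."""
    return nodes - majority(nodes)


if __name__ == "__main__":
    for size in (1, 3, 5):
        print(
            f"{size} node(s): majority={majority(size)}, "
            f"tolerated failures={tolerated_failures(size)}"
        )
```

Under these assumptions a 3-node cluster needs 2 nodes to make progress and can lose 1, while a single node tolerates no failure at all. Losing availability this way is not the same as losing data: the files remain on every surviving node's disk.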

As @Gerard mentioned, you can use the installation instructions provided during the interactive console onboarding flow. For a more in-depth overview, you can check the docs on that topic. If you need further assistance, colleagues from our customer-success and consulting teams can answer specific questions or review architectural designs.


Thanks so much @Marco_Amann . It is a great help. :slight_smile:

@deepraj The free trial was just enabled, so you can try AxonIQ Console with all the paid features for free for 30 days.

@Gerard Thanks so much for letting me know.