How to protect Axon Server from misbehaving microservices

We have a recurring issue: some of our business microservices start sending a lot of events, and very often this brings down Axon Server with OutOfMemory heap errors.

Is there a way to protect Axon Server from these misbehaving microservices? Maybe some limiting parameters or something like that?

We really struggle to understand how the infrastructure sizing model works for apps that use Axon Server.

We are using the latest Axon Server SE, 4.6.7.

Hi Vilius,

What do you consider “a lot of events”, and how much memory does your Axon Server instance currently have at its disposal (total machine memory and heap size)?

We have seen some serious environments pumping a lot of data into Axon Server (EE, mostly), and there is generally not a problem with heap size / OOM.

We run the standard Axon Server Docker image with a 1.5GB memory limit, which I believe sets the heap size to 25% of that. We are in the process of giving it 4GB, with 50% dedicated to the heap.
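For reference, the heap figures above can be checked with a bit of arithmetic. This is only a rough sketch: it assumes the heap is derived from the container limit through the JVM's `-XX:MaxRAMPercentage` setting (whose default is 25), and the class and method names are made up for illustration. The last line prints the heap the running JVM actually received, which is the number worth trusting.

```java
/** Rough estimate of the JVM max heap from a container memory limit (illustrative helper). */
final class HeapSizing {
    private static final long MIB = 1024L * 1024L;

    /** Approximate max heap in MiB for a container limit (MiB) and a MaxRAMPercentage value. */
    static long maxHeapMiB(long containerLimitMiB, double maxRamPercentage) {
        return (long) (containerLimitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        // 1.5 GB limit at 25%: roughly the 400MB heap mentioned in this thread
        System.out.println(maxHeapMiB(1536, 25.0));
        // 4 GB limit at 50%
        System.out.println(maxHeapMiB(4096, 50.0));
        // Heap actually granted to this JVM, in MiB
        System.out.println(Runtime.getRuntime().maxMemory() / MIB);
    }
}
```

With these assumptions, 1.5GB at 25% gives about 384MiB of heap, and 4GB at 50% gives 2048MiB.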

Nevertheless, I would not concentrate on what resources are available to Axon Server. Given a fair number of chatty microservices, I believe it can be brought down in any case, no matter how many resources you throw at it.

Hence my question regarding protection. For example, RabbitMQ has queue blocking alarms, Kafka has quotas, Artemis has flow control configuration, and so on. What is available for Axon Server in this case?

I have checked with the team, just to make sure I don’t give any wrong advice. Axon Server SE doesn’t have any flow control on append-event transactions. Axon Server EE does limit the number of uncommitted transactions it allows before it stops accepting new ones.
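Since SE has no server-side flow control on appends, one possible workaround is to cap in-flight appends on the client side, loosely mirroring the EE idea of limiting uncommitted transactions. The sketch below is not an Axon API: `InFlightLimiter` and its methods are hypothetical, and it assumes the publishing call can be wrapped as something that returns a `CompletableFuture`. It only protects the server from services that actually use it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical client-side guard: caps the number of appends in flight at once. */
final class InFlightLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    InFlightLimiter(int maxInFlight, long maxWaitMillis) {
        this.permits = new Semaphore(maxInFlight);
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * Starts the asynchronous append only if a permit becomes available within maxWaitMillis.
     * The permit is released when the append's future completes, successfully or not.
     */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> append) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            // Push back on the caller instead of queueing more work for the server
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new IllegalStateException("too many appends in flight"));
            return rejected;
        }
        try {
            return append.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the supplier itself failed, so no future will release the permit
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A caller would wrap each publish, for example `limiter.submit(() -> publishAsync(event))`, where `publishAsync` stands in for whatever asynchronous append call the service already makes.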

However, the reason I asked about heap sizes is that we haven’t seen any of our customers hit these limits before. And some of them have very significant volumes of events going through their systems.

With a heap size of 400MB, at least, we were constantly hitting OutOfMemory errors. We run ~20 microservices and hundreds of event handlers. Some of them send tens of events per second, others just a few events per second, but some of those few events can be 1MB in size. We use PostgreSQL as our persistent store.

I know that events should be small, and our developers are working on putting everything in order. However, any day we could have a misbehaving microservice, a peak in user load, or just a junior developer making a mistake.
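One cheap safeguard against the oversized-event case is a size check on the serialized payload before it is published. The sketch below is only an illustration: `PayloadSizeGuard` is a hypothetical helper, not part of Axon, and the 64 KiB limit in the usage note is an arbitrary example value.

```java
/** Hypothetical pre-publish check that rejects oversized serialized event payloads. */
final class PayloadSizeGuard {
    private final int maxBytes;

    PayloadSizeGuard(int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    /** Returns true when the serialized payload fits within the configured limit. */
    boolean isAllowed(byte[] serializedPayload) {
        return serializedPayload.length <= maxBytes;
    }

    /** Throws if the payload is too large, so the event never reaches the server. */
    void check(byte[] serializedPayload) {
        if (!isAllowed(serializedPayload)) {
            throw new IllegalArgumentException(
                "event payload is " + serializedPayload.length + " bytes, limit is " + maxBytes);
        }
    }
}
```

For example, `new PayloadSizeGuard(64 * 1024).check(bytes)` would reject a 1MB event at the sending service, where the mistake is easiest to spot.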

I have set the heap size to 2GB now. Still, not having any kind of protection makes me really nervous.