Axon Server Docker image memory issues

Hello,

We are using Axon Server SE via the standard Docker image provided by AxonIQ, and we run that image on GKE. The issue is that our Axon Server sometimes fails with various Java heap out-of-memory errors. The most recent one was:

2022-09-29 13:11:33.887 INFO 1 --- [grpc-executor-4] i.a.a.logging.TopologyEventsLogger : Application connected: rivile365-masterdata, clientId = 1@masterdata-769757699f-xwrk8, clientStreamId = 1@masterdata-769757699f-xwrk8.7a68b1ff-676b-4262-9b41-8b3d502d4c41, context = default
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /data/java_pid1.hprof ...
Heap dump file created [251948801 bytes in 3.505 secs]
Exception in thread "http-nio-8024-Acceptor" java.lang.OutOfMemoryError: Java heap space

I have the heap dump ready if anyone is interested, but from what I gather this happened while one of our microservices was being redeployed (maybe during the serialization phase?). These errors are very common during microservice deployments, when new pods connect to Axon Server and old ones disconnect one by one.

Anyway, I was wondering how to prevent this. Is there any documentation on recommended JVM settings for heap and off-heap memory? I didn’t find anything in the documentation. Does the memory usage of Axon Server depend on the number of connected microservices, and if so, how? How does memory usage depend on the concurrent message count or the message size?

I have also found that those distroless Docker images run with the default Java settings, which means that Axon Server can use only 25% of all available memory for the heap. In my opinion this could be way too low, especially considering that I didn’t observe any decrease in memory usage during server operation. Is there any kind of “garbage collection” process running on Axon Server at all?

The Axon Server container has a 768MB memory resource limit defined. We have ~15 Spring Boot microservices connected and maybe a hundred different commands, queries and events.
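
To put a number on the 25% default for this 768MB limit, here is a minimal Java sketch. It only does the percentage arithmetic and ignores the rounding and ergonomics the JVM applies, so treat the result as an approximation. The class and method names are hypothetical and are not part of Axon Server.

```java
// Hypothetical helper, not part of Axon Server: estimates the default max heap
// from a container memory limit and a MaxRAMPercentage value.
final class DefaultHeapEstimate {
    private static final long MIB = 1024L * 1024L;

    /** Approximate max heap for a container limit; ignores JVM rounding and ergonomics. */
    static long estimateMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long limit = 768 * MIB; // the container limit mentioned above
        long estimated = estimateMaxHeapBytes(limit, 25.0);
        System.out.println("Estimated max heap: " + estimated / MIB + " MiB"); // prints 192 MiB

        // What the JVM running this program actually picked, for comparison.
        long actual = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap reported by this JVM: " + actual / MIB + " MiB");
    }
}
```

So under the default settings the heap would be capped at roughly 192MB out of the 768MB given to the container.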


Best Regards,
Vilius

Hello Vilius,

It’s hard to tell what exactly is going on without knowing the system’s exact behaviour. Generally speaking, the number of connected services should not make a huge difference. Also, establishing and dropping a connection should not in itself be a memory-consuming operation. The cause is probably more related to what those services do (particularly on startup, as this is when you observe the OOM error). It could be related to the number or the size of the messages they process.

As general advice, you can start Axon Server with 2GB of heap, 2GB of direct memory, and then some OS memory for disk I/O. Then you should monitor the server to see how it behaves and adjust accordingly.
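
As a minimal sketch of what such monitoring looks at on the JVM level, the snippet below reads heap and direct buffer usage through the standard `java.lang.management` MXBeans. The class name is hypothetical, and the code reports on the JVM it runs in; how to attach it to a running Axon Server (for example over remote JMX) is not covered here. The two limits themselves correspond to the standard HotSpot flags `-Xmx2g` and `-XX:MaxDirectMemorySize=2g`; how best to pass them to the Docker image is an assumption left open.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reads heap and direct buffer usage of the current JVM.
final class MemorySnapshot {
    private static final long MIB = 1024L * 1024L;

    /** Bytes of heap currently in use. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Configured heap ceiling in bytes, or -1 if the JVM reports it as undefined. */
    static long heapMaxBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax();
    }

    /** Bytes held by direct buffers, or -1 if no pool named "direct" is exposed. */
    static long directUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("Heap used:   " + heapUsedBytes() / MIB + " MiB");
        System.out.println("Heap max:    " + heapMaxBytes() / MIB + " MiB");
        System.out.println("Direct used: " + directUsedBytes() / MIB + " MiB");
    }
}
```

Sampling these values over time, especially around service redeployments, shows whether the heap or the direct memory is the part that needs adjusting.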

Thank you Milen!

So you are saying that the default 25% for heap should be something like 50%?
The default direct memory limit under Java 11 is essentially equal to the max heap size, so if, for example, I give 4GB to the Axon Server container and set -XX:MaxRAMPercentage=50, I should get 2GB for heap and 2GB for MaxDirectMemory. OS memory should fit into what’s not used by the heap and direct memory, since in the container this is a small footprint.

Does this sound about right? Are there any other Axon Server flows which would require off-heap memory?
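
To make the arithmetic explicit, here is a minimal Java sketch of that budget. It assumes that `-XX:MaxDirectMemorySize` is left unset, so the direct memory ceiling falls back to the max heap size, and it ignores JVM rounding. The class and method names are hypothetical.

```java
// Hypothetical helper: splits a container memory limit into heap, direct memory
// ceiling, and whatever remains for everything else.
final class ContainerMemoryBudget {
    static final long GIB = 1024L * 1024L * 1024L;
    static final long MIB = 1024L * 1024L;

    /** Max heap derived from the container limit and -XX:MaxRAMPercentage. */
    static long heapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    /**
     * Direct memory ceiling. Assumption: when no explicit limit is given
     * (value <= 0), the ceiling equals the max heap size.
     */
    static long directBytes(long heapBytes, long explicitMaxDirectBytes) {
        return explicitMaxDirectBytes > 0 ? explicitMaxDirectBytes : heapBytes;
    }

    /** Memory left in the container if heap and direct memory are both fully used. */
    static long remainingBytes(long containerLimitBytes, long heapBytes, long directBytes) {
        return containerLimitBytes - heapBytes - directBytes;
    }

    public static void main(String[] args) {
        long limit = 4 * GIB;
        long heap = heapBytes(limit, 50.0);
        long direct = directBytes(heap, 0L);
        long remaining = remainingBytes(limit, heap, direct);
        // prints: heap=2048 MiB, direct ceiling=2048 MiB, remaining=0 MiB
        System.out.println("heap=" + heap / MIB + " MiB, direct ceiling=" + direct / MIB
                + " MiB, remaining=" + remaining / MIB + " MiB");
    }
}
```

With 4GB and 50% the two ceilings together already equal the container limit, so anything outside the heap and direct buffers only fits as long as direct memory stays below its ceiling.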

BTW, I am just wondering: what is direct memory used for in Axon Server?