Axon Server SE - error logs

I recently started experimenting with Axon Server SE; the client application is a Spring Boot app.

I noticed warnings like the following in the Axon Server logs:

2023-06-10 21:20 WARN 1 Reached soft limit on queue size xxx of size 10000, priority of item failed to be added 0, soft limit 10000.

Later in the logs, there were statements like the following:

2023-06-10 21:25:49.842  WARN 1 --- [MessageBroker-1] i.a.a.message.command.CommandCache       : Found 1793 waiting commands to delete

Though the above are logged as warnings, is there any possibility of command/event loss?

Also, I tried searching for these errors and couldn't find much. In one place, I found that the warning message "...Found 1793 waiting commands to delete" could be due to commands timing out before they get handled.

Additionally, I was using a PostgreSQL event store with the same load, and all processing happened without any errors. My Spring Boot app's Axon config is as follows:

axon:
  axonserver:
    enabled: true
    servers: 10.218.53.179:8080
  serializer:
    general: jackson
    events: jackson
    messages: jackson
  eventhandling:
    processors:
      sc-pgroup:
        # The number of segments that should be created when the processor starts
        # for the first time. This value is applied only once; afterwards, segments
        # can only be adjusted at runtime via the split/merge APIs.
        # More segments allow more concurrent processing.
        initialSegmentCount: 10
        # The maximum number of threads the processor should process events with. Defaults to the number of initial
        # segments if this is not further specified.
        threadCount: 15
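        # Note: a single instance can claim at most one segment per thread, so with
        # 10 segments only 10 of these 15 threads will be active at once on one node.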
        # Indicates whether this processor should be tracking or subscribing to its source.
        mode: tracking
        # The default policy. It will force domain events that were raised from the same aggregate to be handled sequentially.
        # Thus, events from different aggregates may be handled concurrently.
        # This policy is typically suitable for Event Handling Components that update details from aggregates in databases.
        sequencing-policy: sequentialPerAggregatePolicy
        batch-size: 20
      part-pgroup:
        initialSegmentCount: 10
        threadCount: 15
        mode: tracking
        sequencing-policy: sequentialPerAggregatePolicy
        batch-size: 20
  metrics:
    auto-configuration:
      enabled: true
    micrometer:
      dimensional: true

I would really appreciate it if someone could share pointers on how to avoid the warnings thrown by Axon Server above. Thank you.

Hi Deepak!
Glad to see you trying out Axon Server.

I think you should realize that Axon Server is not just a replacement for a PostgreSQL event store: it also provides messaging services, and that is where you should look for the source of those warnings. Axon Framework applications that connect to Axon Server can send Commands, but also provide handlers for them. The same holds for Queries. If the handler is in a different application, Axon Server needs to take that into account to prevent the sender from overflowing the handler with work.

Axon Server uses "permits" for flow control and will log that first warning if it threatens to run out of permits and queueing space. If it actually does run out of space because the handler simply cannot keep up, it will even send exceptions back to the app sending the Command or Query. Note that these are TransientExceptions, because there will be space again as soon as the handler has worked through its backlog.
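If the handling application has capacity to spare, you can also give it more room through client-side flow control. As a minimal sketch, assuming a recent Axon Framework 4.x release where the axon.axonserver.command-flow-control properties are available (please verify the exact names against your version), the handling application could request more permits:

axon:
  axonserver:
    # Client-side flow control for commands delivered to this application's
    # handlers. More permits let the handler accept more work before Axon
    # Server has to queue commands on its side. Values are illustrative.
    command-flow-control:
      initial-nr-of-permits: 5000
      nr-of-new-permits: 2500
      new-permits-threshold: 2500

Note that raising permits only helps if the handler can actually keep up; otherwise the backlog simply moves from Axon Server into the handler application.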

The second log line is about what happens if the handler really is not responding quickly enough, or when it is disconnected: Axon Server then decides to "give up" and "clean up." From the perspective of Axon Server, this is not an error, because it had no trouble itself; it is the handler that is the problem.

Your alternative scenario, where you used PostgreSQL as an Event Store, may simply have used a different strategy. For example, it may have queued all messages, accepting a potentially long delay before they were handled. I do not know how your Command and/or Query handling was configured in that scenario, so I cannot really explain that.

So, I think you need to take a look at the handler's performance. The warning from Axon Server gives you an indication of the performance issue, and you could, for example, scale up the handler to solve it.
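If you scale out by running several instances of the handling application, Axon Server will spread the commands over all instances that register a handler for them. As an illustrative sketch, assuming Axon Framework 4.3 or later, where the command-load-factor client property is available (again, verify the name for your version), you can even give a larger instance a bigger share of the load:

axon:
  axonserver:
    # Relative weight of this instance when Axon Server distributes commands
    # across all connected instances handling them; the assumed default is 100.
    # A larger instance could advertise e.g. 200 to receive twice the share.
    command-load-factor: 200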

Cheers,
Bert Laverman

Thank you, Bert. I will check the details you mentioned in your response.