Performance Issue with TrackingEventProcessor

Hi,

we have functional regression tests which generate a huge load (because they run in parallel).
The tests are taking very long, so we started digging into the internals of the TrackingEventProcessor we use.

We use 4 pods/nodes and a balancer to fire the requests.

The current configuration is:

eventProcessingConfiguration
    .usingTrackingProcessors(configuration -> configuration
        .getComponent(TrackingEventProcessorConfiguration.class,
            () -> TrackingEventProcessorConfiguration.forParallelProcessing(4))
        .andInitialSegmentsCount(4)
        .andBatchSize(32));

Now I have 2 questions:

  1. Even after using segments, the processing only takes place on one pod. We have 25 processors (Sagas, processing groups) combined, and 2 aggregate roots. We have not changed the policy, so SequentialPerAggregatePolicy is used.
    My understanding was that more than one node should process the events with the tracking token. Am I wrong? Unfortunately, it is not working.

  2. When a thread runs the processor, it seems to reclaim the token roughly every millisecond and just update the timestamp (even when the event stream is empty). Why is that necessary? Would it be sufficient to update it only when an event has been processed?

I’m asking because this generates a huge load on the database, and I’m wondering how to optimize that.

Thanks a lot!

Hi,

  1. By default, Axon creates 1 segment for each processor. This means only 1 node can be active. If you want more nodes to be active, you should increase the number of segments (see the configuration sketch after this list). As of Axon 4.1 (yet to be released), this can be done dynamically at runtime. Current versions only allow the number of segments to be set when a processor initializes the tokens (the first time). In your config, I see initialSegmentsCount set to 4, but note that this is ignored when segments have already been created.

  2. It should not reclaim every millisecond, but every second. Which Event Store implementation do you use? The timestamp needs to be updated (though not that frequently) to let other nodes know that the claim on the token is still “fresh”. Nodes will “steal” a claim once it has expired, to allow processing to continue when a node has crashed.
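
To illustrate the first point, a minimal sketch in the same style as the configuration above; the thread and segment counts are only illustrative, and it assumes the existing token entries are removed (or not yet initialized) so the new segment count actually takes effect:

eventProcessingConfiguration
    .usingTrackingProcessors(configuration -> configuration
        .getComponent(TrackingEventProcessorConfiguration.class,
            // 4 worker threads on this node...
            () -> TrackingEventProcessorConfiguration.forParallelProcessing(4))
        // ...but 8 segments in total, so a second node (also running 4 threads)
        // can claim the remaining 4 segments and share the work.
        .andInitialSegmentsCount(8)
        .andBatchSize(32));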

Cheers,

Allard

Thanks a lot.

  1. Yes, that was my issue. After deleting the tracking tokens and restarting the service, the segments got created automatically. Good to hear that this will be possible dynamically in 4.1.

  2. I got it - it’s a kind of watchdog which checks for stalled processors. Do you know what the “timeout” value is?

We use Mongo as our data storage (company decision). When running the load tests, I see heavy traffic on the token store. Most updates take less than 10ms, but some of them take up to 10s; the average is 3-4s. The index is used for the update, but maybe that’s a lock issue.

Roughly 30-40 events/s are processed, but that can’t be the limit.
I would like to investigate a bit more. Any hints on where to dig deeper?
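
For context on the claim timeout asked about above: it is configurable on the token store itself. A minimal sketch, assuming Axon 4’s MongoTokenStore builder and an already constructed mongoTemplate, serializer and configurer (the exact builder methods are an assumption and may differ per version):

// 'mongoTemplate', 'serializer' and 'configurer' are assumed to already exist.
TokenStore tokenStore = MongoTokenStore.builder()
        .mongoTemplate(mongoTemplate)
        .serializer(serializer)
        // A claim older than this is considered stale and may be "stolen" by
        // another node; 10 seconds is the framework default.
        .claimTimeout(Duration.ofSeconds(10))
        .build();

configurer.registerComponent(TokenStore.class, config -> tokenStore);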

Hi,

increase the number of segments. As of Axon 4.1 (yet to be released), this can be done dynamically at runtime.
Current versions allow the number of segments to be set when a processor initializes the tokens (the first time).

Apart from currently not being able to modify the segments dynamically at runtime or after the
tracking token has been created, I have some more questions regarding
scaling up projections (AxonServer and Axon 4.0) - these might have been answered previously,
I’m trying to put the pieces together about how things currently work.

When trying to run 2 nodes of the same query service, my options are:

  • either set the initial segment count to, let’s say, 2, with 2 threads; then
    all of them are going to be consumed by the first node and the second gets nothing,
    thus no events are processed there (can’t scale up),
  • or start with 4 segments and 2 threads, which means only half of the events
    get processed on the first node, so if I don’t start a second node (or it later
    crashes) not all events are processed; I always need 2 nodes running (can’t scale down).

I get that in the first case I can go to the AxonDashboard, click on the processor and then
manually assign the segments to nodes, so I can scale up manually (I still need to take care of properly setting the initial segment count).
Is there a way to do that automatically and make it compatible with autoscaling?
Am I missing something, or is the manual way the only way right now?

Also, what is the current way of doing a replay in the latter case, with multiple nodes and
segments shared between them? I get: Unable to claim token ‘org.myeventhandlerpackage[0]’
Replay is done like:

eventProcessingConfiguration
    .eventProcessorByProcessingGroup(packageName,
        TrackingEventProcessor.class)
    .ifPresent(trackingEventProcessor -> {
        trackingEventProcessor.shutDown();
        trackingEventProcessor.resetTokens();
        trackingEventProcessor.start();
    });

Unless the node that initiates the replay has a claim on all of the segments, it fails with the “unable to claim token” exception.

What I would expect/like to achieve:
Starting with one node and one segment, and whenever new nodes of the same service get connected
to AxonServer, segments are automatically added and reassigned so that they are shared equally between the nodes
(and the other way around when scaling down). I would also like to be able to perform a replay when scaled up :slight_smile:

Thanks,
Regards

Hi Vilmos,

Your question is twofold, so let me number my answers accordingly:

  1. Automatic Segment (un)assigning
    This feature is slated to be introduced, as Allard pointed out, in Axon 4.1.

Thus, in Axon Server 4.0, this functionality is not yet available.
Hence, doing this automatically means you’d have to write your own service that checks, likely periodically, which nodes have which segments claimed, and have that service un-/assign segments as it sees fit.
Axon Framework does have the API available on the TrackingEventProcessor to perform this operation, which is why you can already un-/assign segments through Axon Server; see the sketch after this list.

  2. Replaying
    To be able to replay a given TrackingEventProcessor, a single thread is required to have a claim on all the tokens for that processor.
    This is needed because the tokens need to be adjusted to ReplayTokens (read: tokens which know both the previous state and the replay state of that token).
    If you have thus distributed your application across several nodes holding claims on tokens of a single TrackingEventProcessor, you will have to force all those nodes to shut down processing for that Tracking Event Processor.
    Again, this is a feature we’ll (very likely) introduce into Axon Server, but it is not yet in place.
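
As a rough illustration of the API mentioned under point 1: a custom rebalancing service could make a node give up one of its segments so that another node can claim it. A minimal sketch, using the same lookup style as the replay snippet earlier in this thread and a hypothetical processing group name:

eventProcessingConfiguration
    .eventProcessorByProcessingGroup("my-projection", // hypothetical name
        TrackingEventProcessor.class)
    // Release segment 0 on this node; once the claim is gone, a processor
    // instance on another node can pick it up.
    .ifPresent(trackingEventProcessor -> trackingEventProcessor.releaseSegment(0));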

Hoping to have provided you with the needed background, Vilmos.
By the way, we’re aiming to release 4.1 of the framework and server somewhere mid-February.

Cheers,
Steven

Hi,

Thanks a lot!

Regards