Performance tuning initial load from large context


We have an existing integration context of about 50 million events.
Our new application has an event listener that will process those 50 million events to create and update aggregates in its own context.

The simple implementation takes about 3 days to process those 50 million events.

public void on(IntegrationEvent e) { /* create or update the aggregate */ }

I’ve implemented the same solutions as presented in (8 segments, batch size 500, using the unit of work to combine multiple events into one command).
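For reference, that setup can be sketched with the Axon Framework 4 configuration API. The processor name "integration-events" is a placeholder, not something from the original post:

```java
// Sketch of the processor setup described above (Axon 4 API).
Configurer configurer = DefaultConfigurer.defaultConfiguration();
configurer.eventProcessing(processing -> processing
        .registerTrackingEventProcessor(
                "integration-events",                  // placeholder name
                org.axonframework.config.Configuration::eventStore,
                conf -> TrackingEventProcessorConfiguration
                        .forParallelProcessing(8)      // 8 processing threads...
                        .andInitialSegmentsCount(8)    // ...over 8 segments
                        .andBatchSize(500)));          // 500 events per batch / unit of work
```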

To send the commands I have two options, and each comes with its own problems:

  1. commandGateway.send:

This processes the events really fast, but we are getting a lot of command timeouts in our log.
Even an hour after stopping the event processor via the AxonServer dashboard, we still see commands being executed.
I think a timeout on the sending side does not mean the command will not be executed?
I suspect AxonServer is using some kind of queue to store the commands?
Is it possible we send too many commands for AxonServer to process? Or will AxonServer try to execute all commands it has received?

  2. commandGateway.sendAndWait:

Processing of events is slower, and segments are getting claimed by other threads because processing one batch takes too long (so we end up with temporary duplicate segment claims).
This means we have to tweak the batch size, or maybe the claimTimeout of the token store?
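If sendAndWait is kept, the claim timeout can indeed be raised when building the token store, so a slow batch doesn't lose its segment. A sketch, assuming a JpaTokenStore and that entityManagerProvider and serializer are already available in the configuration; the 5-minute value is illustrative, not a recommendation:

```java
// Sketch (Axon 4 builder API): give a segment claim more time before
// another thread may steal it.
TokenStore tokenStore = JpaTokenStore.builder()
        .entityManagerProvider(entityManagerProvider) // assumed to exist
        .serializer(serializer)                       // assumed to exist
        .claimTimeout(Duration.ofMinutes(5))          // illustrative value
        .build();
```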

Are there any suggestions for performance tuning this use case?


Maybe an interesting addendum: processing those 50 million events will create about 5 million aggregates.

Hi Maarten,

When using commandGateway.send(), you’re basically sending commands without any backpressure. At the moment, AxonServer will accept all commands and try to send them downstream. It is likely that the downstream process cannot handle them at the speed they are delivered, causing the queue in AxonServer to fill up.

One thing you could do is use a Semaphore to limit the number of commands in transit. It should be faster than plain sendAndWait, because multiple commands can be executed in parallel on the receiving side, while still preventing all messages from being queued up at once.

The approach would be:
// Initialize the semaphore with a "reasonable" value. A few thousand shouldn't be a problem.
Semaphore semaphore = new Semaphore(5000);

// In the event processing loop: acquire a permit before sending,
// and release it when the command completes (successfully or not).
semaphore.acquire();
CompletableFuture<?> result = commandGateway.send(command);
result.whenComplete((r, exception) -> semaphore.release());
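To see the pattern in isolation, here is a self-contained sketch in which a stub stands in for commandGateway.send(); the class name, permit count, and workload sizes are all illustrative. The in-flight counter can never exceed the number of permits, because a permit is acquired before each send and only released when the command completes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSend {
    static final ExecutorService HANDLERS = Executors.newFixedThreadPool(4);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Stub for commandGateway.send(command): handles the command asynchronously.
    static CompletableFuture<Void> send(int command) {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        return CompletableFuture.runAsync(() -> {
            try { Thread.sleep(1); } catch (InterruptedException ignored) { }
            inFlight.decrementAndGet();
        }, HANDLERS);
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore semaphore = new Semaphore(10); // at most 10 commands in transit
        for (int i = 0; i < 200; i++) {
            semaphore.acquire();                 // blocks once 10 are in flight
            send(i).whenComplete((r, ex) -> semaphore.release());
        }
        HANDLERS.shutdown();
        HANDLERS.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("max commands in flight: " + maxInFlight.get());
    }
}
```

The same release-on-completion shape works with the real CommandGateway, since its send() also returns a CompletableFuture.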

In upcoming releases, we are planning to improve the backpressure mechanisms for clients sending commands (and queries, for that matter), so that sending slows down when consumers are saturated.

Hope this helps.