Issues with Asynchronous Clusters on replay

We use the Axon framework heavily on our event-sourced back end, and we are seeing two issues related to replays. Wondering if anyone has a solution for these.

We have a scenario where users import zip files on the root aggregate. Projections of these import events can take a long time to finish, depending on the size of the import. So we decided to use an Asynchronous Cluster with 10 executors and the SequentialPerAggregatePolicy sequencing policy, so that long-running imports don’t impact other aggregates. This works well on the live system.

We use the beforeReplay() method to set a flag that indicates whether the system is replaying or live; the flag is unset in the afterReplay() method. Our event handlers check this flag to skip certain operations that we do not want performed during a replay.

However, this is proving unreliable: Axon seems to call afterReplay() as soon as it has published the events on replay, without waiting for the event handlers to complete, so our flag is unset too early. As a result, event handlers end up performing the operations we wanted them to skip.

Is there a cleaner way to do this?

The second concern is that, on a replay, Axon may load all the events from the event log. As event handling is a lot slower than publishing, the cluster will inevitably lag and could crash once we exceed the available memory. I’ve noticed that you mentioned here (https://groups.google.com/d/msg/axonframework/DCEwZYDfIPg/0PaYrV2ksWEJ) that you will be looking at a solution where Axon will throttle replays automatically. Is this out yet?

Appreciate any advice

Hi,

I’ll have a look at the timing of the afterReplay() call. However, there can never be a guarantee about the ordering of that invocation relative to the invocations of the handlers, as the handler invocations are happening concurrently.

You say that your handlers perform operations that you don’t want them to (during a replay). What kind of operations are these? You should separate the handlers that build projections from the handlers that cause side-effects, by assigning them to different Clusters. Only replay the clusters that don’t cause side-effects.
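To make the separation concrete, here is a minimal sketch in plain Java. It deliberately does not use Axon classes: HandlerGroup and ReplayableRouter are hypothetical stand-ins for two clusters and the code that feeds them, and the event values are just strings. The only point it illustrates is that a replay feeds the projection group and never touches the group that causes side-effects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a cluster: a group of event handlers.
final class HandlerGroup {
    private final List<Consumer<Object>> handlers = new ArrayList<>();

    void subscribe(Consumer<Object> handler) {
        handlers.add(handler);
    }

    void publish(Object event) {
        for (Consumer<Object> handler : handlers) {
            handler.accept(event);
        }
    }
}

// Hypothetical router: live events reach both groups, replayed events
// reach only the projection group.
final class ReplayableRouter {
    final HandlerGroup projections = new HandlerGroup();
    final HandlerGroup notifications = new HandlerGroup();

    void publishLive(Object event) {
        projections.publish(event);
        notifications.publish(event);
    }

    void replay(List<?> storedEvents) {
        for (Object event : storedEvents) {
            projections.publish(event); // no side-effects during a replay
        }
    }
}
```

With this split, the handlers no longer need a replaying/live flag at all, so the timing of afterReplay() stops mattering for the side-effects.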

As for the throttling, there is currently no solution for this. I am planning to implement an Asynchronous cluster based on the Disruptor, which uses a circular buffer to store entries.

One way you could implement throttling is by using a Semaphore. Wrap your asynchronous cluster and acquire a permit for each event coming in. The wrapping cluster should subscribe an EventProcessingMonitor, which releases a permit for each event successfully handled. Maybe not the most elegant solution, but it should work.
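A rough sketch of the Semaphore idea, in plain Java rather than against the Axon API: ThrottledPublisher is a hypothetical stand-in for the wrapping cluster, the executor stands in for the asynchronous cluster, and the release in the finally block plays the role of the monitor callback. As an assumption of this sketch, the permit is released on failure as well as on success, so a failing handler cannot leak permits and stall the replay.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical wrapper, not an Axon class: bounds the number of events
// that are queued or being handled at any one time.
final class ThrottledPublisher<E> {
    private final Semaphore permits;
    private final ExecutorService executor;
    private final Consumer<E> handler;
    private final AtomicInteger pending = new AtomicInteger();
    private final AtomicInteger maxPendingObserved = new AtomicInteger();

    ThrottledPublisher(int maxPending, int threads, Consumer<E> handler) {
        this.permits = new Semaphore(maxPending);
        this.executor = Executors.newFixedThreadPool(threads);
        this.handler = handler;
    }

    // Blocks the publishing (replay) thread while maxPending events are outstanding.
    void publish(E event) {
        permits.acquireUninterruptibly();
        maxPendingObserved.accumulateAndGet(pending.incrementAndGet(), Math::max);
        try {
            executor.execute(() -> {
                try {
                    handler.accept(event);
                } finally {
                    // Equivalent of the monitor callback: free a slot.
                    pending.decrementAndGet();
                    permits.release();
                }
            });
        } catch (RuntimeException rejected) {
            // The task was never scheduled, so give the permit back.
            pending.decrementAndGet();
            permits.release();
            throw rejected;
        }
    }

    int maxPendingObserved() {
        return maxPendingObserved.get();
    }

    // Returns true if all submitted events were handled within the timeout.
    boolean shutdownAndAwait(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The effect is back-pressure on the thread that reads the event log: it cannot get more than maxPending events ahead of the handlers, which keeps memory use bounded regardless of how many events the replay covers.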

Cheers,

Allard

Hi Allard,

You say that your handlers perform operations that you don’t want them to (during a replay). What kind of operations are these?

  • These are essentially operations that notify users about various tasks (e.g. when zip imports are completed)

We will look at splitting the current cluster into two: one for the live system, which will remain an AsyncCluster with SequentialPerAggregatePolicy, and one for replaying (without any side-effects, i.e. user notifications). We will most likely keep the replaying cluster as a Simple cluster to avoid the need for throttling, as replaying is not time-critical for us yet.

Thanks for the quick update.