Replay and cluster best practices

Hi there,

I’m currently working on replays. I have some questions concerning best practices.

Since the replay mechanism, as it stands, uses clusters directly, I was wondering: do I need to “lock” the command bus during a replay? What happens if a replay is in progress and a command causes an event targeted at a replaying cluster? Imagine, for example, an edit-style event. If the replay has not yet inserted the entity into the corresponding database table (we have to delete the table contents at the start of a replay), the update might simply fail. In another scenario, the update might later be overwritten by a replayed event. So, as a best practice, should I reject commands on the command bus that might cause events during a replay?

Another question: what is the best type of cluster to use for replaying? I’m currently using an AsynchronousCluster with a Spring TaskExecutor that has a fixed number of threads and a CALLER_RUNS policy. The cluster consists of replay-enabled database table managers. I thought this might be a good way to speed things up. However, I read here (!searchin/axonframework/replay/axonframework/DCEwZYDfIPg/tF8s5PTzOgMJ) that there are issues regarding throttling. Because I’m using Spring’s TaskExecutor, I don’t think I have throttling issues here; at least I haven’t noticed any. Allard said he uses the async cluster for external integration. This makes me think that a SimpleCluster is recommended for “core” event handlers, both during replays and in normal operation. What are the best practices here?
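To illustrate why a CALLER_RUNS policy acts as a throttle, here is a minimal sketch using a plain JDK ThreadPoolExecutor (Spring’s ThreadPoolTaskExecutor is backed by one). The class name, pool size, queue capacity and the 5 ms “handler” are made-up values for illustration. Once both worker threads are busy and the bounded queue is full, the next event is run on the submitting thread itself, so the producer (for example, a replay loop) cannot outrun the handlers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallerRunsThrottleDemo {

    /** Returns {events processed, events that ran on the submitting thread}. */
    public static int[] run(int events) {
        // Hypothetical sizing: 2 worker threads, room for only 4 queued events.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(4),
                new ThreadPoolExecutor.CallerRunsPolicy());

        Thread producer = Thread.currentThread();
        AtomicInteger processed = new AtomicInteger();
        AtomicInteger ranOnProducer = new AtomicInteger();

        for (int i = 0; i < events; i++) {
            executor.execute(() -> {
                simulateHandlerWork();
                // When pool and queue are full, CallerRunsPolicy executes the task
                // on the producer thread, which blocks further submissions meanwhile.
                if (Thread.currentThread() == producer) {
                    ranOnProducer.incrementAndGet();
                }
                processed.incrementAndGet();
            });
        }

        executor.shutdown(); // queued tasks still run to completion
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {processed.get(), ranOnProducer.get()};
    }

    // Stand-in for an event handler doing a database write.
    private static void simulateHandlerWork() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = run(50);
        System.out.println("processed=" + r[0] + ", ran on producer thread=" + r[1]);
    }
}
```

Note that this sketch ignores ordering: with several threads, two events for the same aggregate may be handled out of order, which AsynchronousCluster addresses separately through its sequencing policy.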

Furthermore, I would like to know whether a replaying cluster should only be used in the context of replays, or whether it can be the same cluster I hook up to a clustered event bus.



Hi Sebastian,

the ReplayingCluster takes care of the side-effects of using the application while performing a replay. Basically, you can maintain a backlog of “live” events while performing the replay. When the replay is finished, any events left in the backlog are processed. When the backlog is empty, the cluster switches to live mode again.
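A simplified sketch of the backlog idea, applied to the insert/edit scenario from the question. This is not Axon’s ReplayingCluster implementation; the class, its methods and the Event type are hypothetical, and events are reduced to an id plus a payload. Live events that arrive during a replay are held back; if the replay stream turns out to contain a backlogged event, that event is handled once and dropped from the backlog; whatever is left is processed when the replay finishes, after which the wrapper is live again.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of a backlogging replay wrapper (not Axon's API). */
public class BackloggingReplayWrapper {

    public static final class Event {
        public final String id;
        public final String payload;

        public Event(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final Consumer<Event> delegate;                           // the wrapped cluster's handlers
    private final Map<String, Event> backlog = new LinkedHashMap<>(); // keeps arrival order
    private boolean replaying = false;

    public BackloggingReplayWrapper(Consumer<Event> delegate) {
        this.delegate = delegate;
    }

    public synchronized void startReplay() {
        replaying = true;
    }

    /** Events published by commands while the application is running. */
    public synchronized void publishLive(Event e) {
        if (replaying) {
            backlog.put(e.id, e); // hold back until the replay is done
        } else {
            delegate.accept(e);
        }
    }

    /** Historic events read from the event store during the replay. */
    public synchronized void replay(Event e) {
        // If a backlogged live event also shows up in the replay stream,
        // handle it here, once, and drop it from the backlog.
        backlog.remove(e.id);
        delegate.accept(e);
    }

    /** Processes the events left in the backlog, then switches to live mode. */
    public synchronized void finishReplay() {
        for (Event e : backlog.values()) {
            delegate.accept(e);
        }
        backlog.clear();
        replaying = false;
    }
}
```

For example, an edit for entity 42 published by a command mid-replay is only applied after the replayed insert for entity 42, so it neither fails nor gets overwritten by older replayed events.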

It doesn’t really matter what type of cluster you use for replaying. The only requirement is that it is wrapped in a ReplayingCluster, for the backlog to work. Depending on the type of work you need to do, an async cluster could improve processing speed, as it allows events to be processed in parallel.

In a properly designed system, all clusters can be asynchronous. It’s a trade-off between consistency and (perceived) performance that you need to balance. If a cluster only contains handlers that are safe to replay (sending emails and integrating with external systems are generally not safe), wrap it in a ReplayingCluster. That implementation is safe to use in normal operation, too.
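One way to keep the replaying cluster free of unsafe handlers is to split handlers explicitly. The sketch below assumes a hypothetical ReplaySafe marker interface and a simplified EventHandler type (neither is Axon API): handlers that only rebuild state they own, such as a query table, go into the cluster you would wrap in a ReplayingCluster, while handlers with external side effects, such as sending emails, stay in a live-only cluster that never sees replayed events.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterPartitioning {

    /** Simplified stand-in for an event handler (hypothetical). */
    public interface EventHandler {
        void handle(String event);
    }

    /** Hypothetical marker: the handler only rebuilds state it owns. */
    public interface ReplaySafe {
    }

    /** Rebuilds a query table: safe to replay from scratch. */
    public static final class ProjectionHandler implements EventHandler, ReplaySafe {
        public final List<String> rows = new ArrayList<>();

        public void handle(String event) {
            rows.add(event);
        }
    }

    /** External side effect: replaying would send the emails again. */
    public static final class EmailHandler implements EventHandler {
        public int sent;

        public void handle(String event) {
            sent++;
        }
    }

    /** Returns [0] = replay-safe cluster, [1] = live-only cluster. */
    public static List<List<EventHandler>> partition(List<EventHandler> handlers) {
        List<EventHandler> replayable = new ArrayList<>();
        List<EventHandler> liveOnly = new ArrayList<>();
        for (EventHandler h : handlers) {
            if (h instanceof ReplaySafe) {
                replayable.add(h);
            } else {
                liveOnly.add(h);
            }
        }
        List<List<EventHandler>> clusters = new ArrayList<>();
        clusters.add(replayable);
        clusters.add(liveOnly);
        return clusters;
    }

    /** Feeds historic events to one cluster only. */
    public static void replay(List<EventHandler> cluster, List<String> events) {
        for (String event : events) {
            for (EventHandler h : cluster) {
                h.handle(event);
            }
        }
    }
}
```

Replaying only into the first list rebuilds the projection while the email handler stays untouched.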



Thanks Allard for your quick response and keep up the good work!