Distributed saga manager


I’m exploring Axon for a project. I’ve probably got many more basic questions than this, but I was reading some old threads [1] and thought I’d dive right into the deep end with this one, since I think it’s relevant to my architecture.

In Axon 2.0 are you expecting at most a single JVM (in a cluster) to be assigned to the handling of events for a single saga type?

Have you put any more thought into a distributed saga manager? (A hint as to how it could work, if not the implementation, might be helpful to me.)

With a distributed command bus, even with a distributed event bus configuration, I’m confused how saga processing could be horizontally scaled, let alone combined with caching, without running into concurrency issues.

Peter Davis

[1] https://groups.google.com/forum/#!msg/axonframework/h7T60nQNEMw/Qt31PXI_GK0J

Hi Peter,

distributing sagas is a lot different from distributing ‘regular’ event handlers. The biggest challenge is to make sure that any sequence of events is properly handled. An event N in the sequence could change the association values so that event N+1 is now ‘of interest’ to the saga.

I’ve been thinking about partitioning the sagas so that each server would process a portion of the sagas. When creating sagas, the machines should decide among themselves which one is responsible for creating it. In some cases, a saga should only be created when no saga is already associated with the event. In that cases, machines would need to ‘vote’ before creating a saga.

To decide which machine should create a saga instance, you could use consistent hashing of the event identifier that triggers the saga creation. That would minimize the need for chatter between the systems. For the other events (those that execute on existing saga instances), each machine would just process events on their portion of the saga population.

I am sure it is possible, but its not straightforward and haven’t found the time to really think the several possible strategies through. If you any ideas, I’d love to hear about them.



Hi Allard,

some time passed since that question regarding sagas and distribution. What is the situation nowadays with v2.4.4 (als 3.x on it’s way)?

How does your axon setup with redudant nodes typically look like, when sagas are involved?

Personally, I can think about two options, but my imagination might be limited:

  1. Have multiple axon instances running side-by-side (for HA, each having a a non-distributed command bus, sagas, listener), and failover to another instance in case of error. That means sagas are always active in one instance, since only one is running. Like the LMAX architecture.

  2. Extract all sagas into an own application, run only one instance of that application, and use a event-bus-terminal and route all events to this one instance

While 1) works (it need leader election, but that’s another story), it does not help when we hit performance limits of one instance.
And 2) has a single-point-of failure.

Any hint is really appreciated.


distributing sagas is not yet possible (out of the box). In Axon 3, the necessary ingredients to make it easier to implement will be so available, so we will start soon after 3.0 has been released.
Until then, you can have a component read events from a queue (AMQP, for example) in exclusive mode. That will ensure that only one of the components (regardless whether it is deployed separately or as part of an application) is ‘active’ at any point in time. Note that this is exactly what the ClusteringEventBus and Clusters are for…



Hi Allard,

sorry for the late reply. Thank you for your guidance and the hint for the ClusteringEventBus. We now decided to go with the exclusive mode.