Saga sequential execution across multiple nodes in 4.0?

The docs are a little unclear to me regarding what happens when you have a saga associated with a multi-segment tracking event processor and the application is running on multiple nodes. Can someone clarify how this works in current versions of Axon?

My understanding:

1. On any single node, two event processing threads can never execute the same saga instance at the same time even if the saga is listening for events that appear in multiple segments.
2. In a multi-node setup, each node claims a subset of the segments. The specific subset can change over time, but at any given time, each segment is tracked by exactly one node.

But what isn't clear to me is whether there is communication between nodes that would prevent the same saga instance from executing concurrently on two nodes if there are events in multiple segments that need to be handled by the same saga. Is there a distributed locking mechanism or some other guarantee of sequential event handling across multiple nodes?

In Axon 2.x, there wasn't, and I ended up modifying the saga management code to acquire database locks to prevent concurrent execution.

Thanks!

-Steve

Hi Steve,

I think I am able to shed some light on your situation.
As of Axon 3.x, the SagaManager, which is in charge of the technical aspect of loading the right Saga instance and handing it its events, uses a SagaRepository implementation to load the Saga instances.
The SagaRepository implementation used by default, the AnnotatedSagaRepository, extends the LockingSagaRepository.

This LockingSagaRepository in turn uses a LockFactory to prevent a specific saga instance from being loaded concurrently by two distinct threads.
The default LockFactory is the PessimisticLockFactory, which applies a pessimistic locking strategy.
You could adjust the strategy by providing your own AnnotatedSagaRepository bean in your environment with a different LockFactory, if necessary of course.
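For illustration, configuring a repository with a different LockFactory could look something like the sketch below. This is a hedged sketch only: MySaga is a hypothetical saga type, and the exact builder methods (sagaType, sagaStore, lockFactory) are assumptions to verify against the Axon version you're on.

```java
import org.axonframework.common.lock.PessimisticLockFactory;
import org.axonframework.modelling.saga.repository.AnnotatedSagaRepository;
import org.axonframework.modelling.saga.repository.inmemory.InMemorySagaStore;

// Hedged sketch, not a verified configuration: MySaga is hypothetical and
// the builder methods are assumptions about the Axon 4 API.
public class SagaRepositoryConfig {

    public AnnotatedSagaRepository<MySaga> sagaRepository() {
        return AnnotatedSagaRepository.<MySaga>builder()
                .sagaType(MySaga.class)
                .sagaStore(new InMemorySagaStore())
                // The LockFactory is the hook the LockingSagaRepository uses
                // to guard concurrent loads; swap in a differently tuned one
                // here if the default pessimistic strategy doesn't fit.
                .lockFactory(PessimisticLockFactory.usingDefaults())
                .build();
    }
}
```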

Hope this helps you out, Steve!

Cheers,
Steven

Hi Steven^2,

Not entirely accurate, actually.
Normally, events are assigned to segments using a SequencingPolicy. For Sagas, however, this doesn't hold.
The SagaManager doesn't partition events into segments, but rather partitions the Sagas themselves. So a SagaManager will handle all events (relevant for the Saga), but will only invoke a Saga if its ID falls in the segment being handled by that thread.
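To make that concrete, here is a purely illustrative sketch of the matching idea (not Axon's actual internals): think of a segment as a (mask, id) pair, where a saga belongs to the segment whose id equals the masked hash of the saga's identifier.

```java
// Illustrative only: each node sees every event, but only invokes the
// sagas whose identifier hashes into a segment that node has claimed.
public final class SagaSegmentMatch {

    static boolean belongsToSegment(String sagaId, int segmentId, int segmentMask) {
        return (sagaId.hashCode() & segmentMask) == segmentId;
    }

    public static void main(String[] args) {
        // With a two-way split (mask 0x1), a given saga falls in exactly one
        // of the two segments, so only one node ever invokes it.
        String sagaId = "some-saga-identifier"; // hypothetical identifier
        System.out.println(belongsToSegment(sagaId, 0, 0x1));
        System.out.println(belongsToSegment(sagaId, 1, 0x1));
    }
}
```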

That's how Axon will ensure each event will eventually be handled once by each Saga.

Cheers,

Allard

Sweet, still stuff for me to learn too.
Thanks for pointing that out, Allard!

Thanks! Does that work for events that aren't persisted to the event store (filtered out by a FilteringEventStorageEngine)? Presumably such events would need to be delivered to each remote node that has a saga that's associated with the event, since their tracking event processors wouldn't encounter them.

-Steve

Hi Steve,

The FilteringEventStorageEngine would, as you've correctly guessed, filter events prior to appending them to the event store.

So, simply put, they're not stored.
That also means that if you use a TrackingEventProcessor for your Saga instances, those instances will not see these filtered events, as the TrackingEventProcessor can only supply events to event handlers (like those in your Saga) from a StreamableMessageSource.
The EventStore implements StreamableMessageSource by reading a stream of events, and for that, those events have to actually be stored somewhere.
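To make the mechanics concrete, here is a minimal sketch of the filtering idea (names are illustrative; this is not Axon's actual implementation): a decorator drops non-matching events before they reach the real storage engine, so anything reading from the store afterwards never sees them.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative sketch: events failing the predicate are never persisted,
// so a TrackingEventProcessor (which reads from the store) can never
// deliver them to a Saga.
final class FilteringAppender<E> {

    private final Predicate<E> shouldStore;
    private final Consumer<List<E>> delegate; // stands in for the real engine's append

    FilteringAppender(Predicate<E> shouldStore, Consumer<List<E>> delegate) {
        this.shouldStore = shouldStore;
        this.delegate = delegate;
    }

    void append(List<E> events) {
        delegate.accept(events.stream().filter(shouldStore).collect(Collectors.toList()));
    }
}
```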

Hope this helps!

Cheers,
Steven

Thanks. So to support delivering events to sagas in the presence of a FilteringEventStorageEngine (in other words, in a system where some events are ephemeral and not intended to be persisted long term), I'd need to use a SubscribingEventProcessor, correct? If I'm using a SubscribingEventProcessor, is there a mechanism to prevent the same saga from executing concurrently on two nodes?

-Steve

Hi Steve,

Correct assumption there: you'll have to use the SubscribingEventProcessor if the Saga instances need to handle events which are not stored in the Event Store.
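For reference, switching a Saga's processor to subscribing mode might look something like the sketch below, assuming the Axon 4 configuration API and the default '<SagaName>Processor' naming convention (MySaga is hypothetical; verify the details against your version):

```java
import org.axonframework.config.Configurer;
import org.axonframework.config.DefaultConfigurer;

// Hedged sketch: registers the (hypothetical) MySaga and assigns its
// processing group, assumed to default to "MySagaProcessor", to a
// SubscribingEventProcessor instead of a tracking one.
public class SubscribingSagaConfig {

    public static void main(String[] args) {
        Configurer configurer = DefaultConfigurer.defaultConfiguration();
        configurer.eventProcessing()
                  .registerSaga(MySaga.class)
                  .registerSubscribingEventProcessor("MySagaProcessor");
        configurer.start();
    }
}
```

If you're on Spring Boot, there may also be a property-based switch along the lines of axon.eventhandling.processors.MySagaProcessor.mode=subscribing, though the exact property name is worth double-checking for your version.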

There is, however, no mechanism to prevent a Saga from executing concurrently on two nodes if the SagaManager receives its events from a SubscribingEventProcessor.
For a SubscribingEventProcessor, the segmentation Allard was talking about defaults to a single 'all-events' segment, as SubscribingEventProcessors aren't capable of being aware of other Event Processors they'd need to share the stream with, the way TrackingEventProcessors are.

Then again, an event you filter out which needs to be handled by a Saga will not end up on both nodes to begin with.
So the concurrency issue would only arise if you have several events for a single Saga instance arriving in quick succession on both nodes.

A tough situation to resolve… though there are a couple of approaches I can think of.

Does this shed enough light on the process underneath, Steve, or are there more follow-up questions you'd like to pose?

Cheers,
Steven

Thanks. That was the behavior in Axon 2.4, and there, I solved it by modifying the saga manager to do SELECT FOR UPDATE on the SagaEntry table, such that the database would ensure only one host could load a particular saga at a time.
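Roughly, the idea looks like this (a sketch with illustrative table and column names, assuming PostgreSQL row-locking semantics):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of the row-lock approach: the first transaction to load a given
// saga row blocks any other node's attempt to load the same saga until it
// commits or rolls back. Table and column names are illustrative.
final class LockingSagaLoader {

    static byte[] loadSagaLocked(Connection conn, String sagaId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT serializedSaga FROM SagaEntry WHERE sagaId = ? FOR UPDATE")) {
            ps.setString(1, sagaId);
            try (ResultSet rs = ps.executeQuery()) { // blocks while another tx holds the lock
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }
}
```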

My change was merged into the main Axon repository as commit 3c260e57 (and related commits a9bb48f1, 3f30f696, and ed83eeb3), but at that point 3.0 development was already far enough along, and the saga handling code was different enough in 3.0, that I assume it didn't make any sense to try to port my change over.

Perhaps the same general approach would work in 4.0, assuming the data store has a locking mechanism with semantics similar to PostgreSQL’s. Does that seem right, or are there structural changes between 2.4 and 4.0 that’d make that approach fail?

-Steve

Hi Steve,

I think this is definitely a worthwhile feature to introduce in Axon 4.
In our update from 3 to 4, we've introduced the Builder pattern in all our infrastructure components.
This allows us to quite easily introduce 'switchable' features like this, so we stay backwards compatible while still offering an easy way to adjust the behavior.

I’ve created an issue for this feature, which you can find here.

Cheers,
Steven