Saga Support (issue #77)

JAmes_Atwill · August 16, 2010, 9:59pm

Hey,

I have the need for issue #77 and I'd like to take a stab at some
initial implementation. I like to reference Udi's idea of the
plumbing behind sagas:

http://www.udidahan.com/2009/04/20/saga-persistence-and-event-driven-architectures/

Axon would ideally provide:

1) a mechanism for persisting saga state
2) a mechanism for finding existing sagas based on events
3) a mechanism for ensuring that only one instance of a saga is
processing a triggering event
4) a mechanism to define timeout events to fire for a given saga

Udi takes advantage of C# generics which we can't do in Java (since
they're erased), so eventually we'll want to ensure all events are
handled properly at runtime.

1) State

Udi defines a SagaEntity interface which has a unique sagaId. Axon
would be responsible for setting the correct instance of a SagaEntity
on a saga.

2) Finding

I'm wondering if a Lookup class would have methods annotated with
@SagaFinder which would return an implementer of a SagaEntity;
something like "SagaEntity method(EventBase event)". An example would
be:

   @SagaFinder
   MoneyTransferSagaData findBy(FundsDeductedEvent event) {
     // but what goes here?
     return moneyTransferState;
   }

What I'm unsure of is how sagas should be persisted. Should it be one
table per type of saga?

3) Locking

If there is one table per type of saga, then each row returned would
represent an instance of a saga; in which case we can use database
locking. Otherwise more thought is required.

4) Timeout

I feel like this feature can come in after saga persistence and
locking is managed; but it's definitely needed.

Allard · August 17, 2010, 6:51am

Hi James,

of course, you are most welcome to create an initial implementation of Saga Support. I would even let you finish it, if you want

after your comments in issue 77, I finally god used to a Saga being an instance that follows progress around a specific process. So I would just use Saga instead of SagaEntity.

I think finding a saga should be Axon’s work. But since I don’t believe in magically appearing objects, we need to find them based on a correlation ID. This correlation ID should come from the events. On a PaymentReceived event, there may be an OrderId, which acts as the correlation ID for the OrderSaga to trigger. I like the annotation approach, since it matches the event handling. However, the actual method of extracting a correlation ID from an event should be abstracted away.

In many applications, all events linked to an order ID will have a getOrderID() method. That means the SagaFinder could be a simple configurable component that uses a reflection to figure out whether an event should trigger a saga. A new sage could then be created if none with that correlationID exists.

So, instead of returning a Saga instance, these methods will only need to return the correlationID (or multiple?) for each saga to trigger.

You could even think of 1 table for all saga’s. In the end, that’s also what I use in the event store. Locking could be done by creating a lock field. Most use cases I can think of for saga’s (and I have been mapping the concept to my project at JTeam) involve some third party integration. In such a case, optimistic locking is dangerous. So it is probably best to lock a saga using a certain field in the database. Check out the mechanism Quartz uses to prevent scheduled persisted jobs to be executed on more than one server in a cluster.

A very nice mechanism indeed. The only thing most likely needed is a mechanism that triggers a “TimeoutEvent” at a specified time. The correlation in that event should use the mechanisms above to trigger the correct saga. Maybe the implementation of the previous building blocks will provide some ideas for this one.

I think you forgot one, which I managed in the issue: the SagaManager. It acts as the mechanism that will trigger the whole saga load-invoke-persist process based in incoming events. It will use the SagaLookup/SagaFinder to get the correlation ID, the saga repository to load the appropriate saga(s), pass the events to the saga and persist the new saga state.

If you need any help, don’t hesitate to contact me.

Cheers,

Allard

JAmes_Atwill · August 18, 2010, 5:53am

Hey Allard,

I basically agree with everything you've posted; only the part about
correlation Ids still needs some flushing out.

If you're comfortable creating a base class for Commands where a list
(or map?) of Ids can live, then Sagas can attach their unique Id to
the command. Then perhaps the current correlation Ids are stored in a
ThreadLocal during execution. When an AR calls apply() with an Event,
something can lookup the Ids stored in the ThreadLocal and apply them
to the Event.

I *want* to think that calling .dispatch() when correlation Ids
already exist in ThreadLocal that they should be automatically applied
to commands, but I *feel* like that will lead to unintended side
effects.

Sagas then only need to register which events and correlation Ids
they're interested in to be triggered.

This works; but there's a race condition when two events which can
cause the creation of a saga infact create two sagas; I'm not yet sure
how other implementations handle this.

JAmes

Allard · August 19, 2010, 6:05pm

Hi James,

I’m afraid the abstract command with the correlation ID will not really help. It is not safe to assume that all events that are of interest to the saga come from a command executed by that same saga. In fact, I believe that in most cases, they will come from other contexts or even other applications/systems.

In the asynchronous event handler wrapper, I already have a mechanism to prevent race conditions. This is done based on a policy. Policy returns a specific value based on an incoming event. When two events generate an equal value, they are handled sequentially.

This mechanism would be very useful in the saga as well. If there is a mechanism that creates a correlation ID from an event, that correlation ID can be used to decide which saga to find, but also to make sure that events going to the same saga are treated one after the other. The problem is, what this lookup figures out that an event has a certain correlation ID may or may not trigger a new saga to be created?

So I tried to put our ideas together. Here is the flow:

A SagaManager (there only has to be 1 for several types of sagas) received all events
Each event is passed to a saga lookup instance. This saga lookup may return one or more saga’s that the event should go to. These may be either existing saga’s loaded from a repository, or newly created ones.
For each saga returned, the SagaManager will apply the event to it
Each saga is then persisted back (using the same mechanism that stores aggregates, using the UnitOfWork)

Now, for the SagaLookup, there could be different (abstract) implementations for certain scenarios. My “extract a correlation ID and the rest is automatic” solution could be one of those abstract implementations.

How do you think about this?

Cheers,

Allard

JAmes_Atwill · August 23, 2010, 9:40pm

I feel either way there's a need for correlation ids as metadata on
Commands and Events, any correlation id's active during a command
should make their way through to the Events.

- A SagaManager (there only has to be 1 for several types of sagas) received
all events

Am I correct in understanding that a SagaManager would have which
sagas its managing
configured into it (and from that, determine which Events it is
interested in)?

- Each event is passed to a saga lookup instance. This saga lookup may
return one or more saga's that the event should go to. These may be either
existing saga's loaded from a repository, or newly created ones.

If each different type of Saga has its own SagaLookup, a lookup will
either return one Saga, or null. If a Saga indicates that it's started by
that type of event and the lookup returns null, then it's safe to
create a new one; this means
quite a number of queries on a busy system though, so if a SagaManager can claim
ownership of a subset of Sagas, it should in theory be able to cache
saga state effectively.

- Each saga is then persisted back (using the same mechanism that stores
aggregates, using the UnitOfWork)

That's an interesting idea; can SagaLookup work against an XStream repository?

Ping me on Google Talk when you have some time and we can work through
some of these things.

Take care!

JAmes

Allard · August 28, 2010, 8:43am

Hi James,

my apologies for this extremely late response. It has been hectic times. We had a go-live this week for quite a big CQRS-based (of course) project. And yesterday I gave a CQRS and Axon workshop.

I do not disagree with the idea of having a correlation ID on commands. I would, however, see it as a different feature, separate from saga’s. Probably the best way to add this feature to a command is by using an interface “CorrelatedCommand”, which exposes a single method: “UUID getCorrelationId()”.

My understanding of the saga manager (or maybe saga controller is a better name) is quite configuration-less. All it knows of, is a SagaLookup (or maybe multiple, one per type of saga). For each event receives, it will ask each of the saga lookups for relevant saga instances. The manager will then pass the event to each of the returned sagas.

I think creating a GenericSagaLookup is quite easy. The saga lookup could be configured with a list of events and the names of the properties that contain the ID of the saga to return. For example, you could configure “OrderCreatedEvent” and “orderId” to indicate that when an order is created, the saga with the OrderId should be brought to life (or created). If more complex logic is required, one can always build a different implementation of the saga lookup.

A saga-repository will take care of the actual storage of saga’s. It’s probably very similar to the repository for aggregates. Storing saga’s using XStream as you mention is probably the best way. It will prety cleanly store the aggregate’s state.

Your solution of saga lookups only working for a subset of saga’s can be a pretty good solution when scaling out. All you need is a decision strategy to decide which instance will create a saga. From then on, machines are responsible for maintaining that saga.

And we shouldn’t forget one very important feature of sagas: their ability to react on time-based triggers. Maybe the sagamanager is a good entry point for saga to set an alarm clock? The saga manager could pass itself to the @EventHandler method on the saga as the optional second parameter (the mechanism for that is already present).

The best way to get a good API on this one, I think, is to try, throw away, try again, etc.

Cheers,

Allard