Consistency for event processing - event feeds and polling

Hi there,

I have a few questions concerning consistency for event processing, both synchronous and asynchronous. I have only skim-read the reference guide (v2), so apologies if I have missed something from there.

  1. What is the scope of a ‘unit of work’ with respect to consistency; does it include synchronous event handlers? I.e. if a synchronous event handler doesn’t get called due to a power failure, does the unit of work not get committed? What happens if some synchronous event handlers are called successfully, but some other handlers are not (e.g. due to a power failure)?

  2. Would it be possible to have an event bus implementation that polls from an ‘event feed’ for all subscribed event handlers? My reasoning is that I was already considering using a polling mechanism for external subscribers (i.e. other applications, whether in the same JVM or not) using the Atom feed technique suggested in ‘REST in Practice’ (chapter 7, pages 189-218, ‘Using Atom for Event-Driven Systems’). The benefits of this approach that are important to me are (a rough sketch of such a poller follows this list):

  • events processed exactly once
  • events processed in order
  • these two guarantees can be realised using a web-oriented approach that doesn’t require complex message-based middleware. I am not concerned about latency in my case.
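
For illustration, this is roughly the kind of consumer I have in mind for an external subscriber (all names here are made up, and a real implementation would also need to page back through the archive feeds described in the book):

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical poller for an Atom event feed. By remembering the id of the last
// entry it processed (persisted alongside the subscriber's own data in real code),
// it can process each event exactly once and in order, using plain HTTP polling.
public class AtomEventPoller {

    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    private String lastProcessedEntryId;

    public void poll(String feedUrl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document feed;
        try (InputStream in = new URL(feedUrl).openStream()) {
            feed = factory.newDocumentBuilder().parse(in);
        }
        // Atom feeds are typically newest-first: collect entries until we reach
        // the last one we have already processed...
        NodeList entries = feed.getElementsByTagNameNS(ATOM_NS, "entry");
        List<Element> unprocessed = new ArrayList<Element>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (entryId(entry).equals(lastProcessedEntryId)) {
                break;
            }
            unprocessed.add(entry);
        }
        // ...then handle them oldest-first to preserve order.
        for (int i = unprocessed.size() - 1; i >= 0; i--) {
            Element entry = unprocessed.get(i);
            handle(entry);
            lastProcessedEntryId = entryId(entry);
        }
    }

    private String entryId(Element entry) {
        return entry.getElementsByTagNameNS(ATOM_NS, "id").item(0).getTextContent();
    }

    private void handle(Element entry) {
        // project the event into the subscriber's own model
    }
}
```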

So, I am considering the viability of a polling mechanism that would work for internal subscribers in pretty much the same way as for external subscribers, to realise the benefits mentioned above for both types of subscriber. What do I mean by internal subscribers? Well, this could for example be a different aggregate in the same application as the aggregate that applied the event.

In fact, I think both of my questions are linked; in summary, I can see how a simple ‘event feed’ polling approach could give exactly-once/in-order guarantees for external subscribers without introducing complex message-based middleware, but I am struggling to see how these guarantees could be ensured for internal subscribers.

Any ideas and/or implementation suggestions for this kind of ‘event feed’ polling would be appreciated.

Thanks,

Alex.

Hi Alex,

  1. The Unit of Work is not really a transaction, in the sense that it doesn’t guarantee an atomic operation. It takes care of staging all events generated in the domain model and postpones publication until the Unit of Work is committed. It is possible to attach a real transaction to the Unit of Work, which is committed when the Unit of Work is committed, or rolled back when it is rolled back. What happens on a power failure really depends on the kind of transaction that you’re attaching to the UoW.

  2. Theoretically, this should be possible. You could build an Event Bus that ignores events that are sent on it and regularly polls the Event Store for new events. Practically, there are a few problems/challenges: events are not always fully sequential. It is technically possible that events with a higher timestamp (generated later) are stored before events with a lower timestamp (generated earlier); in the end, it is the operating system that decides which thread is given CPU time. It isn’t hard to create a workaround and use the insertion moment to guarantee sequence. That is, until your infrastructure requires more than one Event Store instance.
    That’s why, in the end, I’ve chosen to implement the UoW to take care of the “store and dispatch” process.
    Another “issue” with this approach is that you assume that only events stored in the Event Store are ever dispatched. In some cases, you want to dispatch events that are not stored in an event store. Keep in mind that you do not always need to event source all entities in your domain model.
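
If you want to experiment with it anyway, the idea would look roughly like this (a sketch only; `EventFeed` is not an Axon API but an assumed abstraction over the Event Store that orders events by insertion moment, which is exactly the part that breaks down once you have more than one Event Store instance):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of an "event feed" polling bus for internal subscribers: it ignores events
// published on it directly and instead tails the stored events.
public class PollingEventBus {

    // Assumed abstraction (not an Axon API): reads events stored strictly after a
    // given position, ordered by insertion moment.
    public interface EventFeed {
        List<Object> readEventsAfter(long position);

        long positionOf(Object event);
    }

    public interface EventListener {
        void handle(Object event);
    }

    private final EventFeed feed;
    private final List<EventListener> listeners = new CopyOnWriteArrayList<EventListener>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile long lastPosition = -1;

    public PollingEventBus(EventFeed feed) {
        this.feed = feed;
    }

    public void subscribe(EventListener listener) {
        listeners.add(listener);
    }

    public void start() {
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                pollOnce();
            }
        }, 0, 1, TimeUnit.SECONDS);
    }

    private void pollOnce() {
        for (Object event : feed.readEventsAfter(lastPosition)) {
            for (EventListener listener : listeners) {
                listener.handle(event);
            }
            // In real code, this position would be persisted per subscriber so that a
            // restart resumes where it left off (exactly once, in order).
            lastPosition = feed.positionOf(event);
        }
    }
}
```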

Hope this clarifies things a bit. If you have any ideas on how to circumvent some of these problems, I’m eager to learn about them.

Cheers,

Allard

Hi there, thanks for your reply; I will come back to my second question in a subsequent post, since there’s a lot to digest!

Regarding my first question, since I am new to the framework it would be useful to have some clarification.

For example, using the trader sample application (Axon 2.0-m1), it seems to be using the simple event bus, with listeners for the query model in the same JVM (please correct me if I’m wrong :slight_smile:).

With this event bus, how can we guarantee that the listeners for the query model will handle the event at some point (using my example of a power failure), so that they eventually become consistent?

Even if I attach a transaction to the unit of work, this is not going to make the persistence of the event and the updating of the query model atomic, since by design they are free to use completely different persistence mechanisms (I am deliberately ignoring distributed transactions here). Clearly this would not be a good idea in a lot of cases for scalability reasons, even if it were technically possible.

Or, is it the case that the only way to get this consistency guarantee ‘out of the box’ is to use the clustering event bus and asynchronous cluster, even for a listener in the same JVM? I see there are some configuration options for retries, etc. Am I right in thinking that these relate to achieving consistency?

Thanks

Alex

Hi Alex,

With the SimpleCommandBus, events are dispatched in the thread that publishes them. If there is a transaction bound to the Unit of Work, that transaction defines the atomicity of all the changes made by the handler. If the updates by the event listeners are done using the same persistence mechanism as the event store uses, all updates (including the persistence of events) are atomic. Once you start using asynchronous event handlers, you lose this guarantee. That is, unless the asynchronous process can hook into the transaction somehow. If you use Spring AMQP, for example, you can dispatch transactionally and have a transaction manager commit both the database transaction (with the events being stored) and the AMQP transaction for publishing the message at (nearly) the same time. The window for power outages screwing up your state is exceptionally small.
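
To make the first scenario concrete, here’s a rough configuration sketch (assuming a JPA-based event store and query model sharing one EntityManagerFactory defined elsewhere; the names SimpleCommandBus#setTransactionManager and SpringTransactionManager are from memory, so check the reference guide for the exact configuration in your milestone):

```java
import javax.persistence.EntityManagerFactory;
import org.axonframework.commandhandling.SimpleCommandBus;
import org.axonframework.unitofwork.SpringTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ConsistencyConfiguration {

    // One transaction manager for the database that holds both the
    // event store tables and the query model tables.
    @Bean
    public PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
        return new JpaTransactionManager(emf);
    }

    // The transaction is started with the Unit of Work and committed (or rolled
    // back) together with it. Synchronously dispatched event listeners run in the
    // same thread, so their JPA updates to the query model join this transaction.
    @Bean
    public SimpleCommandBus commandBus(PlatformTransactionManager transactionManager) {
        SimpleCommandBus commandBus = new SimpleCommandBus();
        commandBus.setTransactionManager(new SpringTransactionManager(transactionManager));
        return commandBus;
    }
}
```

With that in place, a power failure before the commit simply means that neither the events nor the query model updates were persisted, so nothing becomes inconsistent; the command can be retried.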

So if you have a simple application setup, with synchronous event dispatching, it is not hard to guarantee consistency. If you want to process asynchronously, you just need to have the dispatching of the events participate in the same transaction.

The retry mechanisms in the async cluster are used when transient exceptions occur. For example, when a database is down, it waits until the database is up before continuing. But when a non-transient failure occurs (e.g. a programming error), it skips the event and continues processing with the next. In the case of a power outage, however, all staged events are lost in this type of cluster.
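
Conceptually, the error handling behaves like this (an illustration only, not the actual cluster implementation; the exception type and retry interval are just placeholders):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.dao.TransientDataAccessException;

// Illustration of the retry semantics described above: transient failures cause the
// same event to be retried after a delay, non-transient failures cause it to be
// skipped so that processing can continue with the next event.
public class RetrySemanticsSketch {

    private static final Logger log = LoggerFactory.getLogger(RetrySemanticsSketch.class);

    private final long retryIntervalMillis = 5000; // placeholder back-off interval

    public void process(Object event) throws InterruptedException {
        while (true) {
            try {
                handle(event);
                return;
            } catch (TransientDataAccessException e) {
                // e.g. the database is down: wait, then retry the same event
                Thread.sleep(retryIntervalMillis);
            } catch (RuntimeException e) {
                // e.g. a programming error: skip this event and carry on
                log.error("Non-transient failure, skipping event {}", event, e);
                return;
            }
        }
    }

    protected void handle(Object event) {
        // dispatch to the actual event listener
    }
}
```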

Does this clarify things?

Cheers,

Allard