Thoughts on replaying a single AR Instance?

JAmes_Atwill · May 24, 2013, 8:23pm

For our system, if an EventListener fails and eventually results in a RetryPolicy.skip(), we mark the AR instance in question as “broken” and stop it from accepting new commands until someone resolves the failure. Our EventListeners also know not to persist their changes to their read models until told to “synchronize”, so on failure there will be non persisted changes.

In order to resynchronize the EventListeners, all unpersisted read model events will be loaded from the EventStore, played through the EventBus, then the EventListeners will be told to persist their changes. While this resynchronize process is going on, I want incoming events to be queued up, and once resynchronized, allow them through.

This is very similar to the ReplayingCluster / ReplayAware functionality, except on a per-AR instance basis.

Has anyone tried anything similar? If so, did you try to extend the ReplayingCluster, or write your own handling?

JAmes

Allard · May 25, 2013, 10:53am

Hi James,

just out of curiosity, what would a reason for an event handler to fail?

Another solution direction could be to use a discriminating persistent queue. Each entry would have a “sequencing value” (the one used by the async cluster). The queue would also need to know about “committing” an instance. When an instance is “taken” from the queue, you first start evaluating the first entry. If that is already taken (but not committed) by another thread, you remember that entry’s sequence identifier, and move to the next. As long as the next entry is either “taken” or has a sequence number equals to a previous item, you skip it.
You can also add a “blacklist” of items that must be skipped, regardless of the items being “checked out” or not. That would allow you to implement your “hold” scenarios.
Currently, I am looking into MapDB for another feature in Axon. It makes it very easy to create a memory mapped/persistent collections. The performance is also very good.

Obviously, you could also implement this logic by fetching items from the event store. I just wanted to shed some light on another solution so you can see which would fit your problem/challenge best.

Cheers,

Allard

JAmes_Atwill · May 30, 2013, 8:01pm

Hey Allard,

A handler could fail because a resource it needs is unavailable, or because of a coding mistake or any number of other issues.

We decided against persistent queues for the EventBus in general since it would mean more storage mechanisms in production (and more challenges sharding them to scale out) and to do the holding I’ve implemented an EventBusTerminal that knows how to syphon events based on rules.

I’m not sure I fully understand the description you’ve got; are you describing a topic instead of a queue?

JAmes

Allard · June 3, 2013, 5:30pm

Hi James,

Just wanted to know what kind of failures you meant. Sometimes, people see coding errors as handler failures.

with Queue, I meant the java interface. It’s possible to create one that doesn’t simply returns the first available item, but uses a number of additional requirements before an item is taken from it.

But probably the easiest solution for you would be to read a stream from the event store. The event store is optimized to read based on the aggregate identifier anyway. If you remember the last sequence number of each stream, and keep track of ‘blacklisted’ aggregates, it shouldn’t be too hard to implement this.

Cheers,

Allard