In our system, if an EventListener fails and the failure eventually results in a RetryPolicy.skip(), we mark the affected AR instance as “broken” and stop it from accepting new commands until someone resolves the failure. Our EventListeners also hold back changes to their read models until they are told to “synchronize”, so after a failure there will be unpersisted changes.
To resynchronize the EventListeners, all events not yet persisted to the read models are loaded from the EventStore and played through the EventBus, and then the EventListeners are told to persist their changes. While this resynchronization is in progress, I want incoming events for that AR instance to be queued up, and let through once it has finished.
This is very similar to the ReplayingCluster / ReplayAware functionality, except on a per-AR instance basis.
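To make the queueing part concrete, here is a rough sketch of what I have in mind. It deliberately uses no Axon APIs: PerAggregateGate, beginResync, publish and endResync are names made up for illustration, and the "downstream" consumer stands in for whatever actually hands events to the EventListeners. The replayed events themselves would go straight to the listeners and bypass the gate.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical per-aggregate gate; not part of Axon. */
class PerAggregateGate<E> {

    // Stand-in for the real delivery path to the EventListeners (assumption).
    private final Consumer<E> downstream;

    // Aggregates currently resynchronizing, each with its backlog of held events.
    private final Map<String, Queue<E>> held = new HashMap<>();

    PerAggregateGate(Consumer<E> downstream) {
        this.downstream = downstream;
    }

    /** Start holding back live events for this aggregate. */
    synchronized void beginResync(String aggregateId) {
        held.computeIfAbsent(aggregateId, id -> new ArrayDeque<>());
    }

    /** Live events: queued if the aggregate is resynchronizing, delivered otherwise. */
    synchronized void publish(String aggregateId, E event) {
        Queue<E> backlog = held.get(aggregateId);
        if (backlog != null) {
            backlog.add(event);
        } else {
            downstream.accept(event);
        }
    }

    /** Resync finished: deliver the backlog in arrival order, then reopen the gate. */
    synchronized void endResync(String aggregateId) {
        Queue<E> backlog = held.remove(aggregateId);
        if (backlog != null) {
            backlog.forEach(downstream);
        }
    }
}
```

Delivery happens while holding the lock, which keeps ordering simple but serializes all aggregates through one monitor; a real version would probably want per-aggregate locking.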
Has anyone tried anything similar? If so, did you try to extend the ReplayingCluster, or write your own handling?
James