Looking at the source code, it appears that under certain failure scenarios events would be lost even though the aggregate has been successfully persisted. The case I have in mind:
1. A command is issued which modifies an AggregateRoot instance.
2. The aggregate root captures the state changes and dispatches events.
3. The underlying Repository registers its savecallback with the UnitOfWork.
4. The dispatched events are kept “in memory” by the UnitOfWork via the EventContainer callback.
5. The UnitOfWork is finally committed, which does the following things in order:
   a. It calls the repository via the savecallback to persist the aggregate to the data store (which can be transactional).
   b. It dispatches on the event bus all the messages which the repository provided.
Now, if the process were to crash right after 5a but before 5b, all those messages would be lost.
I don’t see any logic to re-dispatch those messages when the process comes back up.
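To make the window concrete, here is a minimal sketch of the commit sequence as I understand it. It is not Axon's code: SimpleUnitOfWork, Outcome, SimulatedCrash and the "order-1" / "OrderCreated" values are names I made up for illustration, and the "crash" is just an exception thrown between steps 5a and 5b.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitWindowSketch {

    /** Hypothetical record of what reached the data store and the event bus. */
    public static class Outcome {
        public final List<String> savedAggregates = new ArrayList<>();
        public final List<String> publishedEvents = new ArrayList<>();
    }

    /** Stands in for the process dying between the two commit steps. */
    static class SimulatedCrash extends RuntimeException {}

    /** Simplified model of a unit of work; NOT Axon's UnitOfWork API. */
    static class SimpleUnitOfWork {
        private final List<String> pendingEvents = new ArrayList<>(); // held in memory only
        private final List<String> eventBus;
        private Runnable saveCallback = () -> {};

        SimpleUnitOfWork(List<String> eventBus) {
            this.eventBus = eventBus;
        }

        void registerSaveCallback(Runnable callback) {
            this.saveCallback = callback;
        }

        void collectEvent(String event) {
            pendingEvents.add(event);
        }

        void commit(boolean crashAfterSave) {
            saveCallback.run();                 // step 5a: persist the aggregate
            if (crashAfterSave) {
                throw new SimulatedCrash();     // process dies here
            }
            eventBus.addAll(pendingEvents);     // step 5b: publish the events
            pendingEvents.clear();
        }
    }

    /** Runs one command through the sketch, optionally crashing after the save. */
    public static Outcome run(boolean crashAfterSave) {
        Outcome outcome = new Outcome();
        SimpleUnitOfWork uow = new SimpleUnitOfWork(outcome.publishedEvents);
        uow.registerSaveCallback(() -> outcome.savedAggregates.add("order-1"));
        uow.collectEvent("OrderCreated");
        try {
            uow.commit(crashAfterSave);
        } catch (SimulatedCrash crash) {
            // The pending events lived only in memory, so nothing is left to re-dispatch.
        }
        return outcome;
    }

    public static void main(String[] args) {
        Outcome ok = run(false);
        System.out.println(ok.savedAggregates + " " + ok.publishedEvents);           // [order-1] [OrderCreated]
        Outcome crashed = run(true);
        System.out.println(crashed.savedAggregates + " " + crashed.publishedEvents); // [order-1] []
    }
}
```

In the second run the aggregate is saved but the event never reaches the bus, which is exactly the case I am worried about.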
Note that our plan is to use a traditional JPA-based RDBMS that stores only the current state, and NOT the event sourcing persistence mechanism. But maybe there is a way to replay messages by using HybridJPARepository (the event store would have the messages)?
Axon is a really cool framework, and we are looking for suggestions about the right configuration/approach to achieve fault tolerance in this case, as we can’t afford to lose business-critical messages.
Regards,
Aditya