A clustered environment doesn’t have to be a bad thing per se. You don’t even need a distributed command bus in all cases. Depending on the type of application you’re working on and the level of concurrency you expect on your aggregates, it sometimes suffices to use a SimpleCommandBus and SimpleEventBus. Obviously, you run the risk of concurrent modifications when the same aggregate is accessed on two different nodes at the same time.
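To make that risk concrete, here is a minimal sketch of optimistic locking. It assumes each aggregate carries a version number that is checked on save. VersionedStore and its methods are hypothetical names, not Axon API. The point is only that the second node to save detects the conflict instead of silently overwriting the first node's changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating optimistic locking: a save only
// succeeds if the version the caller loaded is still the current version.
final class VersionedStore {
    private final Map<String, Long> versions = new HashMap<>();

    // Returns the version a node sees when it loads the aggregate.
    synchronized long load(String aggregateId) {
        return versions.getOrDefault(aggregateId, 0L);
    }

    // Returns true if the write was accepted, false on a concurrent modification.
    synchronized boolean save(String aggregateId, long expectedVersion) {
        long current = versions.getOrDefault(aggregateId, 0L);
        if (current != expectedVersion) {
            return false; // another node saved this aggregate in the meantime
        }
        versions.put(aggregateId, current + 1);
        return true;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        // Two nodes load the same aggregate at (roughly) the same time...
        long seenByNodeA = store.load("order-1");
        long seenByNodeB = store.load("order-1");
        // ...the first save wins, the second detects the conflict.
        System.out.println("node A saved: " + store.save("order-1", seenByNodeA)); // true
        System.out.println("node B saved: " + store.save("order-1", seenByNodeB)); // false
    }
}
```

On a conflict, the losing node can reload the aggregate and retry the command. That works as long as conflicts are rare.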
Using a DistributedCommandBus may help when the chance of concurrent access to aggregates is too high to rely on an occasional retry. It routes commands for the same aggregate consistently to the same machine. The sender of a command always receives a notification via the callback. So if a machine fails while handling a command, the sender (assuming it isn’t on the failing machine) is notified that the node handling the command has failed. Depending on whether the command is idempotent, you can retry automatically or notify the user. By handling commands transactionally (which is usually the case), you ensure that a command has either been processed fully or not at all.
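Consistent routing can be pictured with a plain consistent-hash ring, assuming the aggregate identifier is used as the routing key. This is a generic sketch, not the DistributedCommandBus implementation. ConsistentHashRing and its virtual-node count are hypothetical. As long as the set of nodes stays the same, a given key always lands on the same node, and removing a node only moves the keys that node owned.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring: the same routing key always maps to the
// same node as long as the set of nodes does not change.
final class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100; // spreads each node over the ring
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // Picks the first ring position at or after the key's hash, wrapping around.
    String nodeFor(String routingKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(routingKey));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    // Uses the first 8 bytes of an MD5 digest as a 64-bit ring position.
    private static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every JDK is required to provide MD5
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(List.of("node-a", "node-b", "node-c"));
        // Commands for the same aggregate identifier go to the same node every time.
        System.out.println(ring.nodeFor("order-1").equals(ring.nodeFor("order-1"))); // true
    }
}
```

Virtual nodes are only there to spread the load more evenly. Without them, three nodes would split the ring into three arbitrary, often very unequal, arcs.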
With regard to event handling, you should ensure that at most one instance of the same cluster (cluster as in a group of event listeners) is listening to a specific queue. Or at least, make sure that each event is delivered to at most one consumer (which is possible using JMS topics). In Axon’s AMQP connector, I allow for exclusive connections, which means that at most one cluster can be connected to a queue at any time. If that connection drops, another node automatically takes over. I am not sure whether such a thing is possible in JMS.
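The exclusive-connection behaviour can be modelled without a broker. The sketch below is hypothetical and in-memory. ExclusiveQueue is not an Axon or AMQP client class, and it does not show how the connector actually implements this. The first node to connect becomes the active consumer, later nodes wait on standby, and when the active node disconnects the next standby node takes over.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an exclusive queue consumer: only one node of a cluster
// receives events at a time; standby nodes take over when it disconnects.
final class ExclusiveQueue {
    private final Deque<String> standby = new ArrayDeque<>();
    private String activeConsumer; // null when no node is connected

    // Returns true if the node became the active consumer, false if it is on standby.
    synchronized boolean connect(String node) {
        if (activeConsumer == null) {
            activeConsumer = node;
            return true;
        }
        standby.addLast(node);
        return false;
    }

    // Removes the node; if it was active, the longest-waiting standby node takes over.
    synchronized void disconnect(String node) {
        if (node.equals(activeConsumer)) {
            activeConsumer = standby.pollFirst(); // null if no standby node is left
        } else {
            standby.remove(node);
        }
    }

    synchronized String activeConsumer() {
        return activeConsumer;
    }

    public static void main(String[] args) {
        ExclusiveQueue queue = new ExclusiveQueue();
        queue.connect("node-a"); // becomes active
        queue.connect("node-b"); // waits on standby
        queue.disconnect("node-a"); // connection drops
        System.out.println(queue.activeConsumer()); // node-b
    }
}
```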
Why would you want to implement a UnitOfWork based on InheritableThreadLocal? Spreading a UnitOfWork over multiple threads sounds like a dangerous thing to do; it isn’t built for concurrent access. If you need to confirm (acknowledge) messages on the JMS queue, you can use the EventProcessingMonitor, which is new in Axon 2.1.
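The danger is easy to demonstrate. By default, a child thread inherits the parent's InheritableThreadLocal value by reference, not as a copy, because the default childValue returns the parent's value unchanged. Both threads therefore end up working on the same, non-thread-safe object. In this minimal sketch, a plain ArrayList is a hypothetical stand-in for a unit of work:

```java
import java.util.ArrayList;
import java.util.List;

// Shows that InheritableThreadLocal hands the child thread the *same* object,
// so parent and child share one unsynchronized "unit of work".
final class InheritedUnitOfWorkDemo {
    // Hypothetical stand-in for a unit of work: an unsynchronized list of registered actions.
    static final InheritableThreadLocal<List<String>> CURRENT = new InheritableThreadLocal<>();

    // Runs a child thread that registers an action on the inherited unit of work,
    // then returns the parent's list.
    static List<String> runChildAgainstParentUnit() {
        List<String> parentUnit = new ArrayList<>();
        CURRENT.set(parentUnit);
        try {
            // The inherited value is captured when the Thread object is constructed.
            Thread child = new Thread(() -> CURRENT.get().add("registered by child"));
            child.start();
            child.join(); // join() makes the child's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CURRENT.remove();
        }
        return parentUnit;
    }

    public static void main(String[] args) {
        // Prints [registered by child]: the child mutated the parent's object.
        System.out.println(runChildAgainstParentUnit());
    }
}
```

Thread pools make this worse. Values are inherited only when a thread is created, so pooled worker threads keep whatever value was current when the pool created them.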
Hope this helps.