I turned off async saga event handling pending a fix for the open-connection problem I reported earlier. But now I’m seeing a different problem.
One of my sagas has a collection of pending interactions with an external system. When one of the interactions finishes, an event is published. The saga handles that event by cleaning up some internal state that tracked the interaction and sending a command to an event-sourced aggregate, which publishes a second event. That second event is consumed both by the aggregate and by the same saga that sent the command.
What I’m seeing is that the saga manager doesn’t reuse the existing saga object to handle the second event; instead it loads a fresh copy of the saga from the repository. With synchronous handling, the changes the original saga instance made to its internal state haven’t been committed to the repository at that point. As a result, the second event is delivered to a saga instance whose state still says the interaction is pending, and handling it goes wrong.
It looks like this doesn’t happen in cases where the first event caused the saga to be created, because there’s special handling for newly-created sagas in AbstractSagaManager.loadAndInvoke() and the existing object is reused. But in this case the saga already exists, so it’s loaded from scratch for each event.
In concrete terms, when I put a breakpoint in the handler for the second event and walk up the call stack, I see that the saga instance in the frame handling the first event is a different Java object from the one in the frame handling the second event; they have the same identifier but different state.
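To make the sequence easier to follow, here is a minimal sketch of what I believe is happening. It deliberately does not use Axon at all: InteractionSaga, SagaRepository, the "saga-1" and "interaction-1" identifiers, and the reuseInstance flag are all hypothetical stand-ins, and the real saga manager may well differ in its details. The only point is that a fresh load made before the first handler commits sees the old state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleSagaSketch {

    // Hypothetical saga: tracks the ids of interactions that are still pending.
    static class InteractionSaga {
        final String id;
        final Set<String> pending;

        InteractionSaga(String id, Set<String> pending) {
            this.id = id;
            this.pending = new HashSet<>(pending);
        }
    }

    // Hypothetical repository: load() always returns a new object built from
    // the last committed state, never the instance currently being worked on.
    static class SagaRepository {
        private final Map<String, Set<String>> committed = new HashMap<>();

        void commit(InteractionSaga saga) {
            committed.put(saga.id, new HashSet<>(saga.pending));
        }

        InteractionSaga load(String id) {
            return new InteractionSaga(id, committed.get(id));
        }
    }

    // Returns true if the handler for the second event still sees the
    // interaction as pending.
    static boolean secondHandlerSeesPending(boolean reuseInstance) {
        SagaRepository repository = new SagaRepository();
        repository.commit(new InteractionSaga("saga-1", Set.of("interaction-1")));

        // First event: the saga is loaded and cleans up its internal state.
        InteractionSaga first = repository.load("saga-1");
        first.pending.remove("interaction-1");

        // The command sent from the first handler causes the second event to
        // be published synchronously, before the first handler's changes
        // have been committed.
        InteractionSaga second = reuseInstance ? first : repository.load("saga-1");
        boolean stillPending = second.pending.contains("interaction-1");

        // Only now does the first handler's change reach the repository.
        repository.commit(first);
        return stillPending;
    }

    public static void main(String[] args) {
        System.out.println("fresh load sees pending: " + secondHandlerSeesPending(false));
        System.out.println("reused instance sees pending: " + secondHandlerSeesPending(true));
    }
}
```

With a fresh load, the second handler sees the interaction as still pending; when the in-flight instance is reused, which is what seems to happen for newly created sagas, it does not.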
As usual, if this is caused by me using Axon wrong rather than by a bug, I’m happy to adjust my application code.
-Steve