Deadlock between Aggregate and Saga

Peter_Davis · June 14, 2013, 8:48pm

Hello all,

I have an aggregate and a saga. The saga collects info from the aggregate via events and then schedules a Quartz job, and also informs the aggregate about its status via commands. (The Quartz job triggers the saga to interact with an external system, but I don’t think that’s relevant here.)

The deadlock is between these two threads:

[user] -> Command -> Aggregate* -> Domain Event -> Saga* -> EventScheduler
[Quartz job] -> Event -> Saga* -> Command -> Aggregate*

* = IdentifierBasedLock.obtainLock() is called at these points

As you can see, we have two locks acquired in different orders.
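
For readers who want to reproduce the pattern outside Axon, here is a minimal sketch of the same lock-order inversion in plain Java. It uses `java.util.concurrent.locks.ReentrantLock` rather than IdentifierBasedLock, and the class and variable names (`LockOrderInversionDemo`, `aggregateLock`, `sagaLock`) are made up for illustration. A timed `tryLock` stands in for the blocking call so the demo finishes instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not part of Axon.
final class LockOrderInversionDemo {

    // Returns how many of the two threads managed to take their second lock.
    static int run() {
        ReentrantLock aggregateLock = new ReentrantLock();
        ReentrantLock sagaLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger acquiredSecond = new AtomicInteger();

        // "User" path: aggregate lock first, then saga lock.
        Thread user = new Thread(() ->
                attempt(aggregateLock, sagaLock, bothHoldFirst, bothTried, acquiredSecond));
        // "Quartz" path: saga lock first, then aggregate lock.
        Thread quartz = new Thread(() ->
                attempt(sagaLock, aggregateLock, bothHoldFirst, bothTried, acquiredSecond));

        user.start();
        quartz.start();
        try {
            user.join();
            quartz.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquiredSecond.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger acquiredSecond) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With a plain lock() this call would block forever.
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                acquiredSecond.incrementAndGet();
                second.unlock();
            }
            // Keep holding the first lock until both threads have tried.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that got their second lock: " + run());
    }
}
```

Neither thread gets its second lock, because each one is waiting on the lock the other already holds. Replacing `tryLock` with `lock()` turns that into a real deadlock.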

I think IdentifierBasedLock would detect the deadlock and throw a DeadlockException, except that my CachingEventSourcingRepository (hence PessimisticLockStrategy) and AbstractSagaManager have different IdentifierBasedLock instances, so neither is aware of which threads own the other’s locks.
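
To illustrate why the split matters, here is a simplified model of wait-for tracking. It is not Axon's actual implementation: `WaitForRegistry` and its methods are hypothetical, and threads and locks are represented by plain strings. Each registry knows who owns which lock and who is blocked on which lock, and it checks for a cycle before a thread blocks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-instance deadlock detection; not Axon code.
final class WaitForRegistry {
    private final Map<String, String> ownerOfLock = new HashMap<>(); // lock id -> owning thread
    private final Map<String, String> waitingFor = new HashMap<>();  // thread -> lock id it blocks on

    void acquired(String thread, String lock) { ownerOfLock.put(lock, thread); }

    void waiting(String thread, String lock) { waitingFor.put(thread, lock); }

    // True if `thread` blocking on `lock` would close a wait-for cycle
    // that is visible to this registry.
    boolean wouldDeadlock(String thread, String lock) {
        String current = ownerOfLock.get(lock);
        int hops = 0;
        while (current != null && hops++ <= ownerOfLock.size()) {
            if (current.equals(thread)) {
                return true;
            }
            String next = waitingFor.get(current);
            current = (next == null) ? null : ownerOfLock.get(next);
        }
        return false;
    }

    // Replays the scenario from this thread, either with one registry per
    // component or with a single shared registry, and reports whether the
    // second thread's attempt is flagged as a deadlock.
    static boolean detected(boolean shared) {
        WaitForRegistry repository = new WaitForRegistry();
        WaitForRegistry saga = shared ? repository : new WaitForRegistry();

        repository.acquired("user", "aggregate"); // user thread locks the aggregate
        saga.acquired("quartz", "saga");          // Quartz thread locks the saga

        // User thread now wants the saga lock: no cycle yet, so it blocks.
        if (saga.wouldDeadlock("user", "saga")) {
            return true;
        }
        saga.waiting("user", "saga");

        // Quartz thread now wants the aggregate lock.
        return repository.wouldDeadlock("quartz", "aggregate");
    }

    public static void main(String[] args) {
        System.out.println("separate registries: " + detected(false));
        System.out.println("shared registry:     " + detected(true));
    }
}
```

With separate registries, the repository side never learns that the user thread is blocked on the saga lock, so the cycle is invisible to it. With one shared registry the same check finds the cycle.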

Would it be possible to change IdentifierBasedLock.locks hashmap to be ‘static’? Or make it possible to wire the same IdentifierBasedLock into both places?

Other options I see:

- I’m reluctant to use an async saga manager because of its non-persistence in case of server shutdown.
- I think this would be fixed by using an AsynchronousCommandBus, which is a possibility, but also kind of a scary change since it means no more nested units of work, and I’d need to be careful about timeouts…I’d have to analyze the whole application to see how that might affect things.
- Disabling ‘synchronizedSagaAccess’ probably doesn’t help because I’d have to litter my saga with ‘synchronized’ blocks, leading to the same potential deadlock.

(Note, I found another lock in CachingSagaRepository.associationsCacheLock…It looks like this one is not subject to deadlock because it never calls outside from within the critical section, but it’s worth considering too.)

I can provide a thread dump if needed.

I found a couple threads in this group about deadlocks but they didn’t seem related.

https://groups.google.com/d/msg/axonframework/j9uxqz0Jsfc/x_a_0hpcWfYJ was due to a single command modifying (and locking) multiple aggregates – not the case here
https://groups.google.com/d/msg/axonframework/ZBP1yQZaPOQ/VGZn_H7Ycd0J was due to a DB locking issue

Sincerely,
Peter Davis

Allard · June 15, 2013, 1:03pm

Hi Peter,

you are right about this one. The current setup with separate IdentifierBasedLock instances isn’t very useful. I’ll redesign this a little to make it work better.

A workaround for now could be to explicitly configure a locking-strategy on your repositories. If you inject the same strategy instance in all of them, they will share a lock. Unfortunately, that’s not possible for the SagaManager (yet).

Cheers,

Allard

Allard · June 15, 2013, 2:10pm

I have created issue 151 (http://issues.axonframework.org/youtrack/issue/AXON-151) to track this one.

Cheers,

Allard

Peter_Davis · June 16, 2013, 3:56am

Unfortunately the deadlock we have seen is with SagaManager. We’ll keep an eye on it. Thanks for filing the issue, and for all your great work.

-Peter

Allard · June 16, 2013, 1:29pm

Hi,

the issue has been fixed in 2.0.3 (snapshot). Deadlocks are detected across instances now.

Cheers,

Allard