Deadlock with synchronized Sagas + Quartz

Steven_Grimm · February 25, 2016, 6:02am

When I throw a bunch of load at my system, I'm seeing deadlocks involving the Quartz TRIGGER_ACCESS lock in the database and the lock on a Saga instance that implements synchronized event handling. The actual execution flow in my application is more complicated than this, but here's what I think is happening. I'm using PostgreSQL and synchronous event and command delivery on a single node. All the interaction with Quartz is via an Axon EventScheduler instance.

Thread A: Request handler starts a UnitOfWork
Thread A: Event is published
Thread A: Request handler commits its UnitOfWork
Thread A: Event is handled by Saga 1
Thread A: Saga 1 schedules an event; the Quartz TRIGGER_ACCESS lock is acquired
Thread A: Saga 1 sends a command which ends up causing another event to be published

Thread B: Some other event is handled by Saga 2 in a different UnitOfWork
Thread B: Saga 2 schedules an event; Quartz attempts to lock TRIGGER_ACCESS but blocks because it's held by thread A's transaction

Thread A: The event from the command handler needs to be handled by Saga 2; Axon tries to acquire Saga 2's lock but blocks because the lock is held by thread B

Then additional threads block on the two locks in question and the application grinds to a halt.
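
For readers following along, the interleaving above can be reproduced in miniature. This is only a sketch under stated assumptions: two plain `ReentrantLock` objects stand in for the Quartz TRIGGER_ACCESS database lock and for Saga 2's lock, and the class and method names (`LockCycleSketch`, `holdThenTry`, `blockedThreads`) are hypothetical, not Axon or Quartz API. Timed `tryLock` calls are used so the demo ends on its own instead of hanging the way the real application does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class LockCycleSketch {

    // Returns how many of the two threads failed to get their second lock.
    static int blockedThreads() {
        // Hypothetical stand-ins for the two real locks in the thread above.
        ReentrantLock triggerAccess = new ReentrantLock(); // Quartz TRIGGER_ACCESS
        ReentrantLock saga2Lock = new ReentrantLock();     // Saga 2 instance lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Thread A: holds TRIGGER_ACCESS, then wants Saga 2's lock.
        Thread a = new Thread(() -> holdThenTry(
                triggerAccess, saga2Lock, bothHoldFirst, bothAttempted, blocked), "A");
        // Thread B: holds Saga 2's lock, then wants TRIGGER_ACCESS.
        Thread b = new Thread(() -> holdThenTry(
                saga2Lock, triggerAccess, bothHoldFirst, bothAttempted, blocked), "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void holdThenTry(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst,
                                    CountDownLatch bothAttempted,
                                    AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A timed attempt replaces the indefinite wait of the real system.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until the other thread has also tried.
            bothAttempted.countDown();
            bothAttempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads stuck on their second lock: " + blockedThreads());
    }
}
```

Each thread takes its locks in the opposite order from the other, which is the cycle described in the steps above. In the real application neither wait has a timeout, so nothing breaks the cycle.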

When I was using asynchronous Saga event delivery this wasn't an issue, possibly because event handlers don't share transactions. Once the next Axon release comes out with the fix for connection management in async Sagas, I can switch back to asynchronous mode, but it'd be nice to figure out how to get the application to work properly in either mode.

Hopefully that analysis is correct. Like I mentioned, I simplified the execution flow here but I believe I captured the essence of it. As always, it's totally possible I'm just doing something dumb.

Thanks!

-Steve

Allard · February 25, 2016, 11:06am

Hi Steven,

Unfortunately, this is a risk with the SimpleCommandBus and SimpleEventBus, and it is hard to solve. Normally, Axon detects deadlocks between threads that are blocked on Axon's own locks. However, since one of the locks in this cycle belongs to Quartz, Axon doesn’t detect it.

Instead of Asynchronous Saga delivery, you can also consider using the AsynchronousCommandBus. That should work around the issue as well.
In the meantime, we’re working on the issue and a release.
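
A rough sketch of why handing the command to another thread can break the cycle. Everything here is an assumption made for illustration: a plain `ExecutorService` stands in for the AsynchronousCommandBus, two `ReentrantLock` objects stand in for TRIGGER_ACCESS and Saga 2's lock, and the name `AsyncDispatchSketch` is hypothetical. It does not use Axon's API. It also assumes that thread A releases TRIGGER_ACCESS as soon as its own work finishes, instead of waiting for the command's follow-up event to be handled.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class AsyncDispatchSketch {

    // Returns true if thread B and the dispatched command both get their locks.
    static boolean runsToCompletion() {
        ReentrantLock triggerAccess = new ReentrantLock(); // stand-in: Quartz TRIGGER_ACCESS
        ReentrantLock saga2Lock = new ReentrantLock();     // stand-in: Saga 2 instance lock
        CountDownLatch bHoldsSaga2 = new CountDownLatch(1);
        ExecutorService commandExecutor = Executors.newSingleThreadExecutor();
        ExecutorService requestThreads = Executors.newFixedThreadPool(2);

        // The command's follow-up event needs Saga 2's lock, but now on its own thread.
        Callable<Boolean> command = () -> {
            if (!saga2Lock.tryLock(2, TimeUnit.SECONDS)) {
                return false;
            }
            saga2Lock.unlock();
            return true;
        };

        // Thread B: holds Saga 2's lock, then needs TRIGGER_ACCESS.
        Callable<Boolean> threadB = () -> {
            saga2Lock.lock();
            try {
                bHoldsSaga2.countDown();
                if (!triggerAccess.tryLock(2, TimeUnit.SECONDS)) {
                    return false;
                }
                triggerAccess.unlock();
                return true;
            } finally {
                saga2Lock.unlock();
            }
        };

        // Thread A: holds TRIGGER_ACCESS, dispatches the command, and returns
        // without waiting for it, so TRIGGER_ACCESS is released promptly.
        Callable<Future<Boolean>> threadA = () -> {
            triggerAccess.lock();
            try {
                bHoldsSaga2.await();
                return commandExecutor.submit(command);
            } finally {
                triggerAccess.unlock();
            }
        };

        try {
            Future<Boolean> b = requestThreads.submit(threadB);
            Future<Future<Boolean>> a = requestThreads.submit(threadA);
            return b.get() && a.get().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            requestThreads.shutdownNow();
            commandExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without a lock cycle: " + runsToCompletion());
    }
}
```

The point of the sketch is that no single thread holds one lock while waiting for the other in the opposite order: thread A lets go of TRIGGER_ACCESS, thread B can then finish and release Saga 2's lock, and only after that does the command thread take it.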

Cheers,

Allard

Steven_Grimm · February 25, 2016, 11:06pm

AsynchronousCommandBus seems to have done the trick. I’d been looking at DisruptorCommandBus previously but it didn’t play nicely with the non-aggregate-based command handlers in my app. Thanks!

Can I suggest adding a paragraph or two about AsynchronousCommandBus to the documentation? I admit I didn’t know it existed before this conversation.

-Steve