When I throw a bunch of load at my system, I'm seeing deadlocks involving two locks: the Quartz TRIGGER_ACCESS lock in the database, and the lock Axon uses to synchronize event handling on a Saga instance. The actual execution flow in my application is more complicated than this, but here's what I think is happening. I'm using PostgreSQL and synchronous event and command delivery on a single node. All the interaction with Quartz is via an Axon EventScheduler instance.
Thread A: Request handler starts a UnitOfWork
Thread A: Event is published
Thread A: Request handler commits its UnitOfWork
Thread A: Event is handled by Saga 1
Thread A: Saga 1 schedules an event; the Quartz TRIGGER_ACCESS lock is acquired
Thread A: Saga 1 sends a command which ends up causing another event to be published
Thread B: Some other event is handled by Saga 2 in a different UnitOfWork
Thread B: Saga 2 schedules an event; Quartz attempts to lock TRIGGER_ACCESS but blocks because it's held by thread A's transaction
Thread A: The event from the command handler needs to be handled by Saga 2; Axon tries to acquire Saga 2's lock but blocks because the lock is held by thread B
Then additional threads block on the two locks in question and the application grinds to a halt.
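To make the interleaving above concrete, here's a minimal sketch of the lock-order inversion using plain Java locks. This is not Axon or Quartz code: the two ReentrantLock objects are just stand-ins for the TRIGGER_ACCESS row lock and Saga 2's lock, and the class name, method names, and the 200 ms timeout are all made up for illustration. The timeout is only there so the demo terminates; in the real application neither side times out, so both threads wait forever.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionDemo {

    /**
     * Runs two threads that take the same two locks in opposite orders and
     * returns how many of them failed to get their second lock.
     */
    static int runInverted() throws InterruptedException {
        // Hypothetical stand-ins, not the real Quartz or Axon locks.
        ReentrantLock triggerAccess = new ReentrantLock(); // TRIGGER_ACCESS row lock
        ReentrantLock saga2Lock = new ReentrantLock();     // Saga 2's instance lock

        CountDownLatch bothHoldFirst = new CountDownLatch(2); // forces the bad interleaving
        CyclicBarrier bothAttempted = new CyclicBarrier(2);   // nobody releases early
        AtomicInteger blocked = new AtomicInteger();

        // Thread A: TRIGGER_ACCESS first, then Saga 2's lock.
        Thread a = new Thread(
            () -> attempt(triggerAccess, saga2Lock, bothHoldFirst, bothAttempted, blocked), "A");
        // Thread B: Saga 2's lock first, then TRIGGER_ACCESS.
        Thread b = new Thread(
            () -> attempt(saga2Lock, triggerAccess, bothHoldFirst, bothAttempted, blocked), "B");

        a.start();
        b.start();
        a.join();
        b.join();
        return blocked.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CyclicBarrier bothAttempted,
                                AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();

            // Try for the second lock; it is held by the other thread.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }

            // Keep holding the first lock until both threads have tried.
            bothAttempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads blocked on their second lock: " + runInverted());
    }
}
```

Both threads end up blocked on their second lock, which is the cycle I think I'm seeing. If that's right, then anything that makes every thread take the two locks in the same order, or that keeps the TRIGGER_ACCESS lock from being held while waiting on a Saga lock, should break the cycle.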
When I was using asynchronous Saga event delivery this wasn't an issue, possibly because asynchronous event handlers don't share a transaction with the thread that published the event. Once the next Axon release comes out with the fix for connection management in async Sagas, I can switch back to asynchronous mode, but it'd be nice to figure out how to get the application to work properly in either mode.
Hopefully that analysis is correct. Like I mentioned, I simplified the execution flow here but I believe I captured the essence of it. As always, it's totally possible I'm just doing something dumb.