All commands for a specific aggregate result in a LockAcquisitionFailedException
I’ve tried restarting the applications, but the issue persists.
Configuration:
Spring + JPA
MySQL
HikariCP
Event-Sourced Aggregates
Asynchronous Command Bus
IntervalRetryScheduler
TrackingEventProcessors
All configuration is “out-of-the-box”, with the exception of some simple logging interceptors and correlation data providers.
This is a multi-node deployment that does not distribute commands. Throughput is fairly low, so we’ve been using a RetryScheduler to handle ConcurrencyExceptions, which have been very infrequent.
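For context, the retry behaviour we rely on is roughly the following. This is a simplified plain-Java model, not Axon's IntervalRetryScheduler implementation; `ConflictException`, `retry`, and its parameters are hypothetical names used only for illustration.

```java
import java.util.function.Supplier;

public class IntervalRetrySketch {

    /** Hypothetical stand-in for a concurrency conflict on an aggregate. */
    public static class ConflictException extends RuntimeException {
        public ConflictException(String message) {
            super(message);
        }
    }

    /**
     * Runs the command, retrying at a fixed interval when it fails with a
     * ConflictException. Gives up and rethrows after maxAttempts attempts.
     */
    public static <T> T retry(Supplier<T> command, int maxAttempts, long intervalMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.get();
            } catch (ConflictException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException interrupted) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw last;
    }
}
```

With infrequent conflicts, a handful of attempts at a short interval has been enough for the command to succeed on a later try.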
This solution worked for several months without incident. Starting two days ago, all commands for one particular aggregate began throwing:
LockAcquisitionFailedException(Failed to acquire lock for aggregate identifier(SOME IDENTIFIER), maximum attempts exceeded (2147483647))
Commands for this aggregate can be emitted by two Sagas, both backed by tracking event processors. Those Sagas log a few relevant warnings and errors:
SQL Error: 1205, SQLState: 40001
Lock wait timeout exceeded; try restarting transaction
Looking at the token_entry table, I can see that this issue is present even when those tracking event processors are on the same node.
The DeadlineManager is used extensively. We schedule certain deadlines based on timestamps provided to us in a REST controller. Many of those timestamps are exactly identical, which could trigger many events simultaneously. Each of those events is handled by a saga, which issues a command to this type of aggregate. Since we are using the AsynchronousCommandBus, those commands are “fire and forget”, so they could operate concurrently on the same aggregate. But even when I call .join() to make the saga wait on the command execution, the issue still persists.
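To make the scenario concrete, here is a stripped-down model of it using only java.util.concurrent. It is not our actual Axon code and does not reflect Axon's locking internals: the executor stands in for the asynchronous command bus, the `ReentrantLock` stands in for the per-aggregate lock, and the class name, `maxInFlight`, and the `"aggregate-1"` identifier are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class SameAggregateContention {

    // One lock per aggregate identifier (hypothetical stand-in for the repository's lock).
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    /**
     * Simulates commandCount deadlines firing at once, each sending a command
     * to the same aggregate. Returns the highest number of commands that were
     * in flight (handling or waiting for the lock) at the same time.
     */
    public static int maxInFlight(int commandCount, boolean joinEach) {
        ExecutorService commandBus = Executors.newFixedThreadPool(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < commandCount; i++) {
                CompletableFuture<Void> result = CompletableFuture.runAsync(() -> {
                    int now = inFlight.incrementAndGet();
                    max.accumulateAndGet(now, Math::max);
                    ReentrantLock lock = LOCKS.computeIfAbsent("aggregate-1", id -> new ReentrantLock());
                    lock.lock();
                    try {
                        // The command handler would run here while holding the lock.
                    } finally {
                        lock.unlock();
                        inFlight.decrementAndGet();
                    }
                }, commandBus);
                if (joinEach) {
                    result.join();       // wait for this command before sending the next
                } else {
                    pending.add(result); // fire and forget; collected only so the model can finish cleanly
                }
            }
            pending.forEach(CompletableFuture::join);
        } finally {
            commandBus.shutdown();
        }
        return max.get();
    }
}
```

In this model, joining on each command means a single dispatching thread never has more than one command in flight, whereas fire-and-forget dispatch lets up to four (the pool size) contend for the same lock. That is why I expected .join() to help, at least for commands coming from a single saga on a single node.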
From what I’ve read in the Google group, it is obviously better to distribute commands. But I was also under the impression that if one can live with managing some ConcurrencyExceptions, then not distributing the commands would be acceptable. Any insight into this issue would be greatly appreciated!