Even when no requests are active, PostgreSQL shows “idle in transaction” status for almost all the connections in my connection pool, meaning there is no query currently running but the connections have open transactions. A thread dump confirms that no threads are in code that is actively using a connection (they are all waiting for work). I’m using an async saga manager with a Spring transaction manager and a ScheduledThreadPoolExecutor with a core pool size greater than the saga manager’s processor count. Everything is on one node at the moment.
What I think is happening: When an event arrives, all the idle threads in the saga manager’s thread pool wake up. They end up running AsyncSagaEventProcessor.doProcessEvent(), which calls prepareSagas(), which ensures there’s an active UnitOfWork, which begins a database transaction. One of the threads ends up handling the event, writing the saga to the database, committing its UnitOfWork, and releasing its connection back to the connection pool. So far so good.
The problem is that the remaining threads, each of which has already started its own UnitOfWork, have empty lists of processed sagas, so when they call persistProcessedSagas(), they end up skipping the code path that commits the active UnitOfWork. Those threads go back to waiting for more events but their UnitOfWork instances are still active and holding onto connections with open transactions.
This seems to get masked by the fact that when one of these threads processes another event, it will use the existing UnitOfWork rather than starting a new one thanks to the logic in ensureActiveUnitOfWork(). So the transactions do eventually get committed. That makes me think that maybe it’s by design to reduce event processing latency and so Axon isn’t constantly creating and destroying UnitOfWork instances in threads that end up not handling any events. However, PostgreSQL’s VACUUM command can’t run while there are open transactions, and with the behavior I’m seeing, there will essentially never be a time when there aren’t open transactions as long as the application is running, even when it’s sitting idle. It also means I have to increase the size of my database connection pool to account for idle saga threads holding onto connections.
Screenshot from IntelliJ’s debugger of the situation (waiting for an event with an active UnitOfWork): https://imgur.com/CGk8uLx
pg_stat_activity contents with an idle Axon application: http://pastebin.com/qdbZinas
Hopefully I’m just doing something stupid and this isn’t the expected behavior. Happy to post details of my configuration if that’s helpful. Obviously, take my analysis with a grain of salt since I’m not all that familiar with Axon’s internals, but the idle-in-transaction connections are for real.
-Steve