Greetings,
We are running Axon 4 with Spring Boot. The database is MySQL 5.7, and we are using mysql-connector-java
to connect to it.
The ORM is Hibernate, by way of Spring Data JPA. Hikari manages the database connection pool.
Deployment-wise, we have two instances in the cluster.
Configuration-wise, we are using the AsynchronousCommandBus with the SpringTransactionManager implementation.
We have been getting intermittent exceptions in production stating that the socket connecting to the database was unexpectedly closed, along the lines of:
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 306788ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:676)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:190)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:155)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:35)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:106)
I was able to re-create the environment and the exception locally by load testing, though I still don't know what is causing the exception. Interestingly, it only happens when at least two instances of the application are running. In this scenario, each request in the load test is for a unique record (unique aggregate id).
While debugging, I noticed that the default Executor implementation for the AsynchronousCommandBus is
Executors.newCachedThreadPool(
new AxonThreadFactory(AsynchronousCommandBus.class.getSimpleName())
);
I also came across an article stating that newCachedThreadPool can be considered harmful: http://dev.bizo.com/2014/06/cached-thread-pool-considered-harmlful.html
As a result, I changed the Executor implementation to
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())
And this appears to have fixed the problem (at least when re-running the load test locally).
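One hypothesis consistent with this observation (not a confirmed root cause) is that a cached thread pool starts a new thread for every command that arrives while all existing threads are busy, so under load far more threads compete for database connections than the pool can serve, and the waiters eventually time out. The following is a minimal sketch of that effect using only the JDK. The class name ExecutorComparison, the Semaphore standing in for the connection pool, and all the numbers are assumptions chosen for illustration; it does not use Axon or Hikari.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorComparison {

    // Runs `tasks` jobs on the given executor. Each job needs one of
    // `connections` permits (a stand-in for a pooled connection), holds it
    // for `holdMillis`, and gives up after waiting `timeoutMillis`.
    // Returns the number of jobs that gave up.
    static int countTimeouts(ExecutorService executor, int tasks, int connections,
                             long holdMillis, long timeoutMillis) throws InterruptedException {
        Semaphore pool = new Semaphore(connections, true);
        AtomicInteger timeouts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    if (pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        try {
                            Thread.sleep(holdMillis); // simulated database work
                        } finally {
                            pool.release();
                        }
                    } else {
                        timeouts.incrementAndGet(); // analogous to "request timed out"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        executor.shutdown();
        return timeouts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 jobs, 2 simulated connections, 20 ms of work each, 100 ms wait limit.
        int cached = countTimeouts(Executors.newCachedThreadPool(), 100, 2, 20, 100);
        int fixed = countTimeouts(Executors.newFixedThreadPool(2), 100, 2, 20, 100);
        System.out.println("cached pool timeouts: " + cached);
        System.out.println("fixed pool timeouts:  " + fixed);
    }
}
```

In this sketch the fixed pool never has more running jobs than simulated connections, so no job waits long enough to time out, while the cached pool lets all jobs wait at once and most of them give up. Whether the same mechanism explains the production exceptions would still need to be verified, for example by comparing the number of active command bus threads with the Hikari pool size under load.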
Does anyone have any insight as to what/where to look for the root cause, or why a fixed thread pool seems to fix the issue?
Thank you,
David