Hi there, I’m trying to run a clone of our shared environment’s Axon Server on my local machine so that I can test out new features without impacting other team members.
Our shared environment server is an enterprise edition 3-node cluster running in Kubernetes, but I only want to run a single server / node on my local machine. To that end, I’m using “docker-compose” to run the following 9 Docker containers:
- postgres
- Axon Server EE
- (our 7 client apps)
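For context, a minimal sketch of what a docker-compose.yml for this kind of setup could look like. This is not our actual file: the image names, tags, host directories, and the Postgres password are placeholders, and the use of bind mounts and `depends_on` is an assumption. The hostname, ports, and container paths are the ones that appear later in this post.

```yaml
# Hypothetical sketch only: image names, tags, and host paths are placeholders.
services:
  postgres:
    image: postgres:15                 # assumed tag
    environment:
      POSTGRES_PASSWORD: example       # placeholder credential

  axonserver:
    image: your-registry/axonserver-ee:your-tag   # placeholder for the Axon Server EE image
    hostname: axonserver               # matches "hostName" in recovery.json
    ports:
      - "8024:8024"                    # HTTP (dashboard)
      - "8124:8124"                    # gRPC
      - "8224:8224"                    # internal gRPC
    volumes:                           # assumed bind mounts for the copied files
      - ./axonserver/config:/axonserver/config
      - ./axonserver/data:/axonserver/data
      - ./axonserver/events:/axonserver/events
      - ./axonserver/log:/axonserver/log

  client-app-number-1:
    image: your-registry/client-app-number-1:your-tag   # placeholder
    depends_on:                        # assumed start-up order
      - postgres
      - axonserver
  # the remaining 6 client apps would follow the same pattern
```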
I’ve read and followed all of the instructions on the Backups and Recovery pages, namely:
- Created a backup of the control database, unzipped it, and copied its contents into the Docker container at:
  - /axonserver/data/axonserver-controldb.mv.db
- Queried the “events” and “snapshots” filenames, and copied these files into the Docker container at:
  - /axonserver/events/our-app-context/00000000000000000000.events
  - /axonserver/events/our-app-context/00000000000000000000.snapshots
- Queried the log file names, and copied the single file that the query returned (even though the URL name is plural, and I can see more *.log files inside the Docker container in our shared environment) into the Docker container at:
  - /axonserver/log/default/00000000000000000001.log
Next, I created a cluster-template.yaml file containing the configuration that I wish to run on my local machine:
axoniq:
  axonserver:
    cluster-template:
      first: ${LOCAL_AXONSERVER}
      users:
      - roles:
        - context: _admin
          roles:
          - ADMIN
        - context: our-app-context
          roles:
          - USE_CONTEXT
        password: @dmin
        userName: admin
      replicationGroups:
      - roles:
        - role: PRIMARY
          node: ${LOCAL_AXONSERVER}
        name: _admin
        contexts:
        - name: _admin
          metaData:
            event.index-format: JUMP_SKIP_INDEX
            snapshot.index-format: JUMP_SKIP_INDEX
      - roles:
        - role: PRIMARY
          node: ${LOCAL_AXONSERVER}
        name: default
        contexts:
        - name: our-app-context
          metaData:
            event.index-format: JUMP_SKIP_INDEX
            snapshot.index-format: JUMP_SKIP_INDEX
      applications:
      - token: ${CLIENT_APP_TOKEN}
        name: client-app-number-1
        roles:
        - roles:
          - USE_CONTEXT
          context: our-app-context
        description: ""
// etc. for the remaining 6 apps
But as you can probably already see, I’ve run into a dilemma:
Approach 1: include control DB:
If I copy “axonserver-controldb.mv.db” into the Axon Docker container, I get the following message in the console:
Current node name has changed, new name axonserver. Start AxonServer with recovery file.
But then if I include a recovery.json file:
[
  {
    "name": "axonserver1",
    "oldName": "axonserver-0-0",
    "hostName": "axonserver",
    "internalHostName": "axonserver",
    "internalGrpcPort": 8224,
    "httpPort": 8024,
    "grpcPort": 8124
  }
]
…and add axoniq.axonserver.recoveryfile=/axonserver/config/recovery.json to my “axonserver.properties” file, then the node is at least renamed successfully, but I then get the following error:
Unknown host: axonserver-1-0.axonserver-svc.axonserver-ee.svc.cluster.local
…which suggests that Axon Server is still looking for the other 2 nodes that the control database says should be there. But as I said at the start, I don’t want to run all 3 nodes of the cluster on my machine, just 1 of them. So I assume that I don’t want to go down this path… (am I correct?)
Which leads me to:
Approach 2: only copy over events & snapshots:
With this approach, Axon Server starts up fine, and I can open the dashboard at http://localhost:8024/#query and see my events. But then once my first Spring Boot client application starts up, it begins sourcing the events, and I get a bunch of these errors:
2023-06-07 03:09:35.162 (Application trying to apply various events)
2023-06-07 03:10:05.373 Error occurred. Starting retry mode.
java.lang.IllegalStateException: The UnitOfWork is in an incompatible phase: NOT_STARTED
    at org.axonframework.common.Assert.state(Assert.java:44)
    at org.axonframework.messaging.unitofwork.AbstractUnitOfWork.rollback(AbstractUnitOfWork.java:123)
    at org.axonframework.messaging.unitofwork.UnitOfWork.attachTransaction(UnitOfWork.java:276)
    at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:459)
    ...
    at java.base/java.lang.Thread.run(Thread.java:832)
2023-06-07 03:10:05.374 Releasing claim on token and preparing for retry in 1s
2023-06-07 03:10:05.529 Error:
io.axoniq.dataprotection.api.DataException: ADPM-5010. SQL Exception.
    at io.axoniq.dataprotection.internal.y.G.d(uk:311)
    at io.axoniq.dataprotection.cryptoengine.JdbcCryptoEngine.getKey(jka:103)
    ...
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30008ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696)
    ...
    at io.axoniq.dataprotection.cryptoengine.JdbcCryptoEngine.getKey(jka:51)
    ... 58 common frames omitted
// more of the same
Am I going about this the right way? Is the problem in our app’s code, or how I’ve configured my server, or both, or something else?