Summary
When Axon Server Enterprise 2025.2.4 starts with fresh PVCs (no prior data) and standalone-dcb=true, an active Axoniq Platform connection causes the _admin Raft context to get stuck in an infinite pre-vote loop. The default DCB context is never created, and all client connections fail with AXONIQ-1302: default: not found in any replication group.
Environment
- Axon Server Enterprise: 2025.2.4
- Mode: Single-node, standalone-dcb=true
- Deployment: Kubernetes StatefulSet (GKE Autopilot)
- Platform: Axoniq Platform Community (license valid, 1 node)
Steps to Reproduce
- Delete all Axon Server PVCs (data, events, log)
- Start Axon Server with the AXONIQ_PLATFORM_AUTHENTICATION env var set (valid Platform token)
- Observe the _admin context stuck in a pre-vote loop indefinitely
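The reproduction can be sketched as a shell runbook. The namespace, StatefulSet name, and PVC label below are assumptions, not values from the report - adjust them for your cluster. The KUBECTL variable defaults to a dry run that only prints the commands:

```shell
# Dry run by default: prints each kubectl command instead of executing it.
# Set KUBECTL=kubectl to run against a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

# Hypothetical names -- adjust to your deployment.
NS=axon
STS=axon-server

# 1. Stop the node and delete all Axon Server PVCs (data, events, log).
$KUBECTL -n "$NS" scale statefulset "$STS" --replicas=0
$KUBECTL -n "$NS" delete pvc -l app=axon-server

# 2. Restart with the Platform token present and watch the _admin context.
$KUBECTL -n "$NS" scale statefulset "$STS" --replicas=1
$KUBECTL -n "$NS" logs "$STS-0" -f
```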
Expected Behavior
Axon Server should:
- Initialize the _admin Raft context and elect itself leader (single-node, no peers)
- Create the default DCB context
- Connect to Axoniq Platform
Actual Behavior
The _admin Raft context cycles between FollowerState and PreVoteState every ~1-2 seconds and never transitions to CandidateState or LeaderState:
_admin in term 0: Timeout in follower state: 1403 ms.
_admin in term 0: Updating state from FollowerState to PreVoteState
_admin: Starting pre-vote from axon-server-xxx in term 0
_admin in term 0: Updating state from PreVoteState to FollowerState (received pre-vote with term (1 >= 0))
_admin in term 0: Pre-vote granted for axon-server-xxx.
_admin in term 0: Request for pre-vote received ... voted true (handled as follower).
[repeats indefinitely]
The Platform connection succeeds and triggers initialization:
Received new license from Axoniq Platform: [active=true, plan=Axoniq Platform Community]
Initialization of this node with following contexts: [default]
_admin: init replication group
_admin in term 0: Starting the node...
But then the _admin context falls back into the pre-vote loop. The error logged is:
ERROR: Failed to apply Axoniq Platform license to the cluster: [AXONIQ-2100] No leader for _admin
The /v1/public/me endpoint shows adminNode: false, contextNames: [], storageContextNames: [].
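The unhealthy state can be detected from that endpoint's response. A minimal sketch follows; the sample JSON is constructed from the values quoted above, and the curl line is left commented out so the snippet runs offline:

```shell
# In a live cluster you would fetch the real response, e.g.:
#   RESPONSE=$(curl -s http://localhost:8024/v1/public/me)
# Here we use a canned sample matching the failure state described above.
RESPONSE='{"adminNode":false,"contextNames":[],"storageContextNames":[]}'

# A healthy single node should be the admin node and own at least one context.
if echo "$RESPONSE" | grep -q '"adminNode":false'; then
  echo "UNHEALTHY: node has not joined the _admin replication group"
fi
```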
Root Cause Analysis
The Platform’s AxoniqConsoleCoordinationService fires ReplicationGroupChangesHandler from a second thread before InitClusterTask completes the Raft leader election for _admin. This appears to cause a ConcurrentMembershipStateModificationException (observed in prior occurrences) that corrupts the Raft state machine, preventing leader election from completing.
The pre-vote succeeds (the node votes for itself) but never transitions to a full election (CandidateState), suggesting the Raft state machine is in an inconsistent state after the concurrent modification.
Workaround
Temporarily disable the Platform integration during fresh initialization:
- Remove/comment out the AXONIQ_PLATFORM_AUTHENTICATION env var
- Start Axon Server - it initializes cleanly within seconds:
  _admin in term 1: Leader
  default: init replication group
  default in term 1: Leader
  Creating DCB context: default
- Re-enable the Platform integration - Axon Server reconnects successfully
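The workaround can be scripted with kubectl set env, which rolls the StatefulSet on each change. The namespace and StatefulSet name are assumptions; KUBECTL defaults to a dry run that only prints the commands:

```shell
# Dry run by default: prints each kubectl command instead of executing it.
# Set KUBECTL=kubectl to run against a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
NS=axon           # hypothetical namespace
STS=axon-server   # hypothetical StatefulSet name

# 1. Remove the Platform token so the fresh node can elect a leader.
#    (A trailing "-" on the variable name tells kubectl to unset it.)
$KUBECTL -n "$NS" set env statefulset/"$STS" AXONIQ_PLATFORM_AUTHENTICATION-

# 2. After the default DCB context exists, restore the token;
#    the node then reconnects to the Platform normally.
$KUBECTL -n "$NS" set env statefulset/"$STS" AXONIQ_PLATFORM_AUTHENTICATION="$PLATFORM_TOKEN"
```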
Impact
- Every fresh PVC initialization requires manual intervention (disable Platform, init, re-enable)
- This affects disaster recovery, environment provisioning, and CI/CD pipelines
- We have encountered this issue on 2 separate occasions (Jan 2026, Feb 2026)
Suggested Fix
The Platform integration should defer ReplicationGroupChangesHandler until after InitClusterTask has completed and the _admin Raft context has an elected leader. Alternatively, the initialization should be atomic and resilient to concurrent Platform callbacks.