Axon Server EE: Platform integration causes Raft pre-vote loop on fresh PVC initialization

petrmac · February 23, 2026, 8:38pm

Summary

When Axon Server Enterprise 2025.2.4 starts with fresh PVCs (no prior data) and standalone-dcb=true, an active Axoniq Platform connection causes the _admin Raft context to get stuck in an infinite pre-vote loop. As a result, the default DCB context is never created, and all client connections fail with AXONIQ-1302: default: not found in any replication group.

Environment

Axon Server Enterprise: 2025.2.4
Mode: Single-node, standalone-dcb=true
Deployment: Kubernetes StatefulSet (GKE Autopilot)
Platform: Axoniq Platform Community (license valid, 1 node)

Steps to Reproduce

1. Delete all Axon Server PVCs (data, events, log).
2. Start Axon Server with the AXONIQ_PLATFORM_AUTHENTICATION env var set (valid Platform token).
3. Observe the _admin context stuck in a pre-vote loop indefinitely.

Expected Behavior

Axon Server should:

Initialize the _admin Raft context and elect itself leader (single-node, no peers)
Create the default DCB context
Connect to Axoniq Platform

Actual Behavior

The _admin Raft context cycles between FollowerState and PreVoteState every ~1-2 seconds and never transitions to CandidateState or LeaderState:

_admin in term 0: Timeout in follower state: 1403 ms.
_admin in term 0: Updating state from FollowerState to PreVoteState
_admin: Starting pre-vote from axon-server-xxx in term 0
_admin in term 0: Updating state from PreVoteState to FollowerState (received pre-vote with term (1 >= 0))
_admin in term 0: Pre-vote granted for axon-server-xxx.
_admin in term 0: Request for pre-vote received ... voted true (handled as follower).
[repeats indefinitely]
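
For anyone who wants to catch this automatically (for example in a CI/CD provisioning job), here is a minimal Python sketch that scans log lines for the pattern above. The regular expressions are derived only from the log excerpts in this post, so the exact wording is an assumption and may differ in other versions. The function name and the threshold are hypothetical.

```python
import re

# Patterns derived from the log excerpts in this post (assumption: other
# Axon Server versions may word these lines differently).
TO_PREVOTE = re.compile(
    r"(\S+) in term \d+: Updating state from FollowerState to PreVoteState"
)
BECAME_LEADER = re.compile(r"(\S+) in term \d+: Leader\b")


def find_prevote_loops(lines, threshold=3):
    """Return the contexts that entered PreVoteState at least `threshold`
    times since they last logged a Leader line (or since the log began)."""
    prevotes = {}
    for line in lines:
        match = TO_PREVOTE.search(line)
        if match:
            context = match.group(1)
            prevotes[context] = prevotes.get(context, 0) + 1
            continue
        match = BECAME_LEADER.search(line)
        if match:
            # A leader was elected, so earlier pre-votes were not a stuck loop.
            prevotes[match.group(1)] = 0
    return sorted(c for c, n in prevotes.items() if n >= threshold)
```

Feeding it the pod log (for example, the lines of a saved log file) returns a list such as ['_admin'] when the loop is present and an empty list after a clean start.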

The Platform connection succeeds and triggers initialization:

Received new license from Axoniq Platform: [active=true, plan=Axoniq Platform Community]
Initialization of this node with following contexts: [default]
_admin: init replication group
_admin in term 0: Starting the node...

But then the _admin context falls back into the pre-vote loop. The error logged is:

ERROR: Failed to apply Axoniq Platform license to the cluster: [AXONIQ-2100] No leader for _admin

The /v1/public/me endpoint shows adminNode: false, contextNames: [], storageContextNames: [].
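
The same check can be scripted. The sketch below is a rough illustration, not an official health check: it treats a node as uninitialized when the response looks like the one above. The field names come from the response shown in this post; the base URL (port 8024 on localhost) and the absence of access control are assumptions, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def is_uninitialized(me):
    """True when /v1/public/me looks like the stuck state described above:
    not an admin node and no contexts of either kind."""
    return (
        not me.get("adminNode", False)
        and not me.get("contextNames")
        and not me.get("storageContextNames")
    )


def fetch_me(base_url="http://localhost:8024"):
    """Fetch and parse /v1/public/me. The base URL is an assumption;
    adjust it (and add a token header) for your deployment."""
    with urlopen(base_url + "/v1/public/me", timeout=5) as response:
        return json.load(response)
```

Calling is_uninitialized(fetch_me()) a minute or so after startup gives a simple signal for whether the node needs the workaround below.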

Root Cause Analysis

The Platform’s AxoniqConsoleCoordinationService fires ReplicationGroupChangesHandler from a second thread before InitClusterTask completes the Raft leader election for _admin. This appears to cause a ConcurrentMembershipStateModificationException (observed in prior occurrences) that corrupts the Raft state machine, preventing leader election from completing.

The pre-vote succeeds (the node votes for itself) but never transitions to a full election (CandidateState), suggesting the Raft state machine is in an inconsistent state after the concurrent modification.

Workaround

Temporarily disable the Platform integration during fresh initialization:

1. Remove or comment out the AXONIQ_PLATFORM_AUTHENTICATION env var.

2. Start Axon Server. It initializes cleanly within seconds:

_admin in term 1: Leader
default: init replication group
default in term 1: Leader
Creating DCB context: default

3. Re-enable the Platform integration. Axon Server reconnects successfully.

Impact

Every fresh PVC initialization requires manual intervention (disable Platform, init, re-enable)
This affects disaster recovery, environment provisioning, and CI/CD pipelines
We have encountered this issue on two separate occasions (Jan 2026, Feb 2026)

Suggested Fix

The Platform integration should defer ReplicationGroupChangesHandler until after InitClusterTask has completed and the _admin Raft context has an elected leader. Alternatively, the initialization should be atomic and resilient to concurrent Platform callbacks.
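
To illustrate the first option, here is a generic sketch of the deferral pattern in Python. It is not Axon Server code and says nothing about how the server is actually implemented: the class and method names are hypothetical, and "ready" stands in for "InitClusterTask has completed and _admin has a leader". Callbacks that arrive early are queued and replayed in order once initialization signals readiness.

```python
import threading


class DeferUntilReady:
    """Queue events until mark_ready() is called, then deliver them in order.

    Hypothetical illustration of the suggested fix; handler calls are
    serialized under one lock so they never run concurrently.
    """

    def __init__(self, handler):
        self._handler = handler
        # Reentrant, so a handler may call submit() again without deadlocking.
        self._lock = threading.RLock()
        self._ready = False
        self._pending = []

    def submit(self, event):
        """Called by the callback thread (here: the Platform integration)."""
        with self._lock:
            if not self._ready:
                self._pending.append(event)  # too early: hold it back
                return
            self._handler(event)

    def mark_ready(self):
        """Called once by the initialization thread after leader election."""
        with self._lock:
            self._ready = True
            pending, self._pending = self._pending, []
            for event in pending:
                self._handler(event)
```

The trade-off is that early Platform callbacks are delayed rather than dropped, at the cost of holding one lock while the handler runs.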

stefand · February 24, 2026, 10:46am

Hi Petr,
Thanks for the detailed report! We’ve logged this as a bug internally and our team will look into it. In the meantime, the workaround you described should help.
Best!

Marc_Gathier · April 10, 2026, 8:44am

Hi Petr,

A small update on this issue: there is a simple workaround.
Instead of specifying the option “standalone-dcb”, you can also use “axoniq.axonserver.autocluster.dcb=true” to initialize the node with a DCB context.

Marc