Hello, a little bit of context: we were running Axon Server v4.4.8.1 and upgraded to version 4.6.8 a few days ago.
After the upgrade, I started seeing the logs below. Some additional details:
- This installation is running on EKS
- We weren’t using snapshots in version 4.4.8.1
- After the upgrade I had to restart the Axon Server nodes while applications were still connected to them; perhaps this caused the issue.
Logs from the leader node:
2023-05-09 18:42:35.279 INFO 1 --- [uest-executor-1] i.a.a.cluster.grpc.GrpcRaftPeer : Error on InstallSnapshot stream: CANCELLED: Failed to read message.
2023-05-09 18:42:35.280 INFO 1 --- [uest-executor-1] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Install snapshot message failed. Reason: CANCELLED: Failed to read message..
2023-05-09 18:42:35.280 INFO 1 --- [uest-executor-1] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Changing state from Install Snapshot Replicator Peer State to Append Entries Replicator Peer State.
2023-05-09 18:42:35.281 INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Sending append entry to axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9cfd7d failed. Reason: default in term 249: previous term/index missing 249/3533 last log 3533. Last applied index: 0, match index: 0.
2023-05-09 18:42:35.281 INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: follower axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9cfd7d is far behind the log entry. Follower's next index: 1.
2023-05-09 18:42:35.281 INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Changing state from Append Entries Replicator Peer State to Install Snapshot Replicator Peer State.
2023-05-09 18:42:35.281 INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: start snapshot installation
2023-05-09 18:42:35.303 INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Install snapshot config done received: responseHeader {
requestId: "bd148c23-93a0-4dfc-ac49-22069346da60"
responseId: "3dee79d6-8f1e-4f08-beb9-4640ca9e24ee"
nodeId: "axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9abefabd"
}
groupId: "default"
term: 249
requestIncrementalData {
lastReceivedOffset: 1
lastEventTokenPerContext {
key: "prod"
value: 725039
}
lastSnapshotTokenPerContext {
key: "prod"
value: -1
}
}
, matchIndex 0, nextIndex 1, lastEvents: {prod=725039}
And logs from the node that is having issues:
2023-05-09 18:44:54.163 INFO 1 --- [ster-executor-4] i.a.axonserver.cluster.FollowerState : default in term 249: previous term/index missing 249/3533 last log 3533
2023-05-09 18:44:54.169 INFO 1 --- [ster-executor-4] i.a.a.e.logging.ClusterEventsLogger : default in term 249: Leader is axonserver-0-0
2023-05-09 18:44:54.170 INFO 1 --- [ster-executor-4] i.a.axonserver.cluster.FollowerState : default in term 249: Install snapshot started.
2023-05-09 18:44:54.170 INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore : default: Clearing log entries, setting last index to 3533
2023-05-09 18:44:54.170 INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore : default: Removing 3534 segment
2023-05-09 18:44:54.171 INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore : default: Initializing log
2023-05-09 18:44:54.171 INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore : default: File for segment 3534 does not exist. Creating new file with size of 16777216.
2023-05-09 18:44:54.202 INFO 1 --- [ster-executor-1] i.a.axonserver.cluster.FollowerState : default in term 249: previous term/index missing 249/3533 last log 3533
2023-05-09 18:44:54.207 INFO 1 --- [ster-executor-1] i.a.a.e.logging.ClusterEventsLogger : default in term 249: Leader is axonserver-0-0
2023-05-09 18:44:54.207 INFO 1 --- [ster-executor-1] i.a.axonserver.cluster.FollowerState : default in term 249: Install snapshot started.
2023-05-09 18:44:54.207 INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore : default: Clearing log entries, setting last index to 3533
2023-05-09 18:44:54.208 INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore : default: Removing 3534 segment
2023-05-09 18:44:54.217 INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore : default: Initializing log
2023-05-09 18:44:54.217 INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore : default: File for segment 3534 does not exist. Creating new file with size of 16777216.
Is there anything I can do to fix this? It looks like the node isn't able to restore itself: the leader keeps starting a snapshot install, the stream gets cancelled, and the follower clears its log and starts over. How can I debug this further?
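For reference, this is roughly how I'm checking whether the leader is stuck in a snapshot-install retry loop rather than failing once. The file name is illustrative (in my case the log comes from `kubectl logs axonserver-0-0`); the sample lines below stand in for a real capture so the commands can be run as-is.

```shell
# Write a few sample leader log lines to a file (stand-in for a real
# capture via `kubectl logs axonserver-0-0 > leader.log`).
cat > leader.log <<'EOF'
i.a.axonserver.cluster.ReplicatorPeer : default in term 249: start snapshot installation
i.a.axonserver.cluster.ReplicatorPeer : default in term 249: Install snapshot message failed. Reason: CANCELLED: Failed to read message..
i.a.axonserver.cluster.ReplicatorPeer : default in term 249: start snapshot installation
EOF

# How many times did the leader (re)start snapshot installation?
grep -c 'start snapshot installation' leader.log

# How many of those attempts died with a cancelled stream?
grep -c 'Install snapshot message failed' leader.log
```

If the first count keeps growing while the second stays one behind it, every install attempt is failing the same way, which matches what I see above.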