Error on InstallSnapshot stream: CANCELLED

Hello, a little bit of context, I was using Axonserver v4.4.8.1 and we planned to update to version 4.6.8 a few days ago.

After the upgrade, I started to see the following logs:

  • This installation is running on EKS
  • We weren’t using snapshots in version 4.4.8.1
  • After the upgrade process I had to restart the Axonserver nodes (and we already had applications connected to it), perhaps this caused the issue.

logs from the node leader

2023-05-09 18:42:35.279  INFO 1 --- [uest-executor-1] i.a.a.cluster.grpc.GrpcRaftPeer          : Error on InstallSnapshot stream: CANCELLED: Failed to read message.
2023-05-09 18:42:35.280  INFO 1 --- [uest-executor-1] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: Install snapshot message failed. Reason: CANCELLED: Failed to read message..
2023-05-09 18:42:35.280  INFO 1 --- [uest-executor-1] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: Changing state from Install Snapshot Replicator Peer State to Append Entries Replicator Peer State.
2023-05-09 18:42:35.281  INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: Sending append entry to axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9cfd7d failed. Reason: default in term 249: previous term/index missing 249/3533 last log 3533. Last applied index: 0, match index: 0.
2023-05-09 18:42:35.281  INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: follower axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9cfd7d is far behind the log entry. Follower's next index: 1.
2023-05-09 18:42:35.281  INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: Changing state from Append Entries Replicator Peer State to Install Snapshot Replicator Peer State.
2023-05-09 18:42:35.281  INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: start snapshot installation
2023-05-09 18:42:35.303  INFO 1 --- [uest-executor-2] i.a.axonserver.cluster.ReplicatorPeer    : default in term 249: Install snapshot config done received: responseHeader {
  requestId: "bd148c23-93a0-4dfc-ac49-22069346da60"
  responseId: "3dee79d6-8f1e-4f08-beb9-4640ca9e24ee"
  nodeId: "axonserver-2-0-0206a7a3-cb10-458b-9c9a-85ba5e9abefabd"
}
groupId: "default"
term: 249
requestIncrementalData {
  lastReceivedOffset: 1
  lastEventTokenPerContext {
    key: "prod"
    value: 725039
  }
  lastSnapshotTokenPerContext {
    key: "prod"
    value: -1
  }
}
, matchIndex 0, nextIndex 1, lastEvents: {prod=725039}

and logs from the node that is having issues

2023-05-09 18:44:54.163  INFO 1 --- [ster-executor-4] i.a.axonserver.cluster.FollowerState     : default in term 249: previous term/index missing 249/3533 last log 3533
2023-05-09 18:44:54.169  INFO 1 --- [ster-executor-4] i.a.a.e.logging.ClusterEventsLogger      : default in term 249: Leader is axonserver-0-0
2023-05-09 18:44:54.170  INFO 1 --- [ster-executor-4] i.a.axonserver.cluster.FollowerState     : default in term 249: Install snapshot started.
2023-05-09 18:44:54.170  INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Clearing log entries, setting last index to 3533
2023-05-09 18:44:54.170  INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Removing 3534 segment
2023-05-09 18:44:54.171  INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Initializing log
2023-05-09 18:44:54.171  INFO 1 --- [ster-executor-4] i.a.a.c.r.file.PrimaryLogEntryStore      : default: File for segment 3534 does not exist. Creating new file with size of 16777216.
2023-05-09 18:44:54.202  INFO 1 --- [ster-executor-1] i.a.axonserver.cluster.FollowerState     : default in term 249: previous term/index missing 249/3533 last log 3533
2023-05-09 18:44:54.207  INFO 1 --- [ster-executor-1] i.a.a.e.logging.ClusterEventsLogger      : default in term 249: Leader is axonserver-0-0
2023-05-09 18:44:54.207  INFO 1 --- [ster-executor-1] i.a.axonserver.cluster.FollowerState     : default in term 249: Install snapshot started.
2023-05-09 18:44:54.207  INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Clearing log entries, setting last index to 3533
2023-05-09 18:44:54.208  INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Removing 3534 segment
2023-05-09 18:44:54.217  INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore      : default: Initializing log
2023-05-09 18:44:54.217  INFO 1 --- [ster-executor-1] i.a.a.c.r.file.PrimaryLogEntryStore      : default: File for segment 3534 does not exist. Creating new file with size of 16777216.

Is there anything I can do to fix this issue? It looks like the node isn’t able to restore itself. What can I do to debug more?