Axon Server deployment on AWS ECS (Fargate launch type)

Antonio_Portolan · December 10, 2024, 8:15am

Hello,
I have a problem with Axon Server deployed on AWS ECS. The initial deployment went without any particular problems, but as soon as the task is redeployed and its address changes, the internal hostname changes as well. Axon Server then complains that the node name has changed and that it has to be started with a recovery file.

December 09, 2024 at 13:55 (UTC+1:00)
2024-12-09 12:55:13.863 ERROR 1 --- [ main] i.a.a.e.cluster.ClusterController : Current node name has changed, new name ip-172-31-23-207. Start AxonServer with recovery file.

December 09, 2024 at 13:55 (UTC+1:00)
2024-12-09 12:55:13.765 INFO 1 --- [ main] i.a.a.e.c.i.MessagingClusterServer : Axon Server Cluster Server started on port: 8224 - no SSL

December 09, 2024 at 13:55 (UTC+1:00)
2024-12-09 12:55:00.151 INFO 1 --- [ main] io.axoniq.axonserver.AxonServer : Axon Server version 2023.2.0

December 09, 2024 at 13:54 (UTC+1:00)
2024-12-09 12:54:54.556 WARN 1 --- [ main] o.f.core.internal.command.DbMigrate : outOfOrder mode is active. Migration of schema "PUBLIC" may not be reproducible.

December 09, 2024 at 13:54 (UTC+1:00)
2024-12-09 12:54:52.167 WARN 1 --- [ main] i.a.a.c.MessagingPlatformConfiguration : Ignoring domain part of the hostname 'ip-172-31-23-207.eu-central-1.compute.internal': hostname=ip-172-31-23-207, domain=eu-central-1.compute.internal

The AWS documentation states that I cannot set a fixed internal hostname when the launch type is FARGATE. Is there any other way to avoid this problem?

Corrado_Musumeci · December 10, 2024, 2:51pm

Hi,
Because of how ECS and K8s work, details such as the hostname or IP address may change each time you restart your instance.
When deploying on ECS or K8s, it is therefore better to set values for the following properties in your axonserver.properties:
axoniq.axonserver.hostname
axoniq.axonserver.internal-hostname
axoniq.axonserver.domain
and assign a DNS entry to each Axon Server node.
This guarantees that, on each restart, your node keeps the same identity and can connect back to the other nodes of the cluster.
These values must stay consistent even if you are running a single node.
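As a rough illustration of the settings listed above, a minimal axonserver.properties fragment might look like the following. The hostname and domain values are placeholders and are not taken from this thread; each one would need a matching DNS entry that stays the same across task restarts.

```properties
# axonserver.properties (sketch; all values below are hypothetical placeholders)

# Name that clients use to reach this node; should resolve via a stable DNS entry
axoniq.axonserver.hostname=axonserver-1

# Name that other cluster nodes use to reach this node
axoniq.axonserver.internal-hostname=axonserver-1

# Domain appended to the hostname; replace with the domain of your own DNS zone
axoniq.axonserver.domain=example.internal
```

With fixed values like these, the node no longer derives its name from the task's generated hostname (such as ip-172-31-23-207 in the log above), so a redeployment should not change it.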
This information is stored in the configdb file and checked each time the node starts. That is why, when the node starts under a different name, it logs an ERROR asking you to provide a recovery file (also known as a migration file).
To recover from your current situation, follow the Recovery page in the docs.