Not so graceful shutdown after a scheduled GKE update with kubectl drain

maartenjanvangool · November 26, 2021, 1:42pm

As mentioned here, we’ve had some issues with a not-so-graceful shutdown after a cluster upgrade during a GKE maintenance window.

So let’s first talk about what happened (as far as I can tell). We run Axon Server SE on Google Kubernetes Engine. These clusters have maintenance windows on the weekend, during which node pool upgrades are performed. From what I can tell from the documentation, an upgrade does a kubectl drain of each node. By default, the drain honours each pod’s termination grace period, which is 30 seconds unless configured otherwise, before the pod is forcibly killed.

My thinking is that the issue we got is caused by a shutdown taking longer than 30 seconds. If this is the case, the solution would be to increase terminationGracePeriodSeconds on the pod from its default to something longer than 30 seconds.
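For illustration, this is roughly where that setting lives in a manifest. It is a minimal sketch, not a recommended configuration: the StatefulSet name, labels, Service name and image reference are assumptions, and the value of 120 seconds is an arbitrary placeholder rather than a figure recommended for Axon Server.

```yaml
# Hypothetical StatefulSet fragment; all names and the 120-second value are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: axonserver                # assumed name
spec:
  serviceName: axonserver         # assumed headless Service name
  replicas: 1
  selector:
    matchLabels:
      app: axonserver             # assumed label
  template:
    metadata:
      labels:
        app: axonserver
    spec:
      # Seconds the kubelet waits between sending SIGTERM and SIGKILL.
      # Kubernetes defaults to 30 when this field is omitted.
      terminationGracePeriodSeconds: 120
      containers:
        - name: axonserver
          image: axoniq/axonserver   # assumed image reference; pin a tag in practice
```

The field sits at the pod spec level, not on the container. A kubectl drain with its default --grace-period of -1 uses this per-pod value, so raising it here is enough for the drain to wait longer.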

So what I’m asking is: how long would it normally take for Axon Server to shut down gracefully? Is there a recommended shutdown period?

allardbz · November 26, 2021, 2:17pm

Hi,

Could you share the logs of the final moments of Axon Server? That should indicate whether it had started the shutdown process, and maybe give some insight into what it was waiting for…

maartenjanvangool · November 26, 2021, 3:14pm

2021-11-06 23:14:59.674 INFO 1 --- [grpc-executor-1] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxx, clientId = 1@orangebeard-xxxxxx-7c44bbcc55-pwbrq.d1e3dd6a-5644-4b1e-99c9-8d1c0e234bff, context = default
2021-11-06 23:15:51.190 INFO 1 --- [grpc-executor-3] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxx-7c44bbcc55-khgx5, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-7c44bbcc55-khgx5.1b9d4f22-f8e3-4223-8be4-685b24a20a57, context = default
2021-11-06 23:18:12.133 INFO 1 --- [grpc-executor-1] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxx, clientId = 1@orangebeard-xxxxx-6b4c6b8697-rr7j4.30911d6f-2a78-483b-895e-0af4c8268478, context = default
2021-11-06 23:18:12.490 INFO 1 --- [grpc-executor-3] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxx, clientId = 1@orangebeard-xxxxxxxx-5d5cb9d679-9kxqt.e58fa752-73cd-4d3a-a701-873f6ec01ed4, context = default
2021-11-06 23:18:32.060 INFO 1 --- [grpc-executor-2] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxx, clientId = 1@orangebeard-xxxxxx-6b4c6b8697-g5fsk, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-6b4c6b8697-g5fsk.f71f931f-004a-4e8b-8180-192aad1de4bc, context = default
2021-11-06 23:18:44.202 INFO 1 --- [grpc-executor-2] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-5d5cb9d679-nh7pt, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-5d5cb9d679-nh7pt.05242447-195a-46fc-8a33-569a98156502, context = default
2021-11-06 23:24:21.939 INFO 1 --- [grpc-executor-4] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-5474468475-269tt.20cfa999-6775-4a00-97e4-3ea552e619e8, context = default
2021-11-06 23:24:22.066 INFO 1 --- [grpc-executor-4] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-7fb964cbb5-6k5qk.c56670cc-4963-45ea-8e8f-ace03db2f749, context = default
2021-11-06 23:24:22.529 INFO 1 --- [grpc-executor-1] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-6b4c6b8697-g5fsk.f71f931f-004a-4e8b-8180-192aad1de4bc, context = default
2021-11-06 23:25:08.976 INFO 1 --- [grpc-executor-1] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-7fb964cbb5-pmz7v, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-7fb964cbb5-pmz7v.d67f9c6f-5592-4759-8fdc-e6f022a7ddd0, context = default
2021-11-06 23:26:01.893 INFO 1 --- [grpc-executor-2] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-5474468475-hhl8l, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-5474468475-hhl8l.c98fe644-152f-4939-a948-8761624fb05c, context = default
2021-11-06 23:26:12.845 INFO 1 --- [grpc-executor-4] i.a.a.logging.TopologyEventsLogger : Application connected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-6b4c6b8697-cpvmz, clientStreamId = 1@orangebeard-xxxxxxxxxxxx-6b4c6b8697-cpvmz.adf2228a-6e8e-4825-aa62-a8e37721e95f, context = default
2021-11-06 23:27:10.263 INFO 1 --- [grpc-executor-2] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-5474468475-8dzs4.0c10a8c2-a5dd-491b-bba4-35541d6b6729, context = default
2021-11-06 23:27:10.928 INFO 1 --- [grpc-executor-1] i.a.a.logging.TopologyEventsLogger : Application disconnected: orangebeard-xxxxxxxxxxxx, clientId = 1@orangebeard-xxxxxxxxxxxx-7c44bbcc55-wz9tk.e5d02d7b-cb0c-471c-a19f-31bf00652392, context = default
2021-11-06 23:27:11.250 WARN 1 --- [ault-trackers-1] i.a.a.message.event.EventDispatcher : listEvents: Error on connection from event store: [AXONIQ-9000] Failed to read event: 904611

There is no specific logging on shutdown; what I see is apps disconnecting and connecting, at least part of which can be explained by these deployments ‘node hopping’ as well. I masked the names, but these are several different microservices disconnecting and connecting.

maartenjanvangool · January 19, 2022, 2:22pm

As discussed with @Marc_Gathier, Axon Server 4.5.2 contains some improvements with regard to graceful shutdowns. We’re going to upgrade, and we hope to no longer have issues in this regard.