JpaTokenStore and Kubernetes Pods with Multiple Containers

Vladyslav_Baidak · September 1, 2018, 12:12pm

Hi all,

We have a Kubernetes setup in a production environment that we don’t have access to (we can only view logs).

Basically, we know that the Kubernetes Pod has 2 identical containers and that our application is configured to use the Axon Tracking Event Processor and JpaTokenStore, so it has 2 entries in the TOKEN_ENTRY table.

After around a month of running flawlessly, we started to receive the following error: “Unable to extend the claim on token for processor … It is either claimed by another process, or there is no such token.”
This happens only when a user executes a command which, in turn, fires an event. In fact, everything works as expected and nothing is broken, but we see these messages in the log.

Our investigation suggests this might be caused by an owner mismatch (the owner is generated dynamically from the JVM name, which can change), but we are not 100% sure.
Maybe someone has faced this issue and knows the right solution.
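
To make the owner point concrete, here is a minimal sketch using only the JDK. It prints the JVM runtime name (commonly of the form pid@hostname, though the format is JVM-specific), which is the kind of value the owner is derived from, next to an explicitly built id. The class NodeIdDemo, the helper explicitNodeId and the container name "container-a" are hypothetical and not part of Axon; the use of the HOSTNAME variable is an assumption about the environment. Whether setting an explicit owner would make the error go away is not confirmed here.

```java
import java.lang.management.ManagementFactory;

public class NodeIdDemo {

    // JVM runtime name; changes whenever the process id or host name changes,
    // for example after a container restart.
    public static String jvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    // Hypothetical helper: build an owner id from values chosen by us.
    public static String explicitNodeId(String podName, String containerName) {
        return podName + "/" + containerName;
    }

    public static void main(String[] args) {
        System.out.println("JVM-derived owner: " + jvmName());

        // Assumption: HOSTNAME carries the pod name; fall back if it is not set.
        String pod = System.getenv().getOrDefault("HOSTNAME", "unknown-pod");
        System.out.println("Explicit owner:    " + explicitNodeId(pod, "container-a"));
    }
}
```

Comparing the first printed value with the owner column in TOKEN_ENTRY would show whether the running container is the one the table says holds the claim.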

Any help would be appreciated. Thanks!

Gerlo_Hesselink · September 3, 2018, 5:15pm

“thus it has 2 entries” … that is not what I expected. How is your tracking processor set up? If it has only 1 segment, it should have 1 entry in the TOKEN_ENTRY table, and the container processing the events for the processor is visible in that entry (that’s what we see using a Docker service).
Are both entries up to date with the token index and the update timestamp? What is (are) the name(s) of the processor you see in the table?
Gerlo.

Vladyslav_Baidak · September 3, 2018, 5:34pm

Hi Gerlo,

After some digging today, I finally understood how it works and what our issue is. Unfortunately, I missed some points when describing our case. In general you’re right: we have only one entry per processor name and segment (we use a single-threaded tracking processor).

The issue happens because the lead node is unable to pick up the token when it has already been picked up. We couldn’t figure out how this happened, but it seems to me that the owner was changed manually (in fact, I was able to reproduce the issue only by manually setting the owner to ‘null’).

A simple redeploy, which in turn restarts the processors, fixes it.
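
For anyone trying to picture the behaviour, here is a simplified in-memory model of it, not Axon's actual code. It assumes that extending a claim only succeeds when the row's owner equals the calling node, while a fresh claim (as happens when a processor starts) also succeeds on a row with no owner. The class TokenClaimModel and all its method names are made up for illustration, and claim timeouts are left out.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenClaimModel {

    // One row per processor name and segment; the value is the owner (may be null).
    private final Map<String, String> ownerByKey = new HashMap<>();

    private static String key(String processor, int segment) {
        return processor + "[" + segment + "]";
    }

    public void putRow(String processor, int segment, String owner) {
        ownerByKey.put(key(processor, segment), owner);
    }

    // Extending works only if the row exists and is owned by this node.
    public boolean extendClaim(String processor, int segment, String nodeId) {
        String k = key(processor, segment);
        return ownerByKey.containsKey(k) && nodeId.equals(ownerByKey.get(k));
    }

    // Claiming works if the row exists and is unowned or already owned by this node.
    public boolean claim(String processor, int segment, String nodeId) {
        String k = key(processor, segment);
        if (!ownerByKey.containsKey(k)) {
            return false;
        }
        String owner = ownerByKey.get(k);
        if (owner == null || owner.equals(nodeId)) {
            ownerByKey.put(k, nodeId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenClaimModel store = new TokenClaimModel();
        store.putRow("myProcessor", 0, "node-a");
        System.out.println(store.extendClaim("myProcessor", 0, "node-a")); // true

        store.putRow("myProcessor", 0, null); // owner reset by hand
        System.out.println(store.extendClaim("myProcessor", 0, "node-a")); // false

        // A restart claims the token again, after which extending works.
        System.out.println(store.claim("myProcessor", 0, "node-a"));       // true
        System.out.println(store.extendClaim("myProcessor", 0, "node-a")); // true
    }
}
```

Under these assumptions, a null owner makes every extend attempt fail until something claims the token again, which matches what the redeploy does.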

Anyway, I’ll try to investigate the issue further and upgrade Axon to the latest version.