Long-running tasks in TEP event handlers

Hello Experts!

We have a use case where one of our TEPs (tracking event processors), in response to a single event, may have significantly more work to do than the other TEPs in the system, depending on the number of clients.

Because the claim timeout was too short, another node took over ownership of the segment before the TEP had finished processing, so the original processor failed when committing. We soon realized that all TEP instances were doing exactly the same work, and none of them was able to update the tracking token at the end of processing. The default token claim timeout of 10 seconds wasn't sufficient, so we extended it to 30 seconds for now, knowing that this won't be enough for long.
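To make sure we understand the failure mode, here is a simplified, self-contained model of what we believe happens. The InMemoryTokenStore class and its takeover rule (a claim older than the timeout may be taken by another node; storing a token only succeeds for the current owner) are our own hypothetical simplification, not Axon's actual implementation, and the timings (10 s timeout, 25 s of processing) are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class ClaimTimeoutDemo {

    /** Hypothetical claim record: who owns the segment and when the claim was last refreshed. */
    static final class Claim {
        final String owner;
        long claimedAtMillis;

        Claim(String owner, long claimedAtMillis) {
            this.owner = owner;
            this.claimedAtMillis = claimedAtMillis;
        }
    }

    /** Hypothetical token store: a claim older than claimTimeoutMillis may be taken over by another node. */
    static final class InMemoryTokenStore {
        private final long claimTimeoutMillis;
        private final Map<Integer, Claim> claims = new HashMap<>();

        InMemoryTokenStore(long claimTimeoutMillis) {
            this.claimTimeoutMillis = claimTimeoutMillis;
        }

        /** Returns true if `node` owns the segment after the call. */
        boolean tryClaim(int segment, String node, long nowMillis) {
            Claim current = claims.get(segment);
            boolean expired = current != null && nowMillis - current.claimedAtMillis >= claimTimeoutMillis;
            if (current == null || current.owner.equals(node) || expired) {
                claims.put(segment, new Claim(node, nowMillis));
                return true;
            }
            return false;
        }

        /** Storing a token succeeds only for the current owner, and refreshes the claim timestamp. */
        boolean storeToken(int segment, String node, long nowMillis) {
            Claim current = claims.get(segment);
            if (current == null || !current.owner.equals(node)) {
                return false;
            }
            current.claimedAtMillis = nowMillis;
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryTokenStore store = new InMemoryTokenStore(10_000); // 10 s claim timeout
        boolean aClaimed = store.tryClaim(0, "node-A", 0);       // A starts a 25 s job
        boolean bEarly = store.tryClaim(0, "node-B", 5_000);     // claim still valid
        boolean bLate = store.tryClaim(0, "node-B", 10_000);     // claim expired, B takes over
        boolean aCommit = store.storeToken(0, "node-A", 25_000); // A finishes, but no longer owns the segment

        System.out.println("A claims at 0s: " + aClaimed);        // true
        System.out.println("B claims at 5s: " + bEarly);          // false
        System.out.println("B claims at 10s: " + bLate);          // true
        System.out.println("A stores token at 25s: " + aCommit);  // false
    }
}
```

In this model, node B then starts the same 25 s job and loses its claim in exactly the same way, which matches what we observed: every node redoes the work and nobody manages to store the token.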

The claim timeout is set on the TokenStore, globally for all TEPs in the service. The drawback is that even for “the fastest” TEPs, in case of a failure, another node will only take over the work once the timeout sized for “the slowest” TEP has expired.

We are looking for a way to extend the token claim while an event is being processed, so that claimTimeout can be kept relatively short while still ensuring that ownership of the segment doesn't change until the processor has completed its work.
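What we have in mind is roughly a heartbeat that periodically renews the claim while the handler is still working. The sketch below uses a hypothetical ClaimExtender hook. As far as we know, Axon's TokenStore exposes an extendClaim(processorName, segment) method that such a hook could delegate to, but we haven't verified whether it is safe to call from inside an event handler (for example, whether it needs its own transaction). The processor name and intervals are made up.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ClaimHeartbeat {

    /** Hypothetical hook; in Axon it might delegate to TokenStore.extendClaim(processorName, segment). */
    @FunctionalInterface
    interface ClaimExtender {
        void extendClaim(String processorName, int segment);
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ClaimExtender extender;
    private final long intervalMillis;

    ClaimHeartbeat(ClaimExtender extender, long intervalMillis) {
        this.extender = extender;
        this.intervalMillis = intervalMillis;
    }

    /**
     * Runs `work` on the calling thread while renewing the claim every intervalMillis.
     * The heartbeat is cancelled when the work returns or throws. Note: if extendClaim throws
     * (e.g. the claim was already lost), scheduleAtFixedRate suppresses further renewals.
     */
    <T> T runWithHeartbeat(String processorName, int segment, Supplier<T> work) {
        ScheduledFuture<?> beat = scheduler.scheduleAtFixedRate(
                () -> extender.extendClaim(processorName, segment),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            return work.get();
        } finally {
            beat.cancel(false);
        }
    }

    /** Stops the scheduler and waits briefly so no renewal is still running afterwards. */
    void shutdown() {
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger extensions = new AtomicInteger();
        ClaimHeartbeat heartbeat = new ClaimHeartbeat((processor, segment) -> extensions.incrementAndGet(), 50);

        // Simulates a slow handler (300 ms) while the claim is renewed every 50 ms.
        String result = heartbeat.runWithHeartbeat("hypotheticalProcessor", 0, () -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        heartbeat.shutdown();
        System.out.println(result + " after " + extensions.get() + " claim extensions");
    }
}
```

The renewal interval would need to be comfortably shorter than claimTimeout (for example, a third of it), so that a single delayed heartbeat doesn't let the claim expire.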

Does this sound like a valid approach? How can we achieve it in Axon? Perhaps you have other recommendations on how to deal with this kind of use case.