Infinite loop in AsyncSagaEventProcessor

Hi all.

Under certain conditions this piece of code enters an infinite loop and, to make things worse, it sits in a private method with nothing configurable to avoid it.

while (!persistProcessedSagas(attempts == 0) && status.isRunning()) {
    if (attempts == 0) {
        logger.warn("Error committing Saga state to the repository. Starting retry procedure...");
    }
    attempts++;
    if (attempts > 1 && attempts < 5) {
        logger.info("Waiting 100ms for next attempt");
        Thread.sleep(100);
    } else if (attempts >= 5) {
        logger.info("Waiting 2000ms for next attempt");
        long timeToStop = System.currentTimeMillis() + 2000;
        while (inFuture(timeToStop) && isLastInBacklog(sequence) && status.isRunning()) {
            Thread.sleep(100);
        }
    }
}

The issue occurs when sagaRepository.add(saga) or sagaRepository.commit(saga), called from persistProcessedSagas, throws an exception that is not an AxonNonTransientException.
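To make the failure mode concrete, here is a minimal sketch (not Axon code) of a retry loop that stops as soon as a failure is marked as non-retryable and also caps the number of attempts. The names NonTransientFailure, PersistAttempt and BoundedRetry are hypothetical and only stand in for the roles played by AxonNonTransientException and persistProcessedSagas above.

```java
/** Hypothetical marker for failures that retrying cannot fix (not an Axon class). */
class NonTransientFailure extends RuntimeException {
    NonTransientFailure(String message) { super(message); }
}

/** Hypothetical stand-in for one attempt at persisting saga state. */
interface PersistAttempt {
    void persist() throws Exception;
}

final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls attempt.persist() until it succeeds, fails with a NonTransientFailure,
     * or maxAttempts is reached. Returns the number of attempts used on success.
     */
    static int run(PersistAttempt attempt, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int n = 1; n <= maxAttempts; n++) {
            try {
                attempt.persist();
                return n;
            } catch (NonTransientFailure e) {
                throw e; // retrying cannot help, so surface it immediately
            } catch (Exception e) {
                last = e; // treated as possibly transient: try again
            }
            if (n < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }
}
```

With this shape, a failure that can never succeed ends the loop after one attempt, and anything else ends it after maxAttempts instead of spinning for as long as the processor is running.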

I fixed my particular case by changing a SagaStorageException to a SerializationException, but that may not always be possible.

This is 2.4.6, by the way. I don’t know whether this version is still supported, or whether this situation happens in 3.x as well.

Any suggestion other than what I did (which is not always possible) is welcome.

Cheers.

Hi,

The reason it’s going into a loop is that it was unable to persist a Saga. In that case, you have potentially produced side-effects, but the state needed to know that they have been produced (i.e. the saga state) cannot be saved. Axon 2 then goes into retry mode to ensure a Saga is not left in an inconsistent state. Naively rolling back a transaction will not work in most cases where a Saga is involved.

Axon 3 deals with this problem differently. It provides error handlers that allow you to define the behavior of failing events. However, inability to save Saga state is a problem, no matter how much computing power or smart algorithms you throw at it…

Cheers,

Allard

Yes, I understand that inconsistent Saga states are a problem, but retrying a non-transient error is completely useless and simply kills the app sooner or later. Using final classes and private methods to handle these situations, with no way to configure their behavior, is less than optimal…

In my case the repository is throwing a “duplicated primary key” error, and there’s little use in retrying it. If I had a minimum of control over it, I could decide what to do (update, change the PK, …), but as it stands my only option is indeed to leave the Saga in an inconsistent state.
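As a rough illustration of the kind of control meant here, the sketch below decides whether a persistence failure is worth retrying by walking its cause chain. It assumes a JDBC-backed repository whose driver reports constraint violations either as the standard java.sql.SQLNonTransientException subclasses (such as SQLIntegrityConstraintViolationException) or with an SQLState in class 23. The FailureClassifier class is hypothetical and not part of Axon.

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientException;

final class FailureClassifier {
    private FailureClassifier() {}

    /**
     * Returns false when the failure, or any of its causes, looks like an error
     * that will fail in exactly the same way on every retry.
     */
    static boolean isWorthRetrying(Throwable failure) {
        int depth = 0;
        // The depth limit guards against cyclic cause chains.
        for (Throwable t = failure; t != null && depth < 50; t = t.getCause(), depth++) {
            if (t instanceof SQLNonTransientException) {
                // Covers SQLIntegrityConstraintViolationException, e.g. a duplicate key.
                return false;
            }
            if (t instanceof SQLException) {
                String state = ((SQLException) t).getSQLState();
                // SQLState class 23 means "integrity constraint violation".
                if (state != null && state.startsWith("23")) {
                    return false;
                }
            }
        }
        return true; // unknown failures are assumed to be possibly transient
    }
}
```

A caller could use the result to stop retrying and then choose an alternative such as updating the existing row or generating a new key, which is the decision the private retry loop currently makes impossible.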

Release 3 seems to be going in the right direction, yes.

Cheers.

Hi António,

Sorry, I misread/misinterpreted “non-transient” in your previous mail. Indeed, it doesn’t make sense to retry on a non-transient error.
We are expecting to release a 2.4.7, so if you want to open a PR (against the 2.4.x branch) to improve this, feel free to do so.

Cheers,

Allard