Concurrent Saga Access

I’m running an app with Axon 3.4.3, Spring Boot 2.3.2, Java 11, and PostgreSQL. Occasionally I see a saga loaded after it has completed and been deleted from the DB. This only occurs when two events are processed at nearly the same time. The behavior is as follows:

Expected

  1. thread1: puts event on the event bus
  2. thread2: puts event on the event bus
  3. thread1: Saga is loaded and calls SagaLifecycle.end()
  4. thread2: drops event because the saga has ended and was deleted from the DB

Actual

  1. thread1: puts event on the event bus
  2. thread2: puts event on the event bus
  3. thread1: Saga is loaded and calls SagaLifecycle.end()
  4. thread2: Saga is loaded and calls SagaLifecycle.end()

This has been problematic, as the two events that can end the saga issue different commands and thus trigger different business logic.
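To make the ordering I expected concrete, here is a minimal plain-Java sketch. It is not Axon code and says nothing about how AnnotatedSagaRepository works internally; SagaStoreSketch, its in-memory map, and the per-saga lock are all hypothetical names used only to illustrate "load, end, delete" happening as one unit per saga.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: a hypothetical in-memory stand-in for the saga store.
final class SagaStoreSketch {
    private final Map<String, String> sagas = new ConcurrentHashMap<>();        // sagaId -> state
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>(); // one lock per saga

    void start(String sagaId) {
        sagas.put(sagaId, "ACTIVE");
    }

    // Returns true if this event ended the saga, false if the event was dropped.
    boolean handleEndingEvent(String sagaId) {
        ReentrantLock lock = locks.computeIfAbsent(sagaId, id -> new ReentrantLock());
        lock.lock();
        try {
            String state = sagas.get(sagaId);   // "load"
            if (state == null) {
                return false;                   // saga already ended and deleted: drop the event
            }
            sagas.remove(sagaId);               // "end" plus "delete"
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Fires two ending events at nearly the same time and counts how many ended the saga.
    static int raceTwoEvents() {
        SagaStoreSketch store = new SagaStoreSketch();
        store.start("saga-1");
        AtomicInteger ended = new AtomicInteger();
        CountDownLatch go = new CountDownLatch(1);
        Runnable handler = () -> {
            try {
                go.await();                     // release both threads together
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (store.handleEndingEvent("saga-1")) {
                ended.incrementAndGet();
            }
        };
        Thread t1 = new Thread(handler, "thread1");
        Thread t2 = new Thread(handler, "thread2");
        t1.start();
        t2.start();
        go.countDown();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for handlers", e);
        }
        return ended.get();
    }
}
```

With that kind of per-saga serialization, only one of the two events can end the saga and the other is dropped, which is the "Expected" sequence above. What I observe instead looks as if both threads pass the "load" step.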

Debugging the AnnotatedSagaRepository, I can see the deleteSaga method called and then a subsequent doLoadSaga that still returns a result from the DB. If I query the DB directly at this point, the saga isn’t there.

A couple of other things worth noting:

  • this app runs on a single node (for now)
  • the properties contain “spring.jpa.open-in-view=false”
  • handling of the events is wrapped using Spring’s @Transactional annotation
  • changing the @Transactional annotation isolation level to serializable results in “PSQLException: ERROR: could not serialize access due to concurrent update” on the second attempt to delete the saga entry
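On the last bullet: PostgreSQL reports "could not serialize access due to concurrent update" with SQLSTATE 40001 (serialization_failure). A minimal sketch for spotting that case in a wrapped exception, for example to log which handler lost the race, could look like this. SerializationFailures is a hypothetical helper, not part of Axon, Spring, or the PostgreSQL driver; it only relies on java.sql.SQLException.getSQLState().

```java
import java.sql.SQLException;

// Hypothetical helper for illustration; the class and method names are assumptions.
final class SerializationFailures {
    // SQLSTATE that PostgreSQL uses for serialization_failure
    static final String SERIALIZATION_FAILURE = "40001";

    private SerializationFailures() {
    }

    static boolean isSerializationFailure(Throwable error) {
        // The driver exception is usually wrapped by other layers, so walk the cause chain.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && SERIALIZATION_FAILURE.equals(((SQLException) t).getSQLState())) {
                return true;
            }
        }
        return false;
    }
}
```

This only identifies the failure; it does not explain why the second load still sees the saga.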

I would expect the transaction of the first load to commit before the second load begins, but it appears there is indeed concurrent access to the same saga. Is this expected behavior? Perhaps I have something configured incorrectly? Any help or guidance would be appreciated.

Hi Ben,

Although I understand you are in a predicament here, I would like to point out that Axon 3.4.3 is pretty old already.
As it currently stands, Axon Framework 4.4.2 is out, which will in the near future (a week or so I think) be followed up by Axon Framework 4.4.3.

Having said that, I am also pretty confident that the issue you are describing is something we have noticed in the past.
However, I can’t recall exactly when this occurred, so I am unable to point you to the release that should resolve the problem at hand.

It does lead me to the following request though: could you please upgrade to the latest framework release and check whether the problem still persists?
If it does, I think it would be valuable to open a new thread about it.
In that case, could you please post it on https://discuss.axoniq.io/ (the new Axon forum), as this mailing list will be discontinued?

Trusting this will help you further, Ben.

Cheers,
Steven

Hi Steven,

Thanks for following up. Whenever I get the time I’ll try updating to the latest version and see if that fixes things. I’ll be sure to update here and put something on the new discussion board.

Best,
Ben Walford