Saga Tracking Event Processor doesn't retry event processing when an exception occurs in a saga after upgrading to Axon 3.3.2

Aleksey_Podogov1 · July 24, 2018, 11:11am

Hi everybody!

As the subject says. Non-saga tracking event processors still work as before: they release the claim and retry with a backoff scheme.
Am I missing some Saga configuration? Currently the sagas are configured like this:

`

@Bean
public SagaConfiguration mySagaConfiguration() {
    return SagaConfiguration.trackingSagaManager(MySaga.class);
}

`

Many thanks for your help,
Aleksey.

Aleksey_Podogov1 · July 26, 2018, 10:35am

Hi,

I’ve created a sample application to reproduce this issue: https://github.com/aupodogov/axon-tracking-saga-test

Cheers,
Aleksey.

allardbz · July 26, 2018, 12:22pm

Hi Aleksey,

thanks for the application; that makes investigating the issue a lot simpler. However, the tests pass fine when I run them. I have switched between different versions, but they keep passing. In the logs, I see the expected 2 retries and the passing 3rd attempt.
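For readers following along, the expectation described here (two failed attempts, then a passing third) can be sketched without the framework. This is only an illustration under stated assumptions: `FlakyHandler` and `deliverWithRetry` are hypothetical names, not Axon API, and the loop stands in for whatever the processor does between attempts.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a saga event handler; not part of the Axon API.
class FlakyHandler {
    private final AtomicInteger invocations = new AtomicInteger();
    private final int failuresBeforeSuccess;

    FlakyHandler(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    // Throws until the configured number of failures has been used up.
    void handle(String event) {
        int attempt = invocations.incrementAndGet();
        if (attempt <= failuresBeforeSuccess) {
            throw new IllegalStateException("Simulated failure on attempt " + attempt);
        }
    }

    int invocations() {
        return invocations.get();
    }
}

class RetryExpectationSketch {
    // Re-delivers the same event until the handler succeeds or maxAttempts is reached.
    // Returns true if the event was eventually handled.
    static boolean deliverWithRetry(FlakyHandler handler, String event, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.handle(event);
                return true;
            } catch (IllegalStateException e) {
                // A real processor would release its claim and wait here before retrying.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FlakyHandler handler = new FlakyHandler(2);
        boolean handled = deliverWithRetry(handler, "sample-event", 5);
        // Two failures followed by one success: handled=true, invocations=3
        System.out.println("handled=" + handled + ", invocations=" + handler.invocations());
    }
}
```

In these terms, a passing run ends with three invocations, while the failure reported in this thread corresponds to the handler being invoked only once.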

Can you share more details on how it fails?
Do you have a log, perhaps? And, to be sure, which Java version and OS do you use?

Cheers,

Allard

Aleksey_Podogov1 · July 26, 2018, 12:58pm

Hi, Allard!

Thanks for the reply! I ran this test on a Windows machine with an Oracle JVM:

`
Java version: 1.8.0_161, vendor: Oracle Corporation
Java home: D:\java\jdk1.8.0_161\jre
Default locale: ru_RU, platform encoding: Cp1251
OS name: "windows 7", version: "6.1", arch: "amd64", family: "windows"

`

Logs are here: https://drive.google.com/folderview?id=1nLNJi0oGiByjDaNHtZBnYkCcvivCOCWL

Aleksey

allardbz · July 27, 2018, 6:49am

Hi Aleksey,

looking at the logs, it seems that the processor is actually doing a replay. I can see the statements:

15:41:03.778 INFO a.e.TrackingEventProcessor#ensureEventStreamOpened: Fetched token: null for segment: Segment[0/0]
15:41:03.841 WARN o.a.e.TrackingEventProcessor#processingLoop : Releasing claim on token and preparing for retry in 1s

15:41:04.853 INFO a.e.TrackingEventProcessor#ensureEventStreamOpened: Fetched token: null for segment: Segment[0/0]

…
15:41:04.866 WARN o.a.e.TrackingEventProcessor#processingLoop : Releasing claim on token and preparing for retry in 2s
15:41:06.876 INFO a.e.TrackingEventProcessor#ensureEventStreamOpened: Fetched token: null for segment: Segment[0/0]

The last log statement is the occurrence of a third failure, after which the test stops.
Another interesting aspect is that the test fails because it only found one invocation. In the logs above, you can see it was already invoked, twice at least.
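One way to read the log lines above is as a loop in which the token only advances after an event is processed successfully, so every retry fetches the same `null` token and starts from the beginning of the stream, while the wait doubles after each failure. The sketch below models that reading only; it is not the framework's code. The class and method names are hypothetical, the 60-second cap is an assumption (the logs only show the 1s and 2s delays), and no real sleeping takes place.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a tracking processor's retry loop; hypothetical names, not Axon code.
class BackoffReplaySketch {
    // Assumed upper bound on the wait time; the thread only shows 1s and 2s.
    static final long MAX_WAIT_SECONDS = 60;

    // Doubles the wait after each failure, up to the assumed cap.
    static long nextWait(long currentWaitSeconds) {
        return Math.min(currentWaitSeconds * 2, MAX_WAIT_SECONDS);
    }

    // Simulates up to maxAttempts passes, of which the first failingAttempts fail.
    // Returns the log lines the model produces instead of actually waiting.
    static List<String> simulate(int failingAttempts, int maxAttempts) {
        List<String> log = new ArrayList<>();
        Long storedToken = null; // null: nothing processed yet, so reading starts from the beginning
        long waitSeconds = 1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            log.add("Fetched token: " + storedToken);
            if (attempt <= failingAttempts) {
                // Failure: the token is not advanced, so the next pass sees the same token.
                log.add("Releasing claim, retry in " + waitSeconds + "s");
                waitSeconds = nextWait(waitSeconds);
            } else {
                // Success: only now is the position recorded.
                storedToken = 0L;
                log.add("Processed event, stored token: " + storedToken);
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        simulate(2, 5).forEach(System.out::println);
    }
}
```

With two failing passes, the model produces the same pattern as the excerpt: a `null` token fetched three times, with 1s and 2s waits in between.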

What’s the behavior you observe in your real application that led you to believe the processor stopped processing?

Cheers,
Allard

Aleksey_Podogov1 · July 27, 2018, 7:13am

Hi Allard,

There are two tests: one for the saga, which fails, and one for the event listener, which passes. I marked the second test with @Ignore and added the new logs. Could you please review them?

Kind regards,
Aleksey

Steven_van_Beelen · July 27, 2018, 1:35pm

Hi Aleksey,

Allard and I eventually got the test to fail, and we also know why it fails.
I’ve created this PR to address the issue, which will be part of release 3.3.3.

I also ran your shared test with a local snapshot release of the framework containing that fix, which turned the tests green.

So, stay tuned for 3.3.3!

Cheers,
Steven

Aleksey_Podogov1 · July 27, 2018, 4:03pm

Hi Steven,

Thanks to you and Allard for fixing this issue!
I'll wait for the release.

Kind regards,
Aleksey