Replay of events affects all services that also listen to this event

Michael_Dempfle1 · January 30, 2020, 1:07pm

Hi,

We have the problem that when we replay events for a new service (I have already configured it to replay only the last 2 hours), all other services and Sagas receive these events as well.
For example, I see in the logs that a cart created event is also delivered to the Saga and processed.

I would have expected the tracking processor of the Saga to realize that it has already processed this event. What else is the token itself for?

We run Axon Server 4.2.4 and Axon Framework 4.1.2.

How can we fix this? For some events it does not matter, but others trigger actions again that are not wanted!

Best, Michael

Michael_Dempfle1 · January 30, 2020, 1:28pm

The strange thing is that it is now not reproducible locally. I delete the entries in the DB for this one processor, they get recreated, and the last X minutes get replayed. But we clearly have this in the logs as well. What can cause this issue?

Best, Michael

Michael_Dempfle1 · January 30, 2020, 1:35pm

On prod we run 4.1.7 - locally 4.2.4 - can this be the difference?

Best, Michael

allardbz · January 30, 2020, 1:44pm

Hi Michael,

There is one token per processor per thread that the processor runs. Resetting a processor will not affect any other processor’s progress. Also, SagaManagers will ignore replayed events by default, so they won’t generate side effects again.
Are you sure you separated all the handlers into different processors on production?
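
To illustrate the point about one token per processor, here is a minimal toy model in plain Java. It is only a sketch and not Axon's implementation: `ToyTokenStore` and `ToyProcessor` are hypothetical names, and a token is simplified to an index into an in-memory list. It shows that removing the token of one processor makes only that processor re-read events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Axon's TokenStore or TrackingEventProcessor.
final class ToyTokenStore {
    // Processor name -> index of the next event that processor will read.
    private final Map<String, Integer> tokens = new HashMap<>();

    int position(String processor) {
        return tokens.getOrDefault(processor, 0);
    }

    void advance(String processor, int newPosition) {
        tokens.put(processor, newPosition);
    }

    // Removes exactly one entry; every other processor keeps its token.
    void reset(String processor) {
        tokens.remove(processor);
    }
}

final class ToyProcessor {
    private final String name;
    private final ToyTokenStore store;
    final List<String> handled = new ArrayList<>();

    ToyProcessor(String name, ToyTokenStore store) {
        this.name = name;
        this.store = store;
    }

    // Handles every event at or after this processor's own token, then advances the token.
    void catchUp(List<String> eventLog) {
        for (int i = store.position(name); i < eventLog.size(); i++) {
            handled.add(eventLog.get(i));
        }
        store.advance(name, eventLog.size());
    }
}
```

If two processors share one `ToyTokenStore` and only one of them is reset, the next `catchUp` call on the other processor handles nothing new, because its own token is untouched.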

Michael_Dempfle1 · January 30, 2020, 2:02pm

The Saga is in a different service. The cart service, which was not even loading the data, was also affected. They all had extremely high load.

I will check the logs to see whether something really happened or whether the load came only from discarding events.

Michael_Dempfle1 · January 30, 2020, 4:00pm

I investigated further. There were NO duplicate executions, BUT we had a lot of load (up to 100%) where we did not even receive any logs! So it seems to me the tracking processor was getting events and discarding them. Is this correct?

Best, Michael

allardbz · January 30, 2020, 8:26pm

In version 4.1, it is possible that processors read events and discard them if they are found to be irrelevant, or considered a replay. SagaManagers will not handle events that have been marked as “redelivery”, to prevent duplication of side-effects.

However, if they run in different services, the reset should not affect them. Are you sure the reset process didn’t reset all processors, instead of only one of them?

Kind regards,

Michael_Dempfle1 · January 30, 2020, 9:07pm

Hi Allard,

We did not reset anything. We added an event handler in a new package in one of our services, which created a new tracking processor. Only the high load in 4.1.x was a real problem, especially in the service running the Saga; during the replay basically nothing was working there. We have already tested 4.2.4 up to our QA stage.
One question about the framework: do we also need to update it to 4.2.x to improve the behavior there?

Replaying only the last X minutes already works perfectly locally. I will try this on our other stages tomorrow, so the load on the other services will hopefully already be small.

Best, Michael

allardbz · February 4, 2020, 1:36pm

I’m very curious about what triggered the Saga’s EventProcessor to start a replay. Having another processor start from the beginning is in no way a trigger for a processor to reset itself. How are event handlers (incl. Sagas) organized into Event Processors, and how did you trigger the reset?

While we would always recommend upgrading to the latest version, I doubt it will address the real underlying problem.

Kind regards,

Michael_Dempfle1 · February 4, 2020, 2:14pm

Hi Allard,

The new service needed, for example, event A. Event A was not processed in the Saga; otherwise we would have seen it in the logs.
But we had high load. I thought the blacklisting, which is new in 4.2.x, fixes the problem of all events being sent everywhere.

On our QA environment we unfortunately had only ~1% of the events. When upgrading to 4.2.4 I ran a lot of load tests to get to 10%, and there the load on the Saga looked much better. So I think the blacklisting is the reason for the improvement. Do you agree?

Best, Michael

allardbz · February 4, 2020, 2:32pm

Ah, I now get what you mean. Yes, upgrading to AxonServer 4.2 and AxonFramework 4.2 will definitely help. The blacklisting will help ensure that events of a type for which there is no handler get ignored. That severely reduces I/O between these nodes as well as processing time on the receiving end.

Cheers,
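
The effect of the blacklisting described above can be sketched with a toy model in plain Java. This is only an illustration under assumptions, not Axon Server's actual protocol: `ToyEventStream` and both method names are hypothetical, and the set of blacklisted payload types is simply passed in rather than negotiated between client and server. It contrasts transferring every event and discarding on the receiving end with skipping unhandled types before transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: it mimics the effect described in the thread, not Axon Server's protocol.
final class ToyEventStream {
    private final List<String> eventTypes; // payload type of each stored event, in order
    int transferred = 0;                   // how many events were sent to the receiving end

    ToyEventStream(List<String> eventTypes) {
        this.eventTypes = eventTypes;
    }

    // Without blacklisting: every event is transferred, and the receiver discards irrelevant ones.
    List<String> readAllAndDiscard(Set<String> handledTypes) {
        List<String> processed = new ArrayList<>();
        for (String type : eventTypes) {
            transferred++;
            if (handledTypes.contains(type)) {
                processed.add(type);
            }
        }
        return processed;
    }

    // With blacklisting: types known to have no handler are skipped before transfer.
    List<String> readWithBlacklist(Set<String> blacklistedTypes) {
        List<String> processed = new ArrayList<>();
        for (String type : eventTypes) {
            if (blacklistedTypes.contains(type)) {
                continue;
            }
            transferred++;
            processed.add(type);
        }
        return processed;
    }
}
```

Both methods return the same processed events for a consistent pair of sets, but `transferred` stays much lower in the second case, which matches the reduced I/O and processing time mentioned above.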

Michael_Dempfle1 · February 5, 2020, 12:52am

I will give it a try and let you know.

Best, Michael

Michael_Dempfle1 · February 21, 2020, 8:19pm

Hi Allard,

The replay stuff works nicely for normal processors.

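import java.time.Duration;
import org.axonframework.config.Configurer;
import org.axonframework.eventhandling.TrackingEventProcessorConfiguration;

// Registers a tracking processor whose initial token lies a fixed time window in the past.
public void configureEventHandlerProcessor(Configurer configurer) {
    Duration duration = Duration.ofMinutes(replayTimeInMinutesEventHandler);
    configurer.eventProcessing().registerTrackingEventProcessor("EventHandlerProcessor",
        org.axonframework.config.Configuration::eventStore,
        configuration -> TrackingEventProcessorConfiguration
            .forParallelProcessing(initialThreadCountEventHandler)
            .andInitialSegmentsCount(initialSegmentCountEventHandler)
            .andBatchSize(batchSizeEventHandler)
            // Used only when the processor has no stored token yet
            .andInitialTrackingToken(s -> s.createTokenSince(duration))
    );
}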

But for Sagas this fails with the following error:
A component is requesting an Event Store, however, there is none configured.

Is there a different way to limit the replay of Sagas?

Best, Michael
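
As a rough illustration of what `createTokenSince(duration)` in the configuration above is meant to achieve, here is a toy model in plain Java. It is a sketch under assumptions, not Axon's implementation: `ToyTokenSince` and `tokenSince` are hypothetical names, and a token is simplified to an index into a time-ordered list of event timestamps.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Toy model only: a "token" here is just an index into an in-memory, time-ordered list.
final class ToyTokenSince {

    // Returns the index of the first event whose timestamp is not older than (now - window).
    // If no event is recent enough, the returned index points past the end of the list.
    static int tokenSince(List<Instant> eventTimestamps, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        for (int i = 0; i < eventTimestamps.size(); i++) {
            if (!eventTimestamps.get(i).isBefore(cutoff)) {
                return i;
            }
        }
        return eventTimestamps.size();
    }
}
```

A processor that starts from such a position skips everything older than the window, which is why a new processor with a short window replays only the most recent events.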

allardbz · March 1, 2020, 3:53pm

That’s a weird message. It means that your configuration doesn’t have an Event Store. Did you perhaps create an additional Configurer instance, instead of using Axon’s supplied instance?

Michael_Dempfle1 · March 2, 2020, 10:57am

Hi Allard,

We only use the Configurer in one other place, to configure the message monitor.

The way I posted works for all our other services except the Saga.
We have the Saga running in an independent service. Maybe the issue is that something is not set up there the way it is for the others.

I removed the axon.eventhandling.processors."[CheckoutSagaProcessor]" settings from our application.yaml, because having them there in parallel also causes issues.

So you think disabling the replay or limiting it should be done the same way as for the others?

Michael_Dempfle1 · March 6, 2020, 8:01am

Hi Allard,

Any idea here? The same recommendation is given at
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/axonframework/7fff4_6EUQQ/SMv_qYQTAgAJ

Best, Michael