Potential conflict between Kafka partitioning and Axon segmentation

Hi,

After trying to analyze how Kafka topic partitioning works together with Axon event processing segments, I have the impression that combining a Kafka topic that has several partitions with more than one Axon segment can lead to events being skipped and never processed.
Consider a topic with two partitions and an Axon processor configured with two segments, and assume two JVMs processing events concurrently. In that scenario, the Kafka consumer in each JVM is assigned one of the two partitions. The event processor in each JVM then tries to lock a segment. It can now happen that an event processor’s segment doesn’t match some or all of the events in its Kafka consumer’s partition (i.e. the matching events sit in the other partition), in which case we end up with skipped event messages.
Is my understanding correct, or am I missing something?

Thanks,
Alex

According to the implementation

A single thread will always be used to read events from the stream. That thread calculates the segment ID for each incoming event, based on its sequencing policy and a hash function. Two events with the same sequencing policy value will always be processed by the same thread.

The other threads process their portion of the message queue, each updating the token belonging to its segment.
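As a toy illustration of the mask-based matching described above (this is not Axon’s actual code; the real logic lives in Segment.matches(), and the class and method names here are made up):

```java
// Hypothetical sketch of mask-based segment matching, loosely modeled on
// Axon's Segment.matches(). Assumes a power-of-two segment count with
// mask = segmentCount - 1; all names here are illustrative only.
public class SegmentMatchSketch {

    // Force the hash non-negative, then compare the masked bits to the segment id.
    static boolean matches(int segmentId, int mask, int sequenceIdentifierHash) {
        return ((sequenceIdentifierHash & Integer.MAX_VALUE) & mask) == segmentId;
    }

    public static void main(String[] args) {
        int mask = 1; // two segments: ids 0 and 1
        String aggregateId = "order-42"; // sequencing policy value (e.g. aggregate id)
        int hash = aggregateId.hashCode();

        // Exactly one of the two segments matches, so two events with the same
        // sequencing policy value always land on the same segment (and thread).
        System.out.println("segment 0 matches: " + matches(0, mask, hash));
        System.out.println("segment 1 matches: " + matches(1, mask, hash));
    }
}
```

For any hash, exactly one segment matches, which is what guarantees the per-sequencing-value ordering within a single processor.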

So why do you think the events would be skipped? There are no two TrackingEventProcessors in one JVM processing the same topic.

The problem arises when we have more than one segment, for example when registering a tracking event processor with the TrackingEventProcessorConfiguration.forParallelProcessing(2) configuration. In this case two tracking event processors started in two different JVMs will claim different segments from the token store. At the same time, if we have two Kafka partitions, each Kafka consumer belonging to the same consumer group will be assigned one of those partitions. So now we have the tracking event processor in one JVM reading events from one Kafka partition, while the event processor in the second JVM reads from the other partition. So what happens when events in those partitions don’t match (Segment.matches()) the segments of those two processors?
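To make the scenario concrete, here is a toy model, not Axon or Kafka code. Kafka really partitions with murmur2 over the serialized key, and Axon matches segments against the sequencing-policy value’s hashCode; the point is only that the two hash functions are unrelated, so a djb2 stand-in for Kafka’s partitioner is good enough for illustration. All names below are made up:

```java
// Toy model of the partition/segment mismatch: an event can land in
// partition 0 (consumed by the JVM holding segment 0) while its hash
// matches segment 1 (claimed by the JVM consuming partition 1), so no
// running processor instance ever handles it.
public class PartitionVsSegmentDemo {

    // Stand-in for Kafka's partitioner (real Kafka uses murmur2 on key bytes).
    static int kafkaPartitionFor(String key, int partitions) {
        long h = 5381;
        for (char c : key.toCharArray()) h = h * 33 + c; // djb2-style hash
        return (int) ((h & Long.MAX_VALUE) % partitions);
    }

    // Stand-in for Axon's segment matching: masked hashCode of the sequence id.
    static int axonSegmentFor(String sequenceId, int segmentCount) {
        int mask = segmentCount - 1; // segmentCount assumed a power of two
        return (sequenceId.hashCode() & Integer.MAX_VALUE) & mask;
    }

    public static void main(String[] args) {
        // JVM 0 consumes partition 0 and has claimed segment 0;
        // JVM 1 consumes partition 1 and has claimed segment 1.
        for (String key : new String[]{"a", "b", "c", "d"}) {
            int partition = kafkaPartitionFor(key, 2);
            int segment = axonSegmentFor(key, 2);
            boolean skipped = partition != segment; // no JVM holds this combination
            System.out.println(key + ": partition=" + partition
                    + " segment=" + segment + (skipped ? "  -> SKIPPED" : ""));
        }
    }
}
```

Whenever the two stand-in hashes disagree on a key, that key’s events match a segment claimed by the JVM that is consuming the other partition.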

Thanks,
Alex

You are saying that the JVMs will claim different segments from the token store. Why should this be the case? The read happens using the processor name in the WorkLauncher loop.

I’m talking about the Axon versions starting from 3.1, where an event processor can use multiple threads to process events from different segments. Here’s the relevant section of the reference guide: https://docs.axonframework.org/part-iii-infrastructure-components/event-processing#parallel-processing

I know that you are talking about the current version:

https://github.com/AxonFramework/AxonFramework/blob/51a14807fb69fba5c03e669fd95f4b3fb8308181/core/src/main/java/org/axonframework/eventhandling/TrackingEventProcessor.java#L727

OK, so regarding “Why should this be the case?”: take a look at lines 760 and 767. Consider the scenario where the event processor in one JVM calls tokenStore.fetchToken() for the first segment and successfully claims it. The event processor in the second JVM will try to fetch it, get an UnableToClaimTokenException, then try the second segment and succeed. Thus we have two JVMs handling two segments concurrently. Perhaps I forgot to mention that both event processors have the same name.
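A minimal sketch of that claiming behavior, using a map in place of a real TokenStore (fetchToken() throwing UnableToClaimTokenException is modeled here by putIfAbsent() finding a previous owner; class and method names are made up):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the segment-claim race described above. Each "JVM" walks the
// segments in order and keeps the first one it can claim, so two processors
// with the same name end up owning different segments.
public class ClaimRaceSketch {

    static final ConcurrentHashMap<Integer, String> claims = new ConcurrentHashMap<>();

    static int claimFirstFreeSegment(String jvmName, int segmentCount) {
        for (int segment = 0; segment < segmentCount; segment++) {
            if (claims.putIfAbsent(segment, jvmName) == null) {
                return segment; // claim succeeded
            }
            // otherwise: another JVM owns it (the UnableToClaimTokenException case)
        }
        return -1; // no free segment left
    }

    public static void main(String[] args) {
        int first = claimFirstFreeSegment("jvm-1", 2);
        int second = claimFirstFreeSegment("jvm-2", 2);
        System.out.println("jvm-1 claimed segment " + first);  // 0
        System.out.println("jvm-2 claimed segment " + second); // 1
    }
}
```

Nothing in this claim loop knows which Kafka partition each JVM’s consumer was assigned, which is exactly the gap being discussed.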

Now I see your point. The whole concept is based on the processor name and segment; the partition is ignored entirely. Because of the exception, a segment’s events will not be processed even though the two JVMs are consuming totally different partitions. Did I get the point correctly?

The problem stems from the fact that Axon’s segmentation mechanism is completely unaware of Kafka’s dynamic partition assignment inside a consumer group. The two work-distribution mechanisms (Axon segments and Kafka partitions) operate independently, with no knowledge of each other, and this can create a problem. So unless Axon takes full control of partition assignment in Kafka and makes the segmentation mechanism aware of Kafka partitions, I think it is currently much safer, although bad performance-wise, to stick to a single partition and a single segment until these things are sorted out in a future Axon version.

Best Regards,
Alex

Hi Alex

Did you report this issue?

Many thanks
Marinko

Not yet. I wanted to hear some opinions first before reporting it. Also, I haven’t actually reproduced the issue; this whole thing is based solely on a thought experiment.

Thanks,
Alex

OK, I’ve just created a new issue for this: https://github.com/AxonFramework/AxonFramework/issues/762

Great. Thanks 🙂