Clarifying event segments

fjx · June 17, 2022, 6:10pm

I’d like to make sure I’m understanding event segments correctly.

Based on the documentation here, my understanding is that event segments are a mechanism whereby each TEP (TrackingEventProcessor) instance distributes events to the threads controlled by that instance. This ensures that two different threads on the same TEP instance don’t both try to process the same event.

Two questions:

1. Have I understood what event segments do correctly?
2. If I have, great. However, while the documentation says what segments do, it does not actually explain what an event segment is. What exactly is an event segment?

Steven_van_Beelen · June 20, 2022, 1:56pm

Hi @fjx, you’ve hit a piece of the Reference Guide that could have used some love when we updated the Event Processors page. Sadly, we (read: I) missed updating this page, so it does not provide the reference that would help you better understand what a segment is.

Now, let me turn to the questions at hand:

You’re right. The segments ensure that two or more threads do not cover the same section (segment) of events from the event stream.

A Segment refers to a portion of the Event Stream. Which events belong to that portion is derived from the SequencingPolicy. Furthermore, such a Segment is stored in a TrackingToken, for which you can find the documentation here.
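To make the SequencingPolicy/Segment relationship a bit more concrete, here is a minimal, standalone Java sketch. It assumes a segment is described by an id and a bit mask, and that an event belongs to a segment when the hash of its sequence key, ANDed with the mask, equals the segment id. It also assumes a policy that uses the aggregate identifier as the sequence key. The `Segment` and `Event` records and the `sequenceKey` and `owningSegment` methods are hypothetical illustrations, not Axon's API.

```java
import java.util.List;
import java.util.Objects;

public class SegmentMatchingSketch {

    // Hypothetical segment: an id plus a bit mask (not Axon's Segment class).
    record Segment(int segmentId, int mask) {
        // Assumed rule: a sequence key belongs here when (hash & mask) == segmentId.
        boolean matches(Object sequenceKey) {
            return (Objects.hashCode(sequenceKey) & mask) == segmentId;
        }
    }

    // Hypothetical event carrying the aggregate it belongs to.
    record Event(String aggregateId, String payload) {}

    // Hypothetical sequencing policy: events of one aggregate share a sequence key.
    static Object sequenceKey(Event event) {
        return event.aggregateId();
    }

    // Returns the id of the segment whose mask selects this event's sequence key.
    static int owningSegment(Event event, List<Segment> segments) {
        Object key = sequenceKey(event);
        return segments.stream()
                .filter(segment -> segment.matches(key))
                .mapToInt(Segment::segmentId)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no segment matches " + key));
    }

    public static void main(String[] args) {
        // Two segments that split the stream on the lowest bit of the key's hash.
        List<Segment> segments = List.of(new Segment(0, 1), new Segment(1, 1));
        List<Event> events = List.of(
                new Event("order-1", "created"),
                new Event("order-2", "created"),
                new Event("order-1", "shipped"));
        for (Event event : events) {
            System.out.println(event.aggregateId() + " " + event.payload()
                    + " -> segment " + owningSegment(event, segments));
        }
    }
}
```

Because both `order-1` events share a sequence key, they always land in the same segment, which is what lets a single thread handle them in order while other threads handle other segments.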

Segments thus provide a means to break up the Event Stream into several sections, allowing for parallel event processing. The Reference Guide has this to say on the subject.
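How a segment can be broken up further can be sketched as follows. This assumes that a hash belongs to a segment when (hash & mask) == segmentId, and that splitting a segment widens its mask by one bit, with one half keeping the original id and the other half getting id + (old mask + 1). The `SegmentSplitSketch` class, its `Segment` record and the `split` method are hypothetical and only meant to show the idea.

```java
import java.util.List;

public class SegmentSplitSketch {

    // Hypothetical segment: an id plus a bit mask (not Axon's API).
    record Segment(int segmentId, int mask) {

        // Assumed rule: a hash belongs to this segment when (hash & mask) == segmentId.
        boolean matches(int hash) {
            return (hash & mask) == segmentId;
        }

        // Assumed split: look at one more bit of the hash. One half keeps the id,
        // the other half gets id + (old mask + 1); both get the wider mask.
        List<Segment> split() {
            int newMask = (mask << 1) | 1;
            return List.of(new Segment(segmentId, newMask),
                           new Segment(segmentId + mask + 1, newMask));
        }
    }

    public static void main(String[] args) {
        Segment root = new Segment(0, 0);               // mask 0 matches every hash
        List<Segment> halves = root.split();            // segments 0 and 1, mask 1
        List<Segment> quarters = halves.get(0).split(); // segments 0 and 2, mask 3 (binary 11)
        System.out.println(halves);
        System.out.println(quarters);
    }
}
```

Under this assumed scheme, splitting segment 0 (mask 1) gives segments 0 and 2 (mask 11), while splitting segment 1 gives segments 1 and 3. Either way, every hash still matches exactly one segment after the split, because the parent's hashes are simply divided by one additional bit.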

I hope this clarifies things for you, @fjx.
If not, feel free to keep commenting under this thread!

dmurat · January 22, 2023, 7:39pm

What segments are and how they relate to event processors was very confusing for me too. Luckily, a great article from Milan Savić explains this in detail.

There is only one confusing bit in the article: the masking values in the second picture (assigning events to segments). Those masking values correspond to a system that initially had segments 0 and 1, after which segment 1 was split into segments 1 and 2. If we had instead created the system with three segments from the start (empty token store), the masking values would be 1 for segment 0, 1 for segment 1, and 11 for segment 2, I think (please correct me if I’m wrong).
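One way to sanity-check a candidate set of masking values is to test whether every hash value is claimed by exactly one segment. The sketch below assumes that a hash belongs to a segment when (hash & mask) == segmentId; `MaskAssignmentCheck` and `isPartition` are hypothetical names, and the check is only as good as that assumed rule.

```java
import java.util.Map;

public class MaskAssignmentCheck {

    // Assumed rule: a hash belongs to a segment when (hash & mask) == segmentId.
    static boolean matches(int segmentId, int mask, int hash) {
        return (hash & mask) == segmentId;
    }

    // True when every hash is claimed by exactly one segment. Only the bits covered
    // by the widest mask affect matching, so checking 0 .. 2^bits - 1 covers all hashes.
    static boolean isPartition(Map<Integer, Integer> maskBySegmentId) {
        if (maskBySegmentId.values().stream().anyMatch(m -> m < 0 || m > 0xFFFF)) {
            throw new IllegalArgumentException("sketch only handles masks between 0 and 0xFFFF");
        }
        int widestMask = maskBySegmentId.values().stream()
                .mapToInt(Integer::intValue).max().orElse(0);
        int bits = 32 - Integer.numberOfLeadingZeros(widestMask);
        for (int hash = 0; hash < (1 << bits); hash++) {
            int owners = 0;
            for (Map.Entry<Integer, Integer> entry : maskBySegmentId.entrySet()) {
                if (matches(entry.getKey(), entry.getValue(), hash)) {
                    owners++;
                }
            }
            if (owners != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Segment id -> mask, masks written in binary.
        System.out.println(isPartition(Map.of(0, 0b11, 1, 0b01, 2, 0b11)));
        System.out.println(isPartition(Map.of(0, 0b01, 1, 0b01, 2, 0b11)));
    }
}
```

Under this assumed rule the first assignment (11, 01, 11) passes, whereas with masks 1, 1 and 11 a hash such as 2 (binary 10) matches both segment 0 and segment 2, so the check reports false for it.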

Anyway, I suggest including Steven’s short explanation and the link to the article in the Reference Guide, probably somewhere on the Event Processors page. I believe this could help many people.

Steven_van_Beelen · January 23, 2023, 12:25pm

Thanks for your two cents here, @dmurat.
There’s work underway to adjust the documentation in its entirety, which will have space for the recommendation you’ve just made.

vab2048 · May 3, 2023, 9:32pm

The linked article covers the what well, but there is another consideration, and a question I would like to ask in this thread: when exactly should you use segments?

What is a use case in which you guys would recommend splitting the stream into segments? Is there a GitHub repo which shows it in action?

Gerard · May 4, 2023, 6:09am

The main use case is to process events faster; you can also use batching for this. By using segments, multiple threads, or even multiple instances, can process events from the same stream. This provides the scalability you might need to keep up with the event stream.
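As a rough, plain-Java illustration of that scalability argument, the sketch below runs one worker thread per segment over the same in-memory stream. It assumes a power-of-two segment count, the matching rule (hash & mask) == segmentId, and the aggregate id as sequence key. `ParallelSegmentsSketch`, `Event` and `process` are hypothetical, and the sketch leaves out progress tracking, which a real processor keeps per segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSegmentsSketch {

    // Hypothetical event: the aggregate id is used as the sequence key.
    record Event(String aggregateId, int sequenceNumber) {}

    // Assumed rule: a key belongs to a segment when (hash & mask) == segmentId.
    static boolean matches(int segmentId, int mask, Object key) {
        return (key.hashCode() & mask) == segmentId;
    }

    // Runs one worker thread per segment. Each worker scans the same stream but only
    // handles events in its own segment, so no event is handled by two threads.
    static Map<Integer, List<Event>> process(List<Event> stream, int segmentCount) {
        if (segmentCount <= 0 || Integer.bitCount(segmentCount) != 1) {
            throw new IllegalArgumentException("sketch assumes a power-of-two segment count");
        }
        int mask = segmentCount - 1;
        Map<Integer, List<Event>> handledBySegment = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(segmentCount);
        for (int id = 0; id < segmentCount; id++) {
            final int segmentId = id;
            pool.submit(() -> {
                List<Event> handled = new ArrayList<>();
                for (Event event : stream) {
                    if (matches(segmentId, mask, event.aggregateId())) {
                        handled.add(event); // stand-in for invoking the event handler
                    }
                }
                handledBySegment.put(segmentId, handled);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return handledBySegment;
    }

    public static void main(String[] args) {
        List<Event> stream = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            stream.add(new Event("order-" + (i % 5), i));
        }
        process(stream, 4).forEach((segment, events) ->
                System.out.println("segment " + segment + " handled " + events.size() + " events"));
    }
}
```

Each worker still reads the whole stream and skips events outside its segment, so the gain comes from running the handlers in parallel. Events of one aggregate stay in order because they all land in one segment, which is handled by a single thread.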