Axon Global Index with Oracle Sequence: Understanding Potential Issues for Tracking Event Processor

crajoli · September 24, 2024, 12:11am

Context

Our application operates in a multi-node environment where one node publishes events while another node processes them asynchronously. Recently, we’ve observed that the asynchronous tracking event processor is missing some events in production (there is no corresponding data in the respective table), but we cannot reproduce this issue locally. Our setup uses Axon with an Oracle database.

Analysis

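After some analysis, I suspect the issue may lie with the Oracle sequence used for the global index. By default, Oracle sequences are created with CACHE (20 values) and NOORDER. A cached, NOORDER sequence does not guarantee that numbers are handed out in chronological order, in particular when the database runs on multiple instances (Oracle RAC), where each instance caches its own range of values.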

Example: in a multi-node setup, one node may generate records with sequence values 21, 22, 23, and 24 at time x, while another node creates records with 15, 16, and 17 at a later time y (for example, x+1).
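The interleaving can be sketched with a simplified model, assuming each node (for example, each Oracle RAC instance) reserves its own contiguous block of 20 values up front. The class and method names are hypothetical, and this is not how Oracle implements caching internally; it only reproduces the ordering from the example.

```java
import java.util.ArrayList;
import java.util.List;

public class CachedSequenceDemo {

    // Simplified model of a sequence cache: a node reserves a contiguous block of values up front
    static final class NodeCache {
        private long next;
        private final long last;

        NodeCache(long first, int size) {
            this.next = first;
            this.last = first + size - 1;
        }

        long nextValue() {
            if (next > last) {
                throw new IllegalStateException("cached block exhausted");
            }
            return next++;
        }
    }

    // Returns the values in the wall-clock order in which they were handed out
    static List<Long> issuedInTimeOrder() {
        NodeCache nodeA = new NodeCache(1, 20);  // node A reserved 1..20
        NodeCache nodeB = new NodeCache(21, 20); // node B reserved 21..40
        for (int i = 0; i < 14; i++) {
            nodeA.nextValue();                   // node A already used 1..14 earlier
        }
        List<Long> issued = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            issued.add(nodeB.nextValue());       // time x: node B hands out 21..24
        }
        for (int i = 0; i < 3; i++) {
            issued.add(nodeA.nextValue());       // time y (later): node A hands out 15..17
        }
        return issued;
    }

    public static void main(String[] args) {
        System.out.println(issuedInTimeOrder()); // [21, 22, 23, 24, 15, 16, 17]
    }
}
```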

Tracking Event Processors

My understanding of how a Tracking Event Processor fetches new events:

The processor queries the event store for new events that have occurred since the last processed token.
Tracking Token: each tracking event processor maintains a tracking token, which indicates the last processed event. This token allows the processor to pick up where it left off in the event stream, ensuring no events are missed or processed multiple times.

Take the example above:

If the tracking event processor runs asynchronously on a different node, it could first process events 21 to 24 and update its tracking token to 24 at time x. When it then fetches new events after index 24, it does not expect any index lower than 24, causing events 15, 16, and 17 to be skipped.
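A minimal sketch of that failure mode, assuming a hypothetical naive processor whose token is nothing more than the highest index it has handled (this is not Axon’s actual token). Only rows with an index strictly greater than the token are fetched, so rows that become visible later with a lower index are never seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class NaiveTokenDemo {

    // Hypothetical naive fetch: only committed rows with an index strictly greater than the token
    static List<Long> fetchAfter(TreeSet<Long> committed, long token) {
        return new ArrayList<>(committed.tailSet(token, false));
    }

    // Returns every index the naive processor handled, in processing order
    static List<Long> run() {
        TreeSet<Long> committed = new TreeSet<>();
        List<Long> processed = new ArrayList<>();
        long token = 0; // highest index processed so far

        // time x: only 21..24 are visible
        committed.addAll(List.of(21L, 22L, 23L, 24L));
        for (long index : fetchAfter(committed, token)) {
            processed.add(index);
            token = index; // token ends up at 24
        }

        // time y: 15..17 become visible, but they are below the token
        committed.addAll(List.of(15L, 16L, 17L));
        for (long index : fetchAfter(committed, token)) {
            processed.add(index);
            token = index;
        }
        return processed; // 15, 16 and 17 are never processed
    }

    public static void main(String[] args) {
        System.out.println(run()); // [21, 22, 23, 24]
    }
}
```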

So the unordered global_index column complicates event processing, particularly for event processors that rely on sequence order (such as the Tracking Event Processor).

Action Items

Alter the global_index sequence: change it to NOCACHE ORDER. (This might impact performance at insertion time.)
Implement retry handling: add retry mechanisms to ensure that events are processed correctly. (This is an early thought, so I’m not certain whether it has any drawbacks.)

I’d like to know whether my understanding is correct. Any additional insights would be greatly appreciated. Thank you!

Steven_van_Beelen · September 24, 2024, 1:33pm

Introducing Gap Aware Tracking Tokens

Although your understanding is correct, @crajoli, you’re missing one implementation detail of using an RDBMS-based EventStore with Axon Framework combined with StreamingEventProcessor implementations such as the TrackingEventProcessor.

The key difference is the implementation of the TrackingToken: with the JPA- and JDBC-based event storage engines it is a GapAwareTrackingToken, which the JpaTokenStore or JdbcTokenStore persists in the token_entry table. The token column then carries not only the position in the stream, but also any gaps encountered while querying the event stream from the EventStore.
On subsequent retrieval of batches of events, the known gaps are included in the WHERE clause of the query sent to the RDBMS-based Event Store. If you’re curious, here is one such query, as performed by the JpaEventStorageEngine:

```java
//...
// Fetch everything after the token's position, plus any indices the token still lists as gaps
query = entityManager().createQuery(
        "SELECT e.globalIndex, e.type, e.aggregateIdentifier, e.sequenceNumber, e.eventIdentifier, "
                + "e.timeStamp, e.payloadType, e.payloadRevision, e.payload, e.metaData " +
                "FROM " + domainEventEntryEntityName() + " e " +
                "WHERE e.globalIndex > :token OR e.globalIndex IN :gaps ORDER BY e.globalIndex ASC",
        Object[].class
)
//...
```
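A simplified model of the idea behind that query, using a hypothetical SimpleGapAwareToken class rather than Axon’s actual GapAwareTrackingToken: the token remembers the indices it jumped over, and the filter mirrors the WHERE clause. Replaying the earlier example (events up to 14 already handled, then 21 to 24, then 15 to 17), the late events are still picked up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class GapAwareDemo {

    // Hypothetical, simplified stand-in for a gap-aware token (not Axon's class)
    static final class SimpleGapAwareToken {
        long index;                                   // highest global index seen so far
        final SortedSet<Long> gaps = new TreeSet<>(); // indices skipped over, possibly filled later

        SimpleGapAwareToken(long index) {
            this.index = index;
        }

        // Mirrors "WHERE e.globalIndex > :token OR e.globalIndex IN :gaps"
        boolean matches(long globalIndex) {
            return globalIndex > index || gaps.contains(globalIndex);
        }

        // Moves the token past an event; any indices jumped over are recorded as gaps
        void advanceTo(long globalIndex) {
            if (globalIndex > index) {
                for (long missing = index + 1; missing < globalIndex; missing++) {
                    gaps.add(missing);
                }
                index = globalIndex;
            } else {
                gaps.remove(globalIndex); // a late event filled a known gap
            }
        }
    }

    // One fetch cycle; TreeSet iterates in ascending order, like "ORDER BY e.globalIndex ASC"
    static List<Long> poll(TreeSet<Long> committed, SimpleGapAwareToken token) {
        List<Long> handled = new ArrayList<>();
        for (long globalIndex : committed) {
            if (token.matches(globalIndex)) {
                token.advanceTo(globalIndex);
                handled.add(globalIndex);
            }
        }
        return handled;
    }

    // Replays the earlier example; returns every index handled, in processing order
    static List<Long> run(SimpleGapAwareToken token) {
        TreeSet<Long> committed = new TreeSet<>(List.of(21L, 22L, 23L, 24L)); // time x
        List<Long> processed = new ArrayList<>(poll(committed, token));
        committed.addAll(List.of(15L, 16L, 17L));                             // time y
        processed.addAll(poll(committed, token));
        return processed;
    }

    public static void main(String[] args) {
        SimpleGapAwareToken token = new SimpleGapAwareToken(14); // events up to 14 already handled
        System.out.println(run(token)); // [21, 22, 23, 24, 15, 16, 17]
        System.out.println(token.gaps); // [18, 19, 20]
    }
}
```

The remaining gaps 18, 19 and 20 stay in the token until events with those indices appear; this toy model never drops them.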

Why have Gap Aware Tracking Tokens?

Axon Framework uses a GapAwareTrackingToken for RDBMS-based Event Stores because it is hard to provide generic sequence guarantees in such an environment. Originally, however, the GapAwareTrackingToken was meant to protect against the chance that commit order and insert order don’t align, which would be problematic if the transaction isolation level is set to READ_UNCOMMITTED.

Nowadays, we see gaps mostly arise because the sequence generator behaves differently per RDBMS, is reused by other tables within the database, or because of some other form of (mis)configuration.
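To illustrate the shared-sequence case, here is a toy sketch, assuming one hypothetical counter feeds inserts into both the event table and some other table. Every value consumed by the other table becomes a gap in the event stream that no event will ever fill.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedSequenceDemo {

    // Hypothetical misconfiguration: one sequence feeds inserts into several tables
    static List<Long> eventIndices(String[] insertsInOrder) {
        long sequence = 0;
        List<Long> eventIndices = new ArrayList<>();
        for (String table : insertsInOrder) {
            long value = ++sequence;     // every insert consumes a value, whatever the table
            if (table.equals("event")) {
                eventIndices.add(value); // only these values show up as global indices
            }
        }
        return eventIndices;
    }

    // Indices missing from an ascending event stream; in this model no event will ever fill them
    static List<Long> permanentGaps(List<Long> sortedEventIndices) {
        List<Long> gaps = new ArrayList<>();
        if (sortedEventIndices.isEmpty()) {
            return gaps;
        }
        long expected = sortedEventIndices.get(0);
        for (long index : sortedEventIndices) {
            for (long missing = expected; missing < index; missing++) {
                gaps.add(missing);
            }
            expected = index + 1;
        }
        return gaps;
    }

    public static void main(String[] args) {
        String[] inserts = {"event", "other", "event", "event", "other", "event"};
        List<Long> indices = eventIndices(inserts);
        System.out.println(indices);                // [1, 3, 4, 6]
        System.out.println(permanentGaps(indices)); // [2, 5]
    }
}
```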

How to get rid of Gaps?

The easiest way to avoid gaps in your event stream, and thus to speed up and simplify this process entirely, is to use a dedicated Event Store such as Axon Server. If you are up for a quick and easy trial, it is very easy to get an Axon Server set up through AxonIQ Console, which gives you a step-by-step installation guide.

If you prefer to set things up yourself, you can also simply run a single instance of Axon Server.

If those options are out of the question, you will have to think of the drawbacks of using an RDBMS as an ever-increasing Event Store. Luckily, it seems you’re already on that path. Regardless, I do feel inclined to point out that things would be greatly simplified if a dedicated Event Store were used.