Axon 3.2.2 Tracking Processors compatibility with large gaps

Kevin_v_I · July 3, 2018, 2:42pm

Hi all,
We are implementing Tracking Processors for a view projection. Our store currently has gaps of 10.000+ in the ids. After some hours of fiddling I understand the behaviour of my setup a bit better, but I am still confused about how to configure it.

My setup is a GapAwareTrackingToken with the default configuration (batch size of 100, JdbcEventStorageEngine, Postgres). The current behaviour is that I see the token traversing small gaps (<100). I expected a high maxGapOffset to help me a bit. However, whenever the gap is >100 the process halts. Increasing the batch size fixes this, but I fear other negative side effects. Either way, my biggest gap is 34000 at this moment, and I don't feel that a batch size of 1.000.000 would be a good idea.

Basically, it feels like the use of batchSize when probing gaps in the event index is too tightly coupled to normal event store querying. Semantically, maxGapOffset feels like the one and only property that should influence the way gaps are tolerated, not the general query limit (batch size). Or am I missing something?

Best,
Kevin

Steven_van_Beelen · July 5, 2018, 6:40am

Hi Kevin,

Ideally, you wouldn’t have such large gaps between your events when using an RDBMS-backed EventStorageEngine (read: JDBC or JPA).
The gaps are there because a relational database, if called concurrently, might create an entry with a given sequence id, but insert it after another process has created an entry with a newer sequence id and inserted it first.
Gaps thus typically account for small ranges of ‘periodically missing events’.

That’s also the reason why the gaps are cleaned up after a certain amount of time: tracking ‘permanent gaps’ was never the goal of the gap set.
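To make the concurrent-insert scenario above concrete, here is a minimal sketch in Java. It is a simplified, hypothetical model of a gap-aware position (the class `GapTrackerDemo` and its method `advanceTo` are made up for illustration) and not the actual GapAwareTrackingToken implementation. It only shows the idea: when a higher sequence id becomes visible first, the skipped ids are remembered as gaps, and a gap is closed once the late event shows up.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified, hypothetical model of a gap-aware position; NOT Axon's GapAwareTrackingToken.
class GapTrackerDemo {
    long index = 0;                                // highest global index seen so far
    final SortedSet<Long> gaps = new TreeSet<>();  // indices below 'index' not seen yet

    // Record that the event with the given global index has been read.
    void advanceTo(long seen) {
        if (seen > index) {
            // Everything between the old position and the new one was skipped.
            for (long missing = index + 1; missing < seen; missing++) {
                gaps.add(missing);
            }
            index = seen;
        } else {
            // A late event filled a previously recorded gap.
            gaps.remove(seen);
        }
    }

    public static void main(String[] args) {
        GapTrackerDemo tracker = new GapTrackerDemo();
        tracker.advanceTo(2);              // id 2 was committed before id 1
        System.out.println(tracker.gaps);  // prints [1]
        tracker.advanceTo(1);              // id 1 becomes visible later
        System.out.println(tracker.gaps);  // prints []
    }
}
```

In this model a sequence id that is never inserted (for example after a rollback) would stay in the gap set forever, which is why some form of cleanup is needed.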

My hunch is that you have gaps of 10.000+ in size because you do not have a dedicated sequence generator for the sequence id of your event store.
That would explain why the gaps are so large, as the sequence would then also be used for other entries in your database.

Is my hunch on this correct, Kevin?

Cheers,
Steven

Kevin_v_I · July 6, 2018, 9:27am

Nice hunch. I was hoping it would be correct, but unfortunately it is not. The events table is the only one using this sequence. I checked this manually, and also found a nice query against the Postgres catalog tables:

```sql
select seq_ns.nspname as sequence_schema,
       seq.relname as sequence_name,
       tab_ns.nspname as table_schema,
       tab.relname as related_table
from pg_class seq
  join pg_namespace seq_ns on seq.relnamespace = seq_ns.oid
  JOIN pg_depend d ON d.objid = seq.oid AND d.deptype = 'a'
  JOIN pg_class tab ON d.objid = seq.oid AND d.refobjid = tab.oid
  JOIN pg_namespace tab_ns on tab.relnamespace = tab_ns.oid
where seq.relkind = 'S'
      and seq.relname = 'events_id_seq'
      and seq_ns.nspname = 'public';
```

This shows the event table as the only related item.

Regardless of how the gaps in the sequence came to be, I feel that the ‘common’ batchSize should represent the number of rows returned from the db and should not be a blocker in this case. While stepping through the code I found what may be a faulty implementation. In JdbcEventStorageEngine there is:

```java
String sql = "SELECT " + trackedEventFields() + " FROM " + schema.domainEventTable() +
        " WHERE (" + schema.globalIndexColumn() + " > ? AND " + schema.globalIndexColumn() + " <= ?) ";
```

Here, globalIndex + batchSize is supplied as the second query parameter.

The final clause of this statement (globalIndexColumn <= ?) limits how large a gap can be crossed: if the next event lies more than batchSize beyond the current index, the query returns nothing. Looking at fetchTrackedEvents in the JPA version of this implementation, I don’t see this limitation in the query. Instead, setMaxResults(batchSize) is used to adhere to the batchSize.

I tried this locally and the gaps are now crossed. I have created a PR for this. It might need some more testing.
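The difference between the two query styles described above can be illustrated with a small in-memory sketch. This is not Axon code: the class `GapFetchDemo` and its methods are hypothetical, and a `TreeSet` stands in for the global index column. `fetchByRange` mimics the bounded range query (index > last AND index <= last + batchSize), while `fetchByLimit` mimics an open-ended query capped by a row limit, which is roughly what setMaxResults(batchSize) does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the event table; NOT Axon code.
class GapFetchDemo {

    // Range-bounded fetch: index > last AND index <= last + batchSize.
    static List<Long> fetchByRange(TreeSet<Long> stored, long last, int batchSize) {
        return new ArrayList<>(stored.subSet(last, false, last + batchSize, true));
    }

    // Limit-based fetch: index > last, ordered by index, at most batchSize rows.
    static List<Long> fetchByLimit(TreeSet<Long> stored, long last, int batchSize) {
        List<Long> result = new ArrayList<>();
        for (Long index : stored.tailSet(last, false)) {
            if (result.size() == batchSize) {
                break;
            }
            result.add(index);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Long> stored = new TreeSet<>();
        for (long i = 1; i <= 50; i++) stored.add(i);          // events 1..50
        for (long i = 34051; i <= 34060; i++) stored.add(i);   // next events after a gap of 34000

        // The range query only looks at indices 51..150 and finds nothing.
        System.out.println(fetchByRange(stored, 50, 100));     // prints []
        // The limit query skips the gap and returns the ten events after it.
        System.out.println(fetchByLimit(stored, 50, 100).size()); // prints 10
    }
}
```

With the range-bounded variant the reader never gets past index 50 unless batchSize exceeds the gap, which matches the halting behaviour I saw.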

Steven_van_Beelen · July 6, 2018, 9:47am

Hi Kevin,

Great work on tracking down the sequence generation.

It’s a usual culprit, which is why I pointed it out.

Additionally, many thanks for the provided PR.

I do feel that having an identical querying method between the JPA and JDBC EventStorageEngine makes sense.

I’ll make sure we check your PR quickly, as it sounds like a valuable addition for 3.3.1.

Stay tuned on the PR you’ve added.

Cheers,

Steven