Replay query order by

Sebastian_Ganslandt1 · May 8, 2013, 8:49am

Hi,

Quick question: what is the reason for the ‘order by aggregateIdentifier’ clause in DefaultEventEntryStore.BatchingIterator.fetchBatch()? Timestamp and sequence number seem like the only ones needed, as far as I can see.

Cheers
Sebastian

Allard · May 9, 2013, 6:18am

Hi Sebastian,

The reason I added the aggregate identifier is to get 100% deterministic ordering. It is possible for two events, from different aggregates, to have the same timestamp and sequence number. The ordering would then depend on the db’s mood of the day.

To make sure that each replay behaves the same, I added the aggregate identifier as a third order.
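As a rough illustration of the tie-break (not the actual Axon code), the three-part ordering can be sketched as a comparator. The `Entry` class and its field names are hypothetical stand-ins for a stored event entry; only the idea of ordering by timestamp, then sequence number, then aggregate identifier comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReplayOrderSketch {

    // Hypothetical, simplified stand-in for a stored event entry.
    static final class Entry {
        final long timestamp;
        final long sequenceNumber;
        final String aggregateIdentifier;

        Entry(long timestamp, long sequenceNumber, String aggregateIdentifier) {
            this.timestamp = timestamp;
            this.sequenceNumber = sequenceNumber;
            this.aggregateIdentifier = aggregateIdentifier;
        }
    }

    // Timestamp first, then sequence number, then aggregate identifier.
    // The third key only matters when the first two are equal.
    static final Comparator<Entry> REPLAY_ORDER =
            Comparator.comparingLong((Entry e) -> e.timestamp)
                    .thenComparingLong(e -> e.sequenceNumber)
                    .thenComparing(e -> e.aggregateIdentifier);

    // Returns the aggregate identifiers in replay order, leaving the input untouched.
    static List<String> replayOrder(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(REPLAY_ORDER);
        List<String> ids = new ArrayList<>();
        for (Entry e : sorted) {
            ids.add(e.aggregateIdentifier);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two entries share timestamp 100 and sequence number 0.
        List<Entry> entries = Arrays.asList(
                new Entry(100L, 0L, "order-7"),
                new Entry(100L, 0L, "cart-3"),
                new Entry(50L, 1L, "user-1"));
        System.out.println(replayOrder(entries));
    }
}
```

Without the third key, the relative order of the two tied entries would be whatever the database happens to return; with it, the result is the same regardless of the order in which the rows arrive.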

Cheers,

Allard

Sebastian_Ganslandt1 · May 9, 2013, 6:48am

I understand. One could argue that you should never ever depend on inter-aggregate order, even between replays. Have you investigated the impact of adding another column to the index on insert speed and index size?

Allard · May 9, 2013, 12:58pm

Hi Sebastian,

You’re right. There should be a balance between what’s guaranteed and how the event store performs. So far, I have been focusing mostly on good behavior. Not all applications store 100 million events/month ;-).

Fortunately, the JPA event store is built for customization of the storage. It’s relatively easy to change the behavior and focus more on performance and index size. I’ve got a few ideas in that area. Just to name a few: store an entry per commit instead of per event, and store the timestamp as an integer. The first will reduce the number of index entries. The latter reduces storage space as well as index size. It’s then also possible to consolidate entries by merging two entries of the same aggregate into one.
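To make these ideas a bit more concrete, here is a minimal sketch, not an existing Axon API. Everything in it is an assumption made for illustration: the `CommitEntry` class, its fields, the use of an ISO-8601 string as the original timestamp format, and the choice to keep the first entry's timestamp when merging.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class CompactEntrySketch {

    // Hypothetical per-commit entry: one row (and one index entry)
    // for all events of a single commit, instead of one row per event.
    static final class CommitEntry {
        final String aggregateIdentifier;
        final long firstSequenceNumber;
        final long timestampMillis;          // timestamp stored as an integer
        final List<String> serializedEvents; // payloads in sequence order

        CommitEntry(String aggregateIdentifier, long firstSequenceNumber,
                    long timestampMillis, List<String> serializedEvents) {
            this.aggregateIdentifier = aggregateIdentifier;
            this.firstSequenceNumber = firstSequenceNumber;
            this.timestampMillis = timestampMillis;
            this.serializedEvents = serializedEvents;
        }
    }

    // Assumes the timestamp is available as an ISO-8601 instant string.
    static long toEpochMillis(String isoTimestamp) {
        return Instant.parse(isoTimestamp).toEpochMilli();
    }

    // Consolidates two consecutive entries of the same aggregate into one.
    // Keeping the first entry's timestamp is an arbitrary choice in this sketch.
    static CommitEntry merge(CommitEntry first, CommitEntry second) {
        if (!first.aggregateIdentifier.equals(second.aggregateIdentifier)) {
            throw new IllegalArgumentException("entries belong to different aggregates");
        }
        long expectedNext = first.firstSequenceNumber + first.serializedEvents.size();
        if (second.firstSequenceNumber != expectedNext) {
            throw new IllegalArgumentException("entries are not consecutive");
        }
        List<String> all = new ArrayList<>(first.serializedEvents);
        all.addAll(second.serializedEvents);
        return new CommitEntry(first.aggregateIdentifier, first.firstSequenceNumber,
                first.timestampMillis, all);
    }
}
```

One trade-off worth noting: a merged entry carries a single timestamp, so per-event timestamps would have to live inside the serialized payloads if they are still needed.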

Cheers,

Allard

PS. Thanks for the pull requests. They look good. I will merge them soon!