Should we worry about numbers in event sourcing?

ranajitjana · August 9, 2018, 5:12pm

First of all, please forgive my ignorance, as I am new to Axon Framework.
From an architecture point of view, I am wondering whether the storage may become too large if we keep storing events forever.

Example:
Let's say I get 1 million events a day. Then in a quarter of a year I will have roughly 90 million persisted events, and more than a billion within three years.
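
For scale, here is a rough sketch of the arithmetic. The rate of 1 million events a day comes from the example; the average event size of 1 KiB is purely an assumption and will differ per system.

```python
# Back-of-the-envelope growth estimate for an event store.
EVENTS_PER_DAY = 1_000_000   # rate taken from the example
AVG_EVENT_BYTES = 1_024      # assumption: about 1 KiB per serialized event


def events_after(days: int) -> int:
    """Total events persisted after the given number of days."""
    return EVENTS_PER_DAY * days


def days_until(total_events: int) -> int:
    """Days needed to reach total_events (rounded up)."""
    return -(-total_events // EVENTS_PER_DAY)


def storage_gib(events: int) -> float:
    """Raw payload size in GiB, ignoring indexes and compression."""
    return events * AVG_EVENT_BYTES / 1024**3


print(events_after(91))                    # one quarter: 91000000
print(days_until(1_000_000_000))           # days to reach a billion: 1000
print(round(storage_gib(1_000_000_000)))   # about 954 GiB under the 1 KiB assumption
```

So at this rate a quarter yields about 91 million events, and a billion is reached after roughly 1,000 days.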

But how long can this storage keep growing?

We may rarely query these events and instead rely on the query side to read the data, but the numbers still worry me.

For those who have been using event sourcing for years: what is your strategy?

Frans_van_Buul · August 10, 2018, 7:19am

Hi!

This is an excellent question, and a smart one to ask up front, before you commit to event sourcing.

A standard relational database system will not work well with billions of events stored in a single table; it’s simply not designed for that number of records per table. If you really want to store that many events in a relational database, you’d be looking at big hardware, a lot of tuning, and custom logic to split your events across multiple tables in a way that fits your situation.

For this reason, we’ve designed AxonDB. It’s a database system specifically suitable for event sourcing. It exploits the fact that events have a natural order, and that recent events are far more likely to be accessed than old events. This keeps it fast irrespective of the number of events already stored. To optimize storage efficiency and costs, it also supports things like multi-tiered storage (e.g. recent events on SSD and old events on mechanical drives), as well as compression of older events. This system can handle billions of new events per day.

Sometimes Mongo or Kafka are considered as alternatives for an event store, but honestly they don’t really work out for true event sourcing. Mongo lacks a mechanism to create a global sequence number, and Kafka lacks a mechanism to create a unique index on aggregateId + sequencenumber, which is essential for reliable event sourcing.
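
To illustrate why those two properties matter, here is a minimal sketch that uses an in-memory SQLite table as a stand-in. The table layout and the `append` helper are hypothetical, chosen only for illustration; this is not how AxonDB or Axon Framework stores events. The auto-incrementing key plays the role of the global sequence number, and the unique constraint rejects a second writer that tries to claim the same aggregateId + sequence number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        global_index    INTEGER PRIMARY KEY AUTOINCREMENT,  -- global ordering
        aggregate_id    TEXT    NOT NULL,
        sequence_number INTEGER NOT NULL,
        payload         TEXT    NOT NULL,
        UNIQUE (aggregate_id, sequence_number)              -- conflict detection
    )
    """
)


def append(aggregate_id: str, sequence_number: int, payload: str) -> bool:
    """Store an event; return False if that slot was already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (aggregate_id, sequence_number, payload) "
                "VALUES (?, ?, ?)",
                (aggregate_id, sequence_number, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = append("order-1", 0, "OrderCreated")
second = append("order-1", 1, "OrderPaid")
conflict = append("order-1", 1, "OrderCancelled")  # same slot, rejected
other = append("order-2", 0, "OrderCreated")

payloads = [
    row[0]
    for row in conn.execute("SELECT payload FROM events ORDER BY global_index")
]
```

The rejected insert is what lets a writer detect that it acted on a stale view of the aggregate, and reading by `global_index` gives consumers one consistent order across all aggregates.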

Kind regards,

Malay_Matwankar · August 10, 2018, 7:26am

Hi Frans,

Can we not achieve this by having a separate collection in Mongo for managing a global sequence number? Also, is AxonDB a relational database?

Regards,
Malay

Frans_van_Buul · August 10, 2018, 7:57am

Hi Malay,

Regarding Mongo: Theoretically, you could use a single “counter document” to generate sequence numbers, but at high throughput this will become a bottleneck in your system. See https://www.mongodb.com/blog/post/generating-globally-unique-identifiers-for-use-with-mongodb for a much more detailed explanation of the topic.
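
The counter pattern can be sketched in plain Python. The lock below stands in for the atomic update on the single counter document; that substitution is an assumption made to keep the sketch self-contained, and no MongoDB calls are used. The class and function names are hypothetical.

```python
import threading


class CounterDocument:
    """Stand-in for a single shared counter document."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next_sequence(self) -> int:
        # Every writer must pass through this one critical section.
        with self._lock:
            self._value += 1
            return self._value


def run_writers(counter: CounterDocument, writers: int, appends_per_writer: int):
    """Let several writers request sequence numbers concurrently."""
    issued = []
    issued_lock = threading.Lock()

    def writer() -> None:
        for _ in range(appends_per_writer):
            number = counter.next_sequence()
            with issued_lock:
                issued.append(number)

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return issued


numbers = run_writers(CounterDocument(), writers=8, appends_per_writer=1000)
```

The numbers come out unique and gapless, but only because all eight writers queue up on the same counter. Adding writers does not add throughput, which is the bottleneck described above.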

Regarding AxonDB: It’s not a relational database - it has no “tables” or anything similar. It’s a system exclusively designed to store events and snapshots for event sourcing. It stores events in blocks called “segments”, in their natural order. It also maintains a secondary unique index on the combination aggregateId + sequencenumber. In a complete ES/CQRS system, you would use AxonDB for events while using other technology (RDBMS, Mongo, Elastic, Neo4J, …) for read models etc.
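
As a rough mental model only, the idea of append-only segments plus a secondary index can be sketched as below. This is not AxonDB's actual implementation or file format; the class, its methods, and the tiny segment size are all hypothetical.

```python
SEGMENT_SIZE = 3  # assumption: tiny, so that segment rollover is visible


class SegmentedEventLog:
    def __init__(self, segment_size: int = SEGMENT_SIZE) -> None:
        self.segment_size = segment_size
        self.segments = [[]]  # events in their natural (append) order
        self.index = {}       # (aggregate_id, sequence_number) -> (segment, offset)

    def append(self, aggregate_id: str, sequence_number: int, payload: str) -> int:
        key = (aggregate_id, sequence_number)
        if key in self.index:
            raise ValueError(f"duplicate event slot {key}")
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])  # current segment is full, start a new one
        segment_no = len(self.segments) - 1
        offset = len(self.segments[segment_no])
        self.segments[segment_no].append((aggregate_id, sequence_number, payload))
        self.index[key] = (segment_no, offset)
        return segment_no * self.segment_size + offset  # global position

    def read_aggregate(self, aggregate_id: str) -> list:
        """Follow the index to collect one aggregate's payloads in order."""
        payloads = []
        sequence_number = 0
        while (aggregate_id, sequence_number) in self.index:
            segment_no, offset = self.index[(aggregate_id, sequence_number)]
            payloads.append(self.segments[segment_no][offset][2])
            sequence_number += 1
        return payloads


log = SegmentedEventLog()
positions = [
    log.append("a", 0, "A0"),
    log.append("b", 0, "B0"),
    log.append("a", 1, "A1"),
    log.append("b", 1, "B1"),  # lands in the second segment
]
```

Older segments are never modified after they fill up, which is what makes it natural to compress them or move them to cheaper storage.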

Kind regards,
Frans

Marinko_Babic · August 10, 2018, 9:50am

Normally every event store supports snapshots. A snapshot is the state of an aggregate after applying its events. Once a snapshot exists, the older events can be removed. I have no idea whether this concept is supported by Axon.

Not every domain is suitable for event sourcing. Please think about this before trying to solve such issues technically.

Cheers
Marinko

ranajitjana · August 16, 2018, 4:53am

Yes, I would still like a mechanism to archive old messages after a specified interval.

One of the main reasons is cost.

Holding all the messages can become a substantial cost once they grow beyond a manageable limit, and the maintenance headache (again, cost) will cancel out the benefits in the long run.

Steven_van_Beelen · August 21, 2018, 12:04pm

Hi Marinko and Ranajitjana,

I would advise against dropping events from your event store as soon as you’ve made a snapshot.

The snapshot is like a picture of the Aggregate state.

But what if you are changing your Aggregate state to a different form, making your snapshots faulty?

Then you’d have to drop your snapshots and recreate them from the events.
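
A minimal sketch of that scenario, using a hypothetical bank-account aggregate. The `Snapshot` class, the schema version check, and the event shapes are assumptions for illustration and are not Axon Framework APIs.

```python
from dataclasses import dataclass

CURRENT_SCHEMA = 2  # hypothetical version of the aggregate's state layout


@dataclass
class Snapshot:
    schema_version: int
    last_sequence: int  # sequence number of the last event folded in
    state: dict


def apply(state: dict, event: tuple) -> dict:
    """Fold one (kind, amount) event into the aggregate state."""
    kind, amount = event
    balance = state.get("balance", 0)
    if kind == "deposited":
        balance += amount
    elif kind == "withdrawn":
        balance -= amount
    return {"balance": balance}


def load(events: list, snapshot: Snapshot = None) -> dict:
    """Rebuild state from a usable snapshot, else from the full history."""
    if snapshot is not None and snapshot.schema_version == CURRENT_SCHEMA:
        state, start = dict(snapshot.state), snapshot.last_sequence + 1
    else:
        state, start = {}, 0  # snapshot missing or outdated: replay everything
    for event in events[start:]:
        state = apply(state, event)
    return state


events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
fresh = Snapshot(schema_version=2, last_sequence=1, state={"balance": 70})
stale = Snapshot(schema_version=1, last_sequence=1, state={"amount": 70})

from_fresh = load(events, fresh)        # uses the snapshot, replays one event
from_stale = load(events, stale)        # ignores the snapshot, replays all three
after_pruning = load(events[2:], stale) # history dropped: state cannot be recovered
```

With the full history available, the outdated snapshot is harmless. Once the first two events have been deleted, the same load produces a wrong balance.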

Thus, I’d very strongly advise against dropping events that you use to event source your aggregates, unless you have thoroughly investigated your use case and concluded that storing those events is of no use in your situation.

Offloading events to a different storage is part of AxonDB, albeit only in the Enterprise edition.

If you want this in the purely open source Axon Framework, you will have to implement your own EventStorageEngine with this logic in place.
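
The kind of logic such a custom engine would need can be sketched independently of the framework. The class below is hypothetical and does not implement Axon's EventStorageEngine interface. It only shows the idea of moving old events to a cheaper tier while keeping them readable, and it assumes events are appended in timestamp order.

```python
class TieredEventStore:
    def __init__(self) -> None:
        self.hot = []   # recent events, e.g. on fast storage
        self.cold = []  # archived events, e.g. on cheaper storage

    def append(self, timestamp: int, payload: str) -> None:
        self.hot.append((timestamp, payload))

    def archive_older_than(self, cutoff: int) -> int:
        """Move events with timestamp < cutoff to the cold tier."""
        keep = []
        moved = 0
        for event in self.hot:
            if event[0] < cutoff:
                self.cold.append(event)
                moved += 1
            else:
                keep.append(event)
        self.hot = keep
        return moved

    def read_all(self):
        """Replay the full history: archived events first, then recent ones."""
        yield from self.cold
        yield from self.hot


store = TieredEventStore()
for day in range(1, 6):
    store.append(day, f"event-{day}")

moved = store.archive_older_than(3)
history = [payload for _, payload in store.read_all()]
```

Nothing is deleted here: a full replay still sees every event, so snapshots can be rebuilt, while only the recent events occupy the expensive tier.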

Hope this sheds some extra light on the situation.

Cheers,

Steven