Replaying events with JpaEventStore and MySQL

Hi Allard,

We have run into a problem with the performance of replaying events. Our event store currently holds more than 1,000,000 events, and our database is MySQL. We built a simple replay service that uses the visitEvents method of the JpaEventStore to select, deserialize and upcast the events.

The problem is that the query that selects the events in batches is very slow. It appears to use LIMIT and OFFSET, which is known to be slow with a large number of records.
I’m wondering whether there is a way we can influence the actual query that is executed against the database.

The query below (using late row lookups) is a lot quicker (~4 seconds compared to 80+ seconds):
select a.* from DomainEventEntry a
inner join (select eventIdentifier from DomainEventEntry limit 100 offset 100000) b
on a.eventIdentifier = b.eventIdentifier where payloadType = 'x'
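
One caveat with this approach: without an ORDER BY in the subquery, LIMIT/OFFSET gives no guaranteed row order, so batches may overlap or skip rows. A deterministic variant would look something like this (column names assume the default Axon mapping):

select a.* from DomainEventEntry a
inner join (select eventIdentifier from DomainEventEntry
            order by timeStamp, sequenceNumber, aggregateIdentifier
            limit 100 offset 100000) b
on a.eventIdentifier = b.eventIdentifier where a.payloadType = 'x'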


Hi Stein,

This is a known issue in Axon 1. Axon 2 takes a different approach and doesn’t have this problem at all: it is equally fast in the first million as in the last million events of a 100mln-row table (disregarding page faults, of course).
Are you in fact using Axon 1?
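
For context, Axon 2 pages roughly by remembering the last event of the previous batch and selecting everything after it (keyset paging), instead of using an OFFSET. A sketch of that kind of query, assuming the default DomainEventEntry column names (the :last… parameters are placeholders for the values taken from the last event of the previous batch):

select * from DomainEventEntry
where timeStamp > :lastTimeStamp
   or (timeStamp = :lastTimeStamp and sequenceNumber > :lastSequenceNumber)
   or (timeStamp = :lastTimeStamp and sequenceNumber = :lastSequenceNumber
       and aggregateIdentifier > :lastAggregateIdentifier)
order by timeStamp, sequenceNumber, aggregateIdentifier
limit 100

With a matching index, the database can seek straight to that position and read forward, no matter how deep into the table the batch is.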

Kind regards,

Allard Buijze


Hi Allard,

We’re using Axon 2.0.2. To filter events I also added a criterion to the query (payloadType == 'x'). Maybe that is what slows it down…

Regards,
Stein

Adding criteria to the replay can decrease speed: MySQL may no longer be able to use the indexes you have created. You can use “DESCRIBE SELECT …” (a synonym of “EXPLAIN SELECT …”) to see which indexes MySQL would use. If none, you can decide to either add an index, or not filter at all. If your handlers effectively need a filter, you can filter inside the handlers instead: Axon 2 optimizes deserialization, meaning it is only done when you actually access the payload of a message.
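
For example, to see whether the payloadType criterion can still use an index (the filter value here is illustrative):

describe select * from DomainEventEntry
where payloadType = 'x'
order by timeStamp, sequenceNumber, aggregateIdentifier;

If the key column in the output is NULL and Extra shows “Using filesort”, MySQL is scanning and sorting the whole table for every batch.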

Initially, I used late row lookups myself, but they don’t seem to scale very well: once the indexes no longer fit in memory, MySQL resorts to a disk sort. Obviously, this depends on the amount of memory in the machine.

Cheers,

Allard

Allard,

I’ve been playing around with some indexes. We didn’t have an index on the timestamp column; creating one improved performance somewhat.
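
Concretely, the index is something like this (column name per the default mapping):

create index idx_domainevent_timestamp on DomainEventEntry (timeStamp);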

On my dev machine this solved the problem: query times dropped from seconds to a few hundred milliseconds. The dev machine has a little over 100,000 events in the database. On our acceptance environment, however, performance is still very bad… That environment holds over 5.000.000 mln events.

I did some more digging: it is the sort that makes the query slow. That is quite logical, because all the records have to be sorted before any of them can be selected… Do you have any ideas for improving the performance? For the current replay scenario the order in which the events are visited does not really matter, so we could create our own event visitor. That would not solve the underlying problem, though, only work around it for now.

What would you suggest as a good move?

Moving away from MySQL for our event store might be an idea, right?

Thnx.

Hi Stein,

For replays, the ideal index is one that combines (in that order) the timestamp, sequenceNumber and aggregateIdentifier columns. I did some benchmarking with this approach, and it gave good results on 30mln events: there was no notable slowdown in the first part compared to the last part.
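
In MySQL terms that is something like this (assuming the default DomainEventEntry column names):

create index idx_domainevent_replay on DomainEventEntry (timeStamp, sequenceNumber, aggregateIdentifier);

Because this matches the ORDER BY of the replay query, MySQL can walk the index in order instead of sorting.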
If you use criteria, it might be worthwhile to check whether the index is still used. Once the database believes an index is no longer suitable, it will revert to table scans. Meaning disk sorts… meaning… sorting… is… very… slow…

Just wondering: does the acceptance environment have 5mln events, or 5 trillion? If it’s the latter, I would definitely consider moving away from MySQL; otherwise, I don’t think it should be a problem. I know of some users on this list who have over 600mln events. Together with them, I am working on an Event Archive, which is basically an event store that backs another event store and is optimized to hold the “old” events. It is mainly used for replays, but also allows reasonably fast reconstruction of an aggregate. That way, the “operational” event store only needs the recent events and can stay lean and mean.

Further down the roadmap is a (commercial) event store implementation that can easily scale to billions of events, is very fast at appending, and is reasonably good at replays and event stream loading. The first PoCs have already been done. I am pretty confident it will outperform any database-backed event store, simply because it doesn’t have to do the things that most databases do. If you’re interested, let me know.

Cheers,

Allard

Hi Allard,

Thnx for the reply. This index fixed the performance issues we had.