axon getting progressively slower during data import

Hi,

We have observed strange behaviour while importing historical legacy data into our Axon (2.4.6) architecture. Essentially we are importing documents, with all of their modifications and lifecycle state, from a relational database into an Axon event-sourced model. Each document becomes an instance of the DocumentAR; sending its historical data takes on average about 100 commands per AR, resulting in about 100-300 events.

Initially this goes really fast: the first couple of documents take a few hundred milliseconds each. But after about 100 documents it already takes several seconds per imported document, and it eventually went up to 10-15s, at which point we aborted the import. We then did another run, importing the same document 200 times, and got the same result: fast at first, then really slow.

Disabling the read side completely did not make a difference, so it really comes down to commands being dispatched over a distributed (JGroups) command bus into an event-sourced aggregate. What is really strange is that it is even reproducible locally, with a single instance of the service using an H2 database!

The profiling data does not reveal much, though I’m not sure what to look for. There was no GC pressure, database calls accounted for less than 5% of processing time, and nothing else really stood out except for a huge number of XStream objects, all of which were collected when I forced a garbage collection.

Any thoughts would be appreciated!

Jorg

Continuing to investigate this, I removed the distributed command bus and replaced the JDBC event store with a volatile (in-memory) event store. I then simulated the import of one document by sending 1000 commands representing an exaggerated version of its history. It looks as if command response time is a function of the number of events the aggregate consists of, which is perhaps not surprising. I know snapshots should alleviate some of this, and indeed I noticed a performance bump after lowering the snapshot threshold from 50 to 10, but response times still kept degrading over time. Not that I expected response time to be linear, but still.
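
For reference, this is roughly how the snapshot trigger is set up (a simplified sketch of the Axon 2.x wiring; the repository and event store setup shown here is illustrative rather than our exact configuration):

import java.util.Arrays;

import org.axonframework.eventsourcing.AggregateFactory;
import org.axonframework.eventsourcing.AggregateSnapshotter;
import org.axonframework.eventsourcing.EventCountSnapshotterTrigger;
import org.axonframework.eventsourcing.EventSourcingRepository;
import org.axonframework.eventsourcing.GenericAggregateFactory;
import org.axonframework.eventstore.SnapshotEventStore;

public class DocumentRepositoryConfig {

    public EventSourcingRepository<DocumentAR> documentRepository(SnapshotEventStore eventStore) {
        AggregateFactory<DocumentAR> factory =
                new GenericAggregateFactory<DocumentAR>(DocumentAR.class);

        // Snapshotter that builds a snapshot event from the aggregate's current state
        AggregateSnapshotter snapshotter = new AggregateSnapshotter();
        snapshotter.setEventStore(eventStore);
        snapshotter.setAggregateFactories(Arrays.<AggregateFactory<?>>asList(factory));

        // Take a snapshot every 10 events instead of the previous 50
        EventCountSnapshotterTrigger trigger = new EventCountSnapshotterTrigger();
        trigger.setSnapshotter(snapshotter);
        trigger.setTrigger(10);

        EventSourcingRepository<DocumentAR> repository =
                new EventSourcingRepository<DocumentAR>(factory, eventStore);
        repository.setSnapshotterTrigger(trigger);
        // event bus wiring omitted
        return repository;
    }
}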

Jorg

Hi,

It turns out that the XStream serializer was the culprit. Both the Java and Jackson serializers yield constant response times for the duration of our test case. Both were a bit finicky to get going, though; I understand now why XStream is the default. Still, I am surprised that nobody seems to have run into this before.
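
For what it’s worth, switching came down to constructing a different Serializer and passing it to the event store’s constructor in place of the default XStreamSerializer. A sketch of the Jackson variant (the finicky part was making our event and command payloads Jackson-friendly):

import com.fasterxml.jackson.databind.ObjectMapper;

import org.axonframework.serializer.Serializer;
import org.axonframework.serializer.json.JacksonSerializer;

public class SerializerConfig {

    // Payloads need a default or @JsonCreator constructor and accessible
    // getters, otherwise Jackson cannot round-trip them.
    public Serializer eventSerializer() {
        ObjectMapper objectMapper = new ObjectMapper();
        return new JacksonSerializer(objectMapper);
    }
}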

Jorg

That seems strange. Did you get any indications of what extra work the XStream serializer was doing near the end of your test? We use the XStream serializer in our application and haven’t observed any progressive slowdown in our “send tons of commands and events” load testing.

-Steve

Not sure; all we observed was tens of millions of objects being created by XStream, which might have something to do with it. Perhaps it doesn’t like some of the DTOs that my AR is holding?

Jorg

Hi Jorg,

XStream doesn’t have the best performance of all serializers out there, but at least it will serialize anything you throw at it. How many commands are you sending concurrently? Did you consider using a cache? Especially during an import, you can expect the same aggregate to be hit many times within a relatively short timespan.
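
In Axon 2 that means using a CachingEventSourcingRepository instead of the plain EventSourcingRepository, roughly like this (just a sketch; plug in your own aggregate factory, event store and cache implementation):

import org.axonframework.cache.WeakReferenceCache;
import org.axonframework.eventsourcing.CachingEventSourcingRepository;
import org.axonframework.eventsourcing.GenericAggregateFactory;
import org.axonframework.eventstore.EventStore;

public class CachingRepositoryConfig {

    // Keeps recently loaded aggregates in memory, so repeated commands for the
    // same document don't replay its entire event stream on every dispatch.
    public CachingEventSourcingRepository<DocumentAR> documentRepository(EventStore eventStore) {
        CachingEventSourcingRepository<DocumentAR> repository =
                new CachingEventSourcingRepository<DocumentAR>(
                        new GenericAggregateFactory<DocumentAR>(DocumentAR.class), eventStore);
        repository.setCache(new WeakReferenceCache());
        // event bus and snapshotter trigger wiring omitted
        return repository;
    }
}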

Cheers,

Allard

We’re sending about 400 commands, most of them synchronously. I admit I have not looked into AR caching; I did not realize this was possible in Axon. Going to have a look now.

Jorg