Thank you for the prompt response. We have collected additional statistics for the database. The most important observation is that performance degrades over time even when the size of the dataset does not change much. For example, at the beginning of the test we have about 200k events in the event store and 50k saga instances in the saga repository. After applying load for 15-20 minutes, the store grows to about 260k events and 60k sagas, while performance drops by a factor of 2-4. If we restart the instance without removing anything from the store, performance returns to its original level for another 15-20 minutes before dropping again.
Toward the end of the test we start receiving slow-query reports for INSERTs into the DomainEventEntry table and SELECTs from the SagaEntry table. However, according to the AWR statistics, the actual average elapsed time for both queries is close to 60 ms. The top wait events according to AWR are:
The row lock contention comes from queries issued by the Quartz scheduler, which we use in our sagas to schedule events. I don't think there is much we can do about it, since this locking is part of Quartz's design for coordinating scheduling between multiple instances.
The "log file sync" wait is caused by a large number of frequent COMMITs, and there are two ways to reduce it: either improve redo log write performance or issue fewer COMMITs. The latter is why I asked for JDBC batching support in Axon. By design, our application has many simple aggregates with relatively short lifecycles. As a result, many domain events are generated, and since each one is persisted immediately, we see a lot of commits.
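To illustrate what batching would buy here, below is a minimal sketch in plain JDBC (not an Axon API) that writes several events with one addBatch/executeBatch round trip and a single COMMIT, so the redo log is flushed once per batch rather than once per event. The `StoredEvent` record and the simplified `DomainEventEntry` column list are hypothetical; Axon's real table has more columns, and its event store manages its own transactions, so wiring this in would need support on the Axon side.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedEventWriter {
    // Hypothetical, simplified event row; not Axon's actual DomainEventEntry mapping.
    public record StoredEvent(String aggregateId, long sequenceNumber, String payloadType, byte[] payload) {}

    // Hypothetical simplified column list for illustration only.
    static final String INSERT_SQL =
        "INSERT INTO DomainEventEntry (aggregateIdentifier, sequenceNumber, payloadType, payload) "
            + "VALUES (?, ?, ?, ?)";

    /** Inserts all events as one JDBC batch and commits once. */
    public static void writeBatch(Connection conn, List<StoredEvent> events) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // take control of the transaction boundary
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (StoredEvent e : events) {
                ps.setString(1, e.aggregateId());
                ps.setLong(2, e.sequenceNumber());
                ps.setString(3, e.payloadType());
                ps.setBytes(4, e.payload());
                ps.addBatch(); // queue the row instead of executing it immediately
            }
            ps.executeBatch(); // send all queued rows together
            conn.commit();     // one commit (one redo flush) for the whole batch
        } catch (SQLException ex) {
            conn.rollback();   // undo any partially written batch
            throw ex;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The trade-off is latency and failure scope: events wait until the batch is written, and a failure rolls back every event in the batch, not just one.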
Nevertheless, this does not explain why performance recovers after restarting the application, even though the same data is present in the store.
Regarding my last question, I don't think I was clear. My goal is to collect multiple events of the same type, but from different aggregates, over some kind of sliding window (or just a simple buffer), and then have a single thread process all of them as a batch. Right now Axon calls each handler with one event at a time, and I wonder whether there is a way to specify that a handler should receive a list of all events that have not yet been processed, similar to how the Disruptor uses the endOfBatch flag.
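A minimal sketch of the buffering idea in plain Java, assuming Axon offers no built-in batch handler: the per-event handler only enqueues, and one consumer thread takes the first available event and then drains whatever else has already queued up (up to a cap). This gives roughly the natural batching that the Disruptor signals with endOfBatch: small batches under light load, larger ones when the consumer falls behind. `BatchingEventBuffer`, `publish` and `processNextBatch` are hypothetical names, not Axon or Disruptor APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical helper: collects events and hands them to one thread as lists. */
public class BatchingEventBuffer<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final int maxBatchSize;
    private final Consumer<List<E>> batchProcessor;

    public BatchingEventBuffer(int maxBatchSize, Consumer<List<E>> batchProcessor) {
        if (maxBatchSize < 1) throw new IllegalArgumentException("maxBatchSize must be >= 1");
        this.maxBatchSize = maxBatchSize;
        this.batchProcessor = batchProcessor;
    }

    /** Called from the per-event handler: enqueue only, no real work here. */
    public void publish(E event) {
        queue.add(event);
    }

    /**
     * Blocks until at least one event is available, then drains up to
     * maxBatchSize events in total and processes them as one list.
     * Returns the size of the processed batch.
     */
    public int processNextBatch() throws InterruptedException {
        List<E> batch = new ArrayList<>(maxBatchSize);
        batch.add(queue.take());                // wait for the first event
        queue.drainTo(batch, maxBatchSize - 1); // take whatever else is already queued
        batchProcessor.accept(batch);
        return batch.size();
    }

    /** Starts the single consumer thread that repeatedly processes batches. */
    public Thread startConsumer() {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    processNextBatch();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop on shutdown
            }
        }, "batching-event-consumer");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

An event handler for the event type in question could then just call `publish(event)`. The cost of this design is that the batch is processed outside the handler's own invocation (and any unit of work or transaction around it), so ordering, retries and failure handling become the buffer's responsibility.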