Unit of Work performance and diagnosis

Yannick · October 10, 2019, 10:49am

Hello

I’m having some trouble with the performance of Axon; in particular, the Unit of Work is slow. In debug mode I’m seeing this:

12:29:31.837 [http-nio-8090-exec-10] DEBUG o.a.m.u.MessageProcessingContext - Adding handler org.axonframework.messaging.unitofwork.UnitOfWork$$Lambda$1703/0x0000000800f0ec40 for phase ROLLBACK
12:29:46.766 [http-nio-8090-exec-8] DEBUG org.axonframework.messaging.Scope - Clearing out ThreadLocal current Scope, as no Scopes are present

That’s about 15 seconds.

After enabling a snapshot every 2 events, I got it down to this:

12:39:02.340 [EventProcessor[aanvraagdetailprojection]-0] DEBUG o.a.m.u.MessageProcessingContext - Adding handler org.axonframework.eventhandling.TrackingEventProcessor$$Lambda$1846/0x0000000800f20040 for phase PREPARE_COMMIT
12:39:02.348 [pool-6-thread-1] DEBUG org.axonframework.messaging.Scope - Clearing out ThreadLocal current Scope, as no Scopes are present

That’s 8 ms, which is fine.

But it feels like a workaround. Obviously it has something to do with parsing all the events. Is there any way I can measure and diagnose this?

allardbz · October 11, 2019, 6:26am

Hi Yannick,

from these log lines, there is not much you can conclude. The Unit of Work is a mechanism that is used to ensure activities that need to be performed while handling a message (any type of message) are performed at the right time. The performance issue is most likely in these activities themselves, rather than the Unit of Work.
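To make that concrete, here is a deliberately simplified model in plain Java. It is a hypothetical sketch (`UnitOfWorkSketch`, `on` and `execute` are made-up names), not Axon's actual `UnitOfWork`; it only borrows phase names such as PREPARE_COMMIT and ROLLBACK from the log lines in the question. Activities are registered per phase, and the unit of work merely runs them at the right moment and records how long each phase took. If a phase is slow, the time is spent inside the registered activities, not in the bookkeeping.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified unit of work; NOT Axon's implementation.
class UnitOfWorkSketch {

    enum Phase { PREPARE_COMMIT, COMMIT, ROLLBACK, CLEANUP }

    // Activities registered by handlers, grouped by the phase they belong to.
    private final Map<Phase, List<Runnable>> handlers = new EnumMap<>(Phase.class);
    // Phases in the order they ran, with the time their activities took (ns).
    private final Map<Phase, Long> elapsedNanos = new LinkedHashMap<>();

    // Register an activity to run when the given phase is reached.
    void on(Phase phase, Runnable activity) {
        handlers.computeIfAbsent(phase, p -> new ArrayList<>()).add(activity);
    }

    // Running a phase is just a loop; any cost comes from the activities.
    private void runPhase(Phase phase) {
        long start = System.nanoTime();
        for (Runnable activity : handlers.getOrDefault(phase, List.of())) {
            activity.run();
        }
        elapsedNanos.put(phase, System.nanoTime() - start);
    }

    // Handle the message, then commit; on failure roll back. Always clean up.
    void execute(Runnable messageHandler) {
        try {
            messageHandler.run();
            runPhase(Phase.PREPARE_COMMIT);
            runPhase(Phase.COMMIT);
        } catch (RuntimeException e) {
            runPhase(Phase.ROLLBACK);
            throw e;
        } finally {
            runPhase(Phase.CLEANUP);
        }
    }

    Map<Phase, Long> elapsedNanos() {
        return elapsedNanos;
    }
}
```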

The second timing seems to come from a Unit of Work used by a Tracking Event Processor to handle an event. If the first one you logged is from the Command Handler, then it makes sense that it is slower. By default, to handle a command, Axon loads all events of the aggregate, applies them to an “empty” aggregate instance and only then routes the command. If you have a large number of events (and they actually need to be applied), that may take a while.
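The loading behaviour described above can be modelled in a few lines of plain Java. This is a hypothetical in-memory sketch (`ReplaySketch`, `Counter`, `Snapshot` and `handleAdd` are made-up names), not Axon's API. Without a snapshot, every command replays the whole event stream on a fresh aggregate, so load cost grows with the number of stored events. With a snapshot taken once a threshold of applied events is reached (roughly the idea behind an event-count snapshot trigger), loading starts from the snapshot and only replays the few events after it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of event-sourced loading with optional snapshots; NOT Axon's API.
class ReplaySketch {

    // Toy aggregate whose state is rebuilt by applying events one by one.
    static final class Counter {
        long total;
        int appliedEvents; // events applied to this instance (replayed or new)

        void apply(long delta) {
            total += delta;
            appliedEvents++;
        }
    }

    // Aggregate state captured after the first `eventCount` events.
    static final class Snapshot {
        final int eventCount;
        final long total;

        Snapshot(int eventCount, long total) {
            this.eventCount = eventCount;
            this.total = total;
        }
    }

    private final List<Long> events = new ArrayList<>();
    private final int snapshotThreshold; // 0 means "never snapshot"
    private Snapshot latestSnapshot;

    ReplaySketch(int snapshotThreshold) {
        this.snapshotThreshold = snapshotThreshold;
    }

    // Start from an empty aggregate (or the latest snapshot) and replay the rest.
    Counter load() {
        Counter aggregate = new Counter();
        int from = 0;
        if (latestSnapshot != null) {
            aggregate.total = latestSnapshot.total;
            from = latestSnapshot.eventCount;
        }
        for (int i = from; i < events.size(); i++) {
            aggregate.apply(events.get(i));
        }
        return aggregate;
    }

    // Handle a command: load the aggregate, then store and apply the new event.
    Counter handleAdd(long delta) {
        Counter aggregate = load();
        events.add(delta);
        aggregate.apply(delta);
        // Snapshot once this load-and-handle cycle applied enough events.
        if (snapshotThreshold > 0 && aggregate.appliedEvents >= snapshotThreshold) {
            latestSnapshot = new Snapshot(events.size(), aggregate.total);
        }
        return aggregate;
    }
}
```

With 1,000 stored events and no snapshots, `load()` applies all 1,000 events; with a threshold of 2 it applies at most one, while both end up in the same state.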

If you want detailed information about the latency and throughput of handlers, use the axon-micrometer module. It exposes a lot of metrics that you can use to identify badly performing handlers.
See also https://docs.axoniq.io/reference-guide/operations-guide/production-considerations/monitoring-and-metrics
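The axon-micrometer module is the recommended option here. Purely to illustrate what per-handler latency and throughput numbers capture, this is a hand-rolled sketch in plain Java; `HandlerTimingSketch`, `timed`, `Stats` and `report` are hypothetical names, and this is not the axon-micrometer API. Each handler is wrapped so that every invocation records its duration, which lets you compare call counts plus average and worst-case latency per handler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

// Hypothetical hand-rolled handler timer; NOT the axon-micrometer API.
class HandlerTimingSketch {

    // Thread-safe counters for one handler.
    static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
        final AtomicLong maxNanos = new AtomicLong();

        double averageMillis() {
            long n = count.sum();
            return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
        }
    }

    private final Map<String, Stats> statsByHandler = new ConcurrentHashMap<>();

    // Wrap a handler so that every invocation is timed under the given name.
    <T, R> Function<T, R> timed(String name, Function<T, R> handler) {
        Stats stats = statsByHandler.computeIfAbsent(name, n -> new Stats());
        return message -> {
            long start = System.nanoTime();
            try {
                return handler.apply(message);
            } finally {
                long elapsed = System.nanoTime() - start;
                stats.count.increment();
                stats.totalNanos.add(elapsed);
                stats.maxNanos.accumulateAndGet(elapsed, Math::max);
            }
        };
    }

    Stats stats(String name) {
        return statsByHandler.get(name);
    }

    // Print every handler, slowest worst case first.
    void report() {
        statsByHandler.entrySet().stream()
                .sorted((a, b) -> Long.compare(b.getValue().maxNanos.get(), a.getValue().maxNanos.get()))
                .forEach(e -> System.out.printf("%s: calls=%d avg=%.3f ms max=%.3f ms%n",
                        e.getKey(), e.getValue().count.sum(), e.getValue().averageMillis(),
                        e.getValue().maxNanos.get() / 1_000_000.0));
    }
}
```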

Kind regards,