Axon 3 scheduled events: maybe Quartz isn't needed?

While profiling my application’s performance, I’ve found that it spends a sizable amount of time interacting with Quartz, since it makes heavy use of scheduled events.

The thought occurred to me that for any Axon 3 app that uses tracking event processors, Quartz could be superfluous. The event processors could limit their event log queries by timestamp and not examine any events with timestamps in the future. Then scheduling an event would just be publishing it normally, but with the timestamp set to the scheduled time rather than the current time.
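
To make the idea concrete, here is a minimal in-memory sketch. It assumes a log ordered by timestamp, which Axon 3’s event store does not guarantee, and the names (`TimestampOrderedLog`, `StoredEvent`, `Position`) are hypothetical, not Axon API. Scheduling an event is just `append` with a future timestamp; a tracking reader stops at the first event that isn’t due yet.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical event record: seq breaks ties between events with equal timestamps.
record StoredEvent(long seq, Instant timestamp, String payload) {}

// Hypothetical tracking token: a position in a log ordered by (timestamp, seq).
record Position(Instant timestamp, long seq) implements Comparable<Position> {
    static Position of(StoredEvent e) {
        return new Position(e.timestamp(), e.seq());
    }

    @Override
    public int compareTo(Position other) {
        int byTime = timestamp.compareTo(other.timestamp);
        return byTime != 0 ? byTime : Long.compare(seq, other.seq);
    }
}

// Hypothetical in-memory log ordered by timestamp (an assumption, not Axon 3 behavior).
class TimestampOrderedLog {
    private final ConcurrentSkipListMap<Position, StoredEvent> events = new ConcurrentSkipListMap<>();
    private long nextSeq = 0;

    // Publishing and scheduling are the same operation; only the timestamp differs.
    synchronized StoredEvent append(String payload, Instant timestamp) {
        StoredEvent event = new StoredEvent(nextSeq++, timestamp, payload);
        events.put(Position.of(event), event);
        return event;
    }

    // Reads events after the given position (null = from the start), stopping at
    // the first event whose timestamp is still in the future relative to `now`.
    List<StoredEvent> readAfter(Position after, Instant now, int maxEvents) {
        NavigableMap<Position, StoredEvent> tail =
                after == null ? events : events.tailMap(after, false);
        List<StoredEvent> batch = new ArrayList<>();
        for (StoredEvent event : tail.values()) {
            if (batch.size() == maxEvents || event.timestamp().isAfter(now)) {
                break;
            }
            batch.add(event);
        }
        return batch;
    }
}
```

Because the reader stops at a future event rather than skipping it, a scheduled event is never passed over. The catch is that an event appended later with a timestamp earlier than a reader’s current position would be missed entirely, which is the ordering problem in a nutshell.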

Obviously there are challenges here, not least of which is that Axon 3 no longer orders events by timestamp. Canceling upcoming events would also require some design tradeoffs (giving up immutability of the event log, keeping a secondary store of canceled event IDs, etc.), and how all of this interacts with transactions would have to be thought through. But removing a fairly heavyweight dependency while gaining measurable performance seems appealing to me.
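
The second cancellation option, a secondary store of canceled event IDs, could look roughly like this. It is a hypothetical in-memory filter (`CancellationAwareReader` and `PendingEvent` are made-up names) that leaves the log itself append-only. It deliberately ignores the transactional question: a cancel that arrives after the event has already been delivered simply has no effect.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical scheduled event as read from an append-only log.
record PendingEvent(String id, long dueAtMillis, String payload) {}

// Hypothetical filter: cancellations live in a separate set, so the log is never modified.
class CancellationAwareReader {
    private final Set<String> cancelledIds = ConcurrentHashMap.newKeySet();

    // Recording a cancellation is an insert into the secondary store, not an update of the log.
    void cancel(String eventId) {
        cancelledIds.add(eventId);
    }

    // Keeps events that are due and not cancelled; future events are left for a later read.
    List<PendingEvent> deliverable(List<PendingEvent> batch, long nowMillis) {
        return batch.stream()
                .filter(e -> e.dueAtMillis() <= nowMillis)
                .filter(e -> !cancelledIds.contains(e.id()))
                .collect(Collectors.toList());
    }
}
```

In practice the set would also need pruning once an event’s due time has passed, or it grows without bound.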

Maybe this idea is completely nuts for some reason that hasn’t occurred to me, but I figured I’d toss it out there.

-Steve

Hi Steve,

I’ve been giving this some thought (while trying to find the time to reply), and I’m still not sure whether it’s nuts or not ;-). It’s definitely an interesting idea, although I see some practical issues with implementing it “inside” the regular event stream.

Instead, this could be implemented with a separate table, structured like the event stream, that holds the events scheduled for the future. On startup, a node can simply check the first upcoming events and “move” each one to the event stream once its timestamp is in the past. This should also work in multi-node environments, as only a single node will actually be able to remove a given entry from the scheduled table.
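
A hypothetical in-memory sketch of that approach, assuming a separate table sorted by due time. `ScheduledTable`, `DueKey`, and `promoteDue` are made-up names; the atomic conditional `remove` on a `ConcurrentSkipListMap` plays the role of a row delete that only one node can win. In a database, the delete and the append to the event stream would share one transaction so that a crash between them rolls back both; this version does not model that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical key for the scheduled table: ordered by due time, seq makes it unique.
record DueKey(long dueAtMillis, long seq) implements Comparable<DueKey> {
    @Override
    public int compareTo(DueKey other) {
        int byDue = Long.compare(dueAtMillis, other.dueAtMillis);
        return byDue != 0 ? byDue : Long.compare(seq, other.seq);
    }
}

// Hypothetical in-memory stand-in for a "scheduled events" table kept apart from the event stream.
class ScheduledTable {
    private final ConcurrentSkipListMap<DueKey, String> rows = new ConcurrentSkipListMap<>();
    private final AtomicLong seq = new AtomicLong();

    DueKey schedule(String payload, long dueAtMillis) {
        DueKey key = new DueKey(dueAtMillis, seq.getAndIncrement());
        rows.put(key, payload);
        return key;
    }

    // Cancelling and promoting compete for the same removal, so only one of them can win.
    boolean cancel(DueKey key) {
        return rows.remove(key) != null;
    }

    // Moves every due entry to the event stream. Several nodes may call this at once:
    // remove(key, value) is atomic, so each entry is handed to at most one caller.
    int promoteDue(long nowMillis, Consumer<String> eventStream) {
        int moved = 0;
        Map.Entry<DueKey, String> first;
        while ((first = rows.firstEntry()) != null && first.getKey().dueAtMillis() <= nowMillis) {
            if (rows.remove(first.getKey(), first.getValue())) {
                eventStream.accept(first.getValue());
                moved++;
            }
        }
        return moved;
    }
}
```

A node that loses the race simply moves on to the next entry, which is what lets several nodes run the same check without coordinating.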

Cheers,

Allard

For what it’s worth, I ended up implementing a DB-backed scheduler to replace Quartz, and scheduling is no longer a performance issue in our app. Quartz’s generality forces it into design choices that aren’t really relevant to the narrow use case of scheduled Axon events. It turns out that if your scheduler only needs “schedule a specific task at time X,” and you know you’re running on a particular database server with particular concurrency behavior, you can get away with far less locking than Quartz needs to handle all of its features interacting with each other.

On PostgreSQL, you need no explicit row locking at all to schedule or cancel an event, only to publish one. Even then, you only need to lock the single row for that specific event rather than take a global lock as Quartz does. Use a local ScheduledThreadPoolExecutor for the actual event delivery: that gives you near-millisecond accuracy, like the in-memory scheduler, without constantly polling for new events (though you still want to poll occasionally to detect dead nodes in the cluster).
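
A rough, hypothetical sketch of that split: the shared table decides who delivers, and local timers decide when. `ClaimStore`, `InMemoryClaimStore`, and `LocalTimerScheduler` are made-up names, and the in-memory store only stands in for the PostgreSQL table. In a database version, `take` would be a delete of the one row for that event (for example `DELETE FROM scheduled_event WHERE id = ?`, where `scheduled_event` is an assumed table name), and a node publishes only if its delete affected a row. The dead-node poll is not shown; it would just call `fire` for overdue rows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared scheduled-event table.
interface ClaimStore {
    void insert(String id, String payload);
    String take(String id); // payload if this caller removed the row, otherwise null
}

class InMemoryClaimStore implements ClaimStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    public void insert(String id, String payload) { rows.put(id, payload); }

    public String take(String id) { return rows.remove(id); }
}

// Hypothetical scheduler: the store arbitrates delivery, local timers provide the timing.
class LocalTimerScheduler implements AutoCloseable {
    private final ScheduledThreadPoolExecutor timers = new ScheduledThreadPoolExecutor(1);
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final ClaimStore store;
    private final Consumer<String> publisher;

    LocalTimerScheduler(ClaimStore store, Consumer<String> publisher) {
        this.store = store;
        this.publisher = publisher;
        timers.setRemoveOnCancelPolicy(true); // drop cancelled timers from the queue immediately
    }

    void schedule(String id, String payload, long delayMillis) {
        store.insert(id, payload); // persist first, so another node can deliver if this one dies
        // compute() is atomic per key, so even a timer that fires immediately
        // cannot remove its entry before the future has been recorded.
        pending.compute(id, (key, old) ->
                timers.schedule(() -> fire(key), delayMillis, TimeUnit.MILLISECONDS));
    }

    boolean cancel(String id) {
        ScheduledFuture<?> timer = pending.remove(id);
        if (timer != null) {
            timer.cancel(false);
        }
        return store.take(id) != null; // false if the event was already delivered
    }

    // Called by the local timer; a periodic sweep for overdue rows could call it too.
    void fire(String id) {
        pending.remove(id);
        String payload = store.take(id);
        if (payload != null) {
            publisher.accept(payload); // null means cancelled or delivered by another node
        }
    }

    @Override
    public void close() {
        timers.shutdownNow(); // unfired rows stay in the store for a sweep to pick up
    }
}
```

Because cancel and delivery both go through the same `take`, exactly one of them wins for any given event, no matter which node or thread gets there first.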

I don’t think my replacement scheduler is suitable for inclusion in Axon because it makes no attempt to be portable, but if anyone else is running into Quartz-related performance issues, replacing it turns out to be pretty straightforward. It only took a couple days to write it and run it through sufficiently brutal stress tests to make us confident it was at least as production-ready as Quartz.

-Steve