Event scheduler philosophy

Happy Thanksgiving to all partaking!

I have some questions regarding event scheduling using Axon, and really in event sourcing frameworks in general.

Context: I have an aggregate that is created with a property defining when something should happen. Let’s say the aggregate is LeftoverFood and the property is spoilsAt. I came up with a few ways of implementing this pattern:

  1. A service can run a Spring-scheduled cron job that checks a list of leftovers for those that need to be marked as spoiled. (It could also schedule a task for each aggregate, but the cron job scales better and doesn’t sacrifice granularity in this case.) The service listens for a LeftoverFoodCreated event, adding the target leftovers to a list of leftovers that may spoil in the future. Near the appropriate time, the service triggers a SpoilLeftovers command, and it removes the food from the list on a LeftoversSpoiled event.

  2. The LeftoverFood aggregate can use the QuartzEventScheduler to schedule an event at a future time. I could not find a concrete example of using the event scheduler, but the example for DeadlineManager suggested that the event should be scheduled inside a command handler so that it won’t be rescheduled during a replay.
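For pattern 1, the moving parts might look something like the following framework-free sketch. It uses plain Java rather than Axon or Spring APIs, the `Consumer<String>` stands in for a command gateway, and the method names (`onLeftoverFoodCreated`, `checkForSpoilage`, etc.) are invented for illustration:

```java
import java.time.Clock;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of pattern 1: a service tracks leftovers and, on each
// cron tick, dispatches a SpoilLeftovers command for any whose spoilsAt has
// passed. The Consumer<String> stands in for a command gateway.
class SpoilageScheduler {
    private final Map<String, Instant> pending = new ConcurrentHashMap<>();
    private final Clock clock;
    private final Consumer<String> dispatchSpoilCommand;

    SpoilageScheduler(Clock clock, Consumer<String> dispatchSpoilCommand) {
        this.clock = clock;
        this.dispatchSpoilCommand = dispatchSpoilCommand;
    }

    // Called from a LeftoverFoodCreated event handler.
    void onLeftoverFoodCreated(String foodId, Instant spoilsAt) {
        pending.put(foodId, spoilsAt);
    }

    // Called from a LeftoversSpoiled event handler.
    void onLeftoversSpoiled(String foodId) {
        pending.remove(foodId);
    }

    // Called by the cron job (e.g. a Spring @Scheduled method).
    void checkForSpoilage() {
        Instant now = clock.instant();
        pending.forEach((foodId, spoilsAt) -> {
            if (!now.isBefore(spoilsAt)) {
                dispatchSpoilCommand.accept(foodId);
            }
        });
    }
}
```

Note that an entry is only removed once the LeftoversSpoiled event comes back, so the same command may be dispatched on several ticks; the receiving command handler has to tolerate repeats.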

I think both of these patterns have benefits and drawbacks in the event sourcing world. If we consider a timer to be part of the state of the system, the first solution seems more correct: replaying the events will set the timer to the correct point. The drawback is that the timer is only set on one worker and does not persist across JVM reboots. This also requires that all timer-related events be routed to the same instance. In addition, the timer can go off multiple times, since it is recreated during each replay, and the scheduled time may already be in the past; even then, there is no guarantee that the desired effect was achieved. The resulting command handler must therefore contain logic that makes it safe to repeat.

The second solution can be used in a cluster and persists across reboots (using JDBC), but in many ways it has been removed from the state of the system produced by events. Replaying the events will not recreate the timer (when it was created by a command handler), and you’re now trusting Quartz to do its job properly. Further, the timer could go off during a replay, since the Quartz sub-system operates on other threads. What issues would this cause? Overall this seems more reliable than using the Spring scheduler, but it doesn’t feel correct in an event-sourced world.

Is there a workflow that allows the Quartz event scheduler to be more “evented”? Perhaps a timer should actually be an aggregate, and part of initialization (after replay) would be to schedule timers that have not yet fired.
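The initialization idea above could be sketched like this, framework-free and with invented names: on startup, scan the known deadline records and re-schedule only those that have not yet fired, firing immediately any that were missed while the system was down.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical deadline record, loaded from some persistent view on startup.
record Deadline(String foodId, Instant fireAt, boolean fired) {}

// Hypothetical sketch of "reschedule unfired timers on startup":
// fired deadlines are skipped, past-due ones fire immediately,
// and future ones are handed back to the scheduler.
class TimerInitializer {
    private final Clock clock;

    TimerInitializer(Clock clock) {
        this.clock = clock;
    }

    void rescheduleOnStartup(List<Deadline> deadlines,
                             BiConsumer<String, Instant> schedule,
                             Consumer<String> fireNow) {
        Instant now = clock.instant();
        for (Deadline d : deadlines) {
            if (d.fired()) {
                continue; // already handled, nothing to do
            }
            if (!now.isBefore(d.fireAt())) {
                fireNow.accept(d.foodId()); // missed while down: fire immediately
            } else {
                schedule.accept(d.foodId(), d.fireAt());
            }
        }
    }
}
```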

Hi Joel,

first of all, I would not (re)set timers when doing a replay. Typically, timers are part of Sagas, which should never be replayed anyway. Secondly, there should always be a clear distinction between an event that represents the passing of a deadline and an event representing that a deadline was missed. The latter is always the responsibility of an Aggregate (or whichever command handling component). So the time at which LeftoverFood is considered “spoiled” is not decided by the deadline, but by the LeftoverFood aggregate. If you ate it before it spoiled, the deadline is nothing more than a nudge…

The challenge the DeadlineManager helps you solve is to get the “nudge” of the deadline passing to hit the correct Aggregate or Saga instance, so that it can decide whether the LeftoverFood is to be considered “spoiled” or not. If the deadline itself marks it as “spoiled”, I’d say that’s a design problem.
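The “nudge” idea can be sketched without any framework code. This is a hypothetical plain-Java illustration, not Axon’s API: the deadline handler merely pokes the aggregate, and the aggregate checks its own state before deciding anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the deadline is only a nudge; the aggregate decides.
// All names are invented for illustration.
class LeftoverFood {
    private final String foodId;
    private boolean eaten;
    private boolean spoiled;
    private final List<String> emittedEvents = new ArrayList<>();

    LeftoverFood(String foodId) {
        this.foodId = foodId;
    }

    void eat() {
        eaten = true;
        emittedEvents.add("LeftoversEaten:" + foodId);
    }

    // Invoked when the scheduled deadline fires (the "nudge").
    // It does NOT mark the food spoiled by itself: the aggregate checks its
    // own state and decides, and the handler is idempotent, so a duplicate
    // nudge has no extra effect.
    void onSpoilageDeadline() {
        if (!eaten && !spoiled) {
            spoiled = true;
            emittedEvents.add("LeftoversSpoiled:" + foodId);
        }
    }

    List<String> events() {
        return emittedEvents;
    }
}
```

If the food was eaten before the deadline fires, the nudge is simply ignored; firing the deadline twice leaves the aggregate unchanged.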

Hope this helps.

Thanks Allard! I discovered the DeadlineManager after I authored this post, and it seems to solve a lot of logical issues. As I understand it, the framework has a “command plane” and an “event plane”: the command plane is executed only once and is not replayed, while the event plane is executed on replay and when loading aggregates. The DeadlineManager allows the aggregate (or other scope) to set a timer in the command plane, where its handler can then send commands, apply events, etc.

If you compare a timer to a saga, then I can absolutely see why it wouldn’t be rescheduled. My thinking around these may have been incorrect. If I now understand correctly, it is assumed that there are stateful system entities (Quartz timers, sagas) in the database that are not reproducible from the event stream. Meaning, if I nuke my primary database and/or move the events to a new system, the events alone will not reliably reproduce the system EXACTLY as it was before. I had previously thought this was possible, which would have required the timers and sagas to be present in the event stream.