Now that I have moved all of the validation to sagas, I am seeing a dramatic increase in the speed of the system. The pattern I'm using is to start each saga in reaction to domain events and then schedule a "Retry" event 5ms out to begin the actual validation piece.
The SagaEventListener for the retry does the following:
- cancel the scheduleToken if it is not null
- schedule another retry event in 5 minutes' time
- in a try/catch block:
-- communicate with external source to validate data
-- end the saga if the validation is successful
So I am ending the saga within the try/catch, and the retry that was scheduled just before the try/catch will automatically fire again on any error.
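To make the flow of the retry listener concrete, here is a minimal sketch in plain Java. The Scheduler and Validator interfaces, RetrySaga and RecordingScheduler are all hypothetical stand-ins invented for illustration; they are not the Axon or Quartz API, and the real handler would be an annotated saga method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an event scheduler; NOT the Axon or Quartz API.
interface Scheduler {
    String schedule(long delayMillis); // returns a token for the scheduled retry
    void cancel(String token);
}

// Hypothetical stand-in for the call to the external source.
interface Validator {
    void validate() throws Exception;
}

class RetrySaga {
    static final long RETRY_DELAY_MS = 5 * 60 * 1000L; // 5 minutes

    private final Scheduler scheduler;
    private final Validator validator;
    private String scheduleToken;
    private boolean ended;

    RetrySaga(Scheduler scheduler, Validator validator) {
        this.scheduler = scheduler;
        this.validator = validator;
    }

    // Mirrors the listed steps: cancel, reschedule, then validate in try/catch.
    void onRetry() {
        if (scheduleToken != null) {
            scheduler.cancel(scheduleToken);
        }
        scheduleToken = scheduler.schedule(RETRY_DELAY_MS);
        try {
            validator.validate();
            ended = true; // end the saga only when validation succeeds
        } catch (Exception e) {
            // Swallowed on purpose: the retry scheduled above fires later.
        }
    }

    boolean isEnded() {
        return ended;
    }
}

// In-memory scheduler that only records which tokens are still pending.
class RecordingScheduler implements Scheduler {
    final List<String> active = new ArrayList<>();
    private int next;

    @Override
    public String schedule(long delayMillis) {
        String token = "token-" + (next++);
        active.add(token);
        return token;
    }

    @Override
    public void cancel(String token) {
        active.remove(token);
    }
}

class RetrySagaDemo {
    public static void main(String[] args) {
        RecordingScheduler scheduler = new RecordingScheduler();
        RetrySaga saga = new RetrySaga(scheduler, () -> {
            throw new Exception("external source down");
        });
        saga.onRetry();
        saga.onRetry();
        System.out.println("ended=" + saga.isEnded() + ", pending=" + scheduler.active);
    }
}
```

Note that in this sketch the cancel and schedule calls sit outside the try/catch, so an exception thrown by either of them would propagate out of the handler instead of being absorbed by the retry.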
In a normal flow this seems to work great. But I like to stress things a little, and I found a crack: when I sequentially dispatch the commands that would come from the UI, the first validation saga causes a deadlock. If I change the schedule time from 5ms to 2s, the saga usually starts after the last command has been dispatched.
So I looked into async sagas and added an "<axon:async />" element to my saga manager. I supplied it my Spring JPA transaction manager and a dedicated thread pool defined as "<task:executor id=".." pool-size="15" />". No other parameters are given to the async element.
The following stacktrace is thrown:
Caused by: org.quartz.SchedulerException: Unable to unschedule trigger [DEFAULT.6da64b5bd2ee-011b7330-9262-4b79-a481-851858a8712c] while deleting job [AxonFramework-Events.event-e8cd543d-3297-4c9a-936b-86a80b2a5ed5]
at org.quartz.core.QuartzScheduler.deleteJob(QuartzScheduler.java:948)
at org.quartz.impl.StdScheduler.deleteJob(StdScheduler.java:292)
at org.axonframework.eventhandling.scheduling.quartz.QuartzEventScheduler.cancelSchedule(QuartzEventScheduler.java:128)
At this point the Quartz scheduler is dead and holding onto resources; even the Oracle tables are locked. The only way to stop the servlet container is to kill the Java process from the system.
I moved everything in the "retry" SagaEventListener into individual try/catch blocks and this at least doesn't kill the Quartz scheduler.
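Roughly, the defensive version looks like the sketch below, again in plain Java with hypothetical stand-ins (TokenScheduler, ExternalCheck, DefensiveRetrySaga and FailingCancelScheduler are invented names, not Axon or Quartz types). It only illustrates the control flow of giving each step its own try/catch; it is an assumption on my part that a failed cancel surfaces as a runtime exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an event scheduler; NOT the Axon or Quartz API.
interface TokenScheduler {
    String schedule(long delayMillis);
    void cancel(String token);
}

// Hypothetical stand-in for the call to the external source.
interface ExternalCheck {
    void run() throws Exception;
}

class DefensiveRetrySaga {
    static final long RETRY_DELAY_MS = 5 * 60 * 1000L; // 5 minutes

    final List<String> problems = new ArrayList<>(); // stands in for logging
    private final TokenScheduler scheduler;
    private final ExternalCheck check;
    private String scheduleToken;
    private boolean ended;

    DefensiveRetrySaga(TokenScheduler scheduler, ExternalCheck check) {
        this.scheduler = scheduler;
        this.check = check;
    }

    void onRetry() {
        // Each step is isolated so a failure in one cannot skip the others
        // or escape from the handler.
        if (scheduleToken != null) {
            try {
                scheduler.cancel(scheduleToken);
            } catch (RuntimeException e) {
                problems.add("cancel failed: " + e.getMessage());
            }
        }
        try {
            scheduleToken = scheduler.schedule(RETRY_DELAY_MS);
        } catch (RuntimeException e) {
            problems.add("schedule failed: " + e.getMessage());
        }
        try {
            check.run();
            ended = true; // end the saga only when validation succeeds
        } catch (Exception e) {
            problems.add("validation failed: " + e.getMessage());
        }
    }

    boolean isEnded() {
        return ended;
    }

    String currentToken() {
        return scheduleToken;
    }
}

// Scheduler whose cancel always fails, to imitate the unschedule error.
class FailingCancelScheduler implements TokenScheduler {
    private int next;

    @Override
    public String schedule(long delayMillis) {
        return "token-" + (next++);
    }

    @Override
    public void cancel(String token) {
        throw new IllegalStateException("unable to unschedule " + token);
    }
}

class DefensiveRetrySagaDemo {
    public static void main(String[] args) {
        DefensiveRetrySaga saga = new DefensiveRetrySaga(
                new FailingCancelScheduler(),
                () -> { throw new Exception("external source down"); });
        saga.onRetry();
        saga.onRetry(); // cancel fails here, but the handler still completes
        System.out.println(saga.problems);
    }
}
```

The trade-off in this shape is that a failed cancel can leave the old retry pending alongside the new one, so the handler should tolerate being invoked more often than expected.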
So am I missing something with respect to the setup of the saga manager, or is it simply best practice to write quite defensive code in a Quartz job?