I have a saga event handler that fires a bunch of commands asynchronously. Once it has dispatched all of them, it considers the event handled, and that's it. Now I can only rely on my command handlers to actually finish the job. Each of these command handlers calls a third-party API, which can be down. I have a RetryScheduler configured, so each command is retried properly while the server is running. But if I kill my server, the retries do not resume on startup, and the event handler looks successful even though some commands were never executed.
So I was thinking of removing the RetryScheduler and publishing a TransientExceptionEvent from inside the command handler, so that I can have an event handler that performs the retry. I know that when I restart my server, event handling will continue where it left off, but I feel like I am reinventing the wheel. Is there a proper way of doing retries that is crash-proof?
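To make the idea concrete, here is a rough sketch of what I have in mind, written in plain Java without any Axon types. Everything in it is a hypothetical stand-in: `DurableLog` plays the role of a persisted event store plus the stored position of the event processor, `handleCommand` is the command handler calling the third-party API, `processPending` is the retry event handler, and `maxAttempts` is a limit I made up. A real version would also wait between attempts instead of retrying immediately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RetrySketch {
    // Hypothetical event: records that a command failed with a transient error.
    static final class TransientExceptionEvent {
        final String commandId;
        final int attempt;

        TransientExceptionEvent(String commandId, int attempt) {
            this.commandId = commandId;
            this.attempt = attempt;
        }
    }

    // Stand-in for durable storage: the event list and the position of the
    // next unhandled event are assumed to survive a server restart.
    static final class DurableLog {
        final List<TransientExceptionEvent> events = new ArrayList<>();
        int position = 0;
    }

    // Stand-in for the command handler. The predicate models the third-party
    // API call and returns false when the API is down. On failure a retry
    // event is appended to the log instead of relying on in-memory retries.
    static boolean handleCommand(String commandId, int attempt,
                                 Predicate<String> api, DurableLog log) {
        if (api.test(commandId)) {
            return true;
        }
        log.events.add(new TransientExceptionEvent(commandId, attempt));
        return false;
    }

    // Stand-in for the retry event handler. It resumes from the stored
    // position, so events recorded before a crash are still picked up.
    // Returns the number of commands that succeeded on retry.
    static int processPending(DurableLog log, Predicate<String> api, int maxAttempts) {
        int succeeded = 0;
        while (log.position < log.events.size()) {
            TransientExceptionEvent event = log.events.get(log.position);
            // Events that already used up all attempts are skipped.
            if (event.attempt < maxAttempts
                    && handleCommand(event.commandId, event.attempt + 1, api, log)) {
                succeeded++;
            }
            log.position++;
        }
        return succeeded;
    }
}
```

In this sketch the retry state lives in the log rather than in memory, which is the whole point. Handling an event and advancing the position are not atomic here, so a crash in between would run the command again after restart, meaning the command handlers would have to be idempotent.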