Retry an event that sends a command to another microservice which might be down (NoHandlerForCommandException - non transient)

uhm_dunno · June 14, 2019, 7:41am

Hi,

I have a Saga in a microservice that sends a command over the commandGateway to another microservice. Both are connected to Axon Server, and it works just fine. But if I shut down the service that contains the command handler, I get a NoHandlerForCommandException. I can’t use a RetryScheduler to retry sending the command, because the exception is non-transient. I looked at RetryErrorHandler for the event, but that also only works if the exception is transient.

What would be the best approach here? Since this seems like a pretty basic problem, there should be some kind of out-of-the-box solution within the framework, no?

I want the Saga to retry sending the command, so once the other microservice gets back up again, it will eventually receive the command.

Would really appreciate some thoughts on this

Steven_van_Beelen · June 27, 2019, 8:00am

Hi Joerg,

As a Saga models a Complex Business Transaction with a given life span/cycle attached to it, exception handling is definitely part of what it should deal with.

For non-transient exceptions, your hunch is perfectly correct.
I’d configure a RetryScheduler to retry the given command in such a scenario.

If it is, however, an exception you’d throw from your own domain stating that some command couldn’t be handled, then you should make your Saga smart enough to deal with this.
You could do this by adding a try-catch block in your Saga, or working with the exceptional state of the CompletableFuture the CommandGateway returns to you for example.
Or, you could have the command handler publish an event stating something has gone wrong, instead of throwing an exception.
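
To illustrate the CompletableFuture option, here is a minimal plain-JDK sketch. It does not use Axon types: the `Function<Object, CompletableFuture<String>>` is a hypothetical stand-in for the gateway's `send` call, and the class name and the "OK"/"FAILED" strings are made up for the example. In a real Saga you would react to the failure with a compensating action or a deadline rather than a string.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

// Illustrative only: shows how a saga-like component can inspect the
// exceptional state of the future returned when dispatching a command.
class SagaFailureSketch {

    // Hypothetical stand-in for the command gateway's asynchronous send.
    private final Function<Object, CompletableFuture<String>> gateway;

    SagaFailureSketch(Function<Object, CompletableFuture<String>> gateway) {
        this.gateway = gateway;
    }

    // Dispatches the command and maps both outcomes to a status string,
    // so the caller never sees an exceptionally completed future.
    CompletableFuture<String> dispatch(Object command) {
        return gateway.apply(command).handle((result, failure) -> {
            if (failure == null) {
                return "OK: " + result;
            }
            // Dependent stages wrap the original error in a CompletionException.
            Throwable cause = (failure instanceof CompletionException && failure.getCause() != null)
                    ? failure.getCause()
                    : failure;
            // This is where the saga would decide how to recover.
            return "FAILED: " + cause.getMessage();
        });
    }
}
```

The point of `handle` over a try-catch here is that the failure arrives asynchronously, so the decision about how to recover lives in the callback rather than around the `send` call.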

However, when it comes to technical issues, the RetryScheduler is likely the best solution.
Currently, the only implementation the framework provides for retrying is the IntervalRetryScheduler, which might not be suited for coping with services being down.
Instead, you could, for example, run duplicate instances of the services for fault tolerance.
Additionally, I think an exponential back-off retry scheduler, which is being worked on in issue #1126, might suit your needs.
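
As a rough idea of what exponential back-off means in practice, here is a plain-JDK sketch. It is not the framework's RetryScheduler interface and not the implementation from issue #1126. The class name, method names, and parameters are all hypothetical, and it retries on any failure, which is an assumption made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative only: retries an asynchronous send with doubling delays.
class BackoffRetrySketch {

    // Delay before retry number `attempt` (0-based): initial * 2^attempt, capped at max.
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        // Cap the shift so the product stays in range for realistic initial delays.
        int shift = Math.min(attempt, 30);
        return Math.min(maxMillis, initialMillis * (1L << shift));
    }

    // Returns a future that completes with the first successful result,
    // or with the last failure once maxRetries retries have been used up.
    static <T> CompletableFuture<T> sendWithRetry(Supplier<CompletableFuture<T>> send,
                                                  ScheduledExecutorService scheduler,
                                                  int maxRetries,
                                                  long initialMillis,
                                                  long maxMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(send, scheduler, 0, maxRetries, initialMillis, maxMillis, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> send,
                                    ScheduledExecutorService scheduler,
                                    int attempt,
                                    int maxRetries,
                                    long initialMillis,
                                    long maxMillis,
                                    CompletableFuture<T> result) {
        send.get().whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (attempt >= maxRetries) {
                // Out of retries: surface the last failure to the caller.
                result.completeExceptionally(failure);
            } else {
                // Schedule the next attempt after the back-off delay.
                scheduler.schedule(
                        () -> attempt(send, scheduler, attempt + 1, maxRetries,
                                initialMillis, maxMillis, result),
                        delayMillis(attempt, initialMillis, maxMillis),
                        TimeUnit.MILLISECONDS);
            }
        });
    }
}
```

With an initial delay of 100 ms and a cap of 1000 ms, the waits would be 100, 200, 400, 800, and then 1000 ms for each further retry. The cap matters when a service can stay down for a long time, since uncapped doubling quickly produces very long waits.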

Other than that, the framework does not provide an event-retry mechanism.
The key takeaway here is, as I pointed out in my first sentence, that a Saga should be able to cope with fault scenarios.

If you have any concrete ideas on what the framework could provide in this scenario, adding an issue on GitHub is always welcome, of course.

Hope this sheds some light on the situation, Joerg.

Cheers,
Steven