My team and I are relatively new to Axon, and right now we are looking into how to handle exceptional cases in our event processing.
We are currently using Axon Framework with a PostgreSQL event store and are now looking into the dead-letter queue (DLQ) feature provided by Axon. Our goal is to follow CQRS strictly, so that we can always split our command and query sides into separate applications, and maybe even split the command side into multiple separate applications with Kafka as the event bus.
I want to briefly describe the concrete feature for which we would like to use the DLQ: after a user registers with our system, a MembershipCreatedEvent is published. We need to react to that event by fetching some data (customer sales and some other properties) for this user from an external HTTP service and feeding it into our system. The external service is hosted by another team, and the request might fail in certain cases. The other team then needs to do some manual work, after which a retry will most likely succeed. To support this use case, we would like to display the currently failed requests in a back office to inform the other team.
I now have a few questions:
1. Is the DLQ feature generally meant for event handlers on the command side? I often read that it is intended primarily for the query side, for cases where a projection's event handler fails.
2. Would you agree that the DLQ feature suits the described use case, or is there a more “Axon-like” way of handling such a case?
3. Am I correct that, if we split the application into a command side and a query side, each application has its own dead-letter processing and dead-letter table?
4. Would it be an acceptable approach to return the dead-letter entries from a command-side RestController to UI components in the back office?
1. Intended Use of the DLQ:
Axon’s Dead Letter Queue is mainly intended for event processors on the query side: think of projection handlers that react to events by, for instance, updating a read model or calling an external service. Aggregates handle their own events internally, and sagas do not support the DLQ, so you typically see it used where events trigger side effects outside the core business logic.
2. Suitability for Your Use Case:
For a scenario like retrieving external data after user registration, where the external call might fail occasionally, the DLQ is a great fit. It essentially “parks” the failed event so that you can manually intervene and retry processing once the external issue is resolved. Just make sure your event handlers are idempotent, so reprocessing doesn’t create duplicate effects.
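To make the "park, then retry" idea and the idempotency advice concrete, here is a minimal, framework-free sketch in plain Java. It is not Axon code: `EnrichmentHandler`, `CustomerDataClient` and `ParkingQueue` are hypothetical names invented for illustration, and only `MembershipCreatedEvent` comes from the question. The real DLQ also parks later events that belong to the same sequence, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParkAndRetrySketch {

    /** Hypothetical payload; only the event name comes from the question. */
    static final class MembershipCreatedEvent {
        final String membershipId;
        MembershipCreatedEvent(String membershipId) { this.membershipId = membershipId; }
    }

    /** Hypothetical client for the other team's HTTP service. */
    interface CustomerDataClient {
        String fetchSalesData(String membershipId);
    }

    /** Idempotent handler: data is keyed by membership id, so a retry cannot duplicate it. */
    static final class EnrichmentHandler {
        private final CustomerDataClient client;
        final Map<String, String> salesByMembership = new HashMap<>();
        int externalCallAttempts = 0;

        EnrichmentHandler(CustomerDataClient client) { this.client = client; }

        void on(MembershipCreatedEvent event) {
            if (salesByMembership.containsKey(event.membershipId)) {
                return; // already enriched: reprocessing is a no-op
            }
            externalCallAttempts++;
            String sales = client.fetchSalesData(event.membershipId); // may throw
            salesByMembership.put(event.membershipId, sales);
        }
    }

    /** Simplified stand-in for a dead-letter queue: failed events are parked, not lost. */
    static final class ParkingQueue {
        final List<MembershipCreatedEvent> parked = new ArrayList<>();

        void handleOrPark(EnrichmentHandler handler, MembershipCreatedEvent event) {
            try {
                handler.on(event);
            } catch (RuntimeException e) {
                parked.add(event);
            }
        }

        /** Retries every parked event once; returns how many succeeded. */
        int retryAll(EnrichmentHandler handler) {
            List<MembershipCreatedEvent> toRetry = new ArrayList<>(parked);
            parked.clear();
            int succeeded = 0;
            for (MembershipCreatedEvent event : toRetry) {
                try {
                    handler.on(event);
                    succeeded++;
                } catch (RuntimeException e) {
                    parked.add(event); // still failing: keep it parked
                }
            }
            return succeeded;
        }
    }

    public static void main(String[] args) {
        boolean[] serviceUp = {false}; // simulates the external service being down at first
        EnrichmentHandler handler = new EnrichmentHandler(id -> {
            if (!serviceUp[0]) throw new IllegalStateException("external service unavailable");
            return "sales-for-" + id;
        });
        ParkingQueue queue = new ParkingQueue();

        queue.handleOrPark(handler, new MembershipCreatedEvent("m-1"));
        System.out.println("parked after first attempt: " + queue.parked.size());

        serviceUp[0] = true; // the other team has fixed the problem
        System.out.println("succeeded on retry: " + queue.retryAll(handler));
    }
}
```

The existence check is the simplest possible idempotency guard. In a real handler you would more likely rely on an upsert or a unique constraint in the database, so that the guard survives restarts.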
3. Separate DLQ in a Split Architecture:
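In a microservice environment where you separate command and query sides, each processing group that uses a DLQ has its own queue, and each service stores its dead letters in its own database. With the JPA and JDBC implementations, the processing groups within one service typically share a single dead-letter table and are told apart by their processing group name, so "own table" holds per application rather than per group. This separation ensures that if one service encounters issues, it won’t affect the event processing in another service.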
4. Exposing Dead-Letter Entries via REST:
Providing a REST endpoint to display dead-letter entries for a back-office UI is perfectly acceptable. This approach, similar to the AxonIQ Console, allows your team to inspect failed events, troubleshoot the issues, and manually trigger retries when necessary. Just ensure you have proper security measures in place and that the UI clearly communicates each event’s status.
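As a rough idea of what such an endpoint could return, here is a small, framework-free sketch that maps parked entries to a view model for the back office. `ParkedEntry` and `DeadLetterView` are hypothetical stand-ins, not Axon types; as far as I know, Axon's own `DeadLetter` exposes comparable information (the message, the cause and the time it was enqueued), but check the Javadoc of your version before relying on that. The view deliberately leaves out the event payload, so the other team only sees what it needs for triage.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DeadLetterViewSketch {

    /** Hypothetical stand-in for one parked entry as read from the queue. */
    static final class ParkedEntry {
        final String sequenceId;   // e.g. the membership id
        final String eventType;
        final String causeMessage;
        final Instant enqueuedAt;

        ParkedEntry(String sequenceId, String eventType, String causeMessage, Instant enqueuedAt) {
            this.sequenceId = sequenceId;
            this.eventType = eventType;
            this.causeMessage = causeMessage;
            this.enqueuedAt = enqueuedAt;
        }
    }

    /** Hypothetical response body for the back office: no payload, only triage data. */
    static final class DeadLetterView {
        final String membershipId;
        final String eventType;
        final String reason;
        final String waitingSince; // ISO-8601 timestamp

        DeadLetterView(String membershipId, String eventType, String reason, String waitingSince) {
            this.membershipId = membershipId;
            this.eventType = eventType;
            this.reason = reason;
            this.waitingSince = waitingSince;
        }
    }

    /** Maps entries to views, oldest first, so the longest-waiting failures are on top. */
    static List<DeadLetterView> toViews(List<ParkedEntry> entries) {
        List<ParkedEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((ParkedEntry e) -> e.enqueuedAt));
        List<DeadLetterView> views = new ArrayList<>();
        for (ParkedEntry e : sorted) {
            views.add(new DeadLetterView(e.sequenceId, e.eventType, e.causeMessage, e.enqueuedAt.toString()));
        }
        return views;
    }
}
```

A RestController would then only need to read the entries of the relevant processing group, call something like `toViews`, and return the result as JSON.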
Overall, this method aligns well with Axon’s best practices for handling transient failures, keeping the command and query responsibilities clearly separated.
In addition, I would really advise trying Axon Server. Although distributing events via Kafka is possible, a lot of things can go wrong, and you still won’t have distributed queries, which are one of the greatest features of the framework.