Dead-letter queue replay scalability

I started experimenting with the DeadLetterQueue functionality. I observed that failed events were placed correctly in the dead_letter_entry table (using Postgres for persistence). I then tried the SequencedDeadLetterProcessor to reprocess the failed entries from the dead_letter_entry table. I am using the Spring Boot 3 @Scheduled annotation to keep retrying the failed events, but only one event gets processed per invocation of the scheduled method. Is this the expected behaviour? Processing the dead_letter_entry table this way is very slow when a large number of events failed during the initial executions. Thank you for your help.

eventProcessingConfiguration
          .sequencedDeadLetterProcessor(processingGroup)
          .ifPresent(SequencedDeadLetterProcessor::processAny);

Hello Deepak,

You’re right: the SequencedDeadLetterProcessor processes only one sequence at a time. Note that this is one sequence, not one event; a sequence can contain up to 1024 events by default, so a single invocation may process more than one event.
DLQs are not really meant to run into this scalability issue, since they shouldn’t have too many messages in them. This is why we limit the queue to 1024 sequences (again, by default).
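If the defaults do not fit your situation, both limits are tunable when registering the queue. A minimal sketch, assuming a JPA-backed queue (Axon 4.6+) registered through an EventProcessingConfigurer; the processing group name is just a placeholder:

import org.axonframework.common.jpa.EntityManagerProvider;
import org.axonframework.common.transaction.TransactionManager;
import org.axonframework.config.EventProcessingConfigurer;
import org.axonframework.eventhandling.EventMessage;
import org.axonframework.eventhandling.deadletter.jpa.JpaSequencedDeadLetterQueue;

public void configureDeadLetterQueue(EventProcessingConfigurer configurer) {
    configurer.registerDeadLetterQueue(
            "my-processing-group", // hypothetical group name
            config -> JpaSequencedDeadLetterQueue.<EventMessage<?>>builder()
                    .processingGroup("my-processing-group")
                    .maxSequences(1024)    // default number of sequences the queue holds
                    .maxSequenceSize(1024) // default number of events per sequence
                    .entityManagerProvider(config.getComponent(EntityManagerProvider.class))
                    .transactionManager(config.getComponent(TransactionManager.class))
                    .serializer(config.serializer())
                    .build());
}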

Note that the method returns a boolean. It returns true when a sequence was successfully processed, so you can immediately call it again instead of waiting for the next @Scheduled invocation. You could also keep checking whether the queue still contains letters (be aware of infinite loops, though).
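A minimal sketch of that loop, assuming EventProcessingConfiguration is available as a Spring bean and using a hypothetical processing group name; the iteration cap is one way to avoid the infinite-loop trap mentioned above:

import org.axonframework.config.EventProcessingConfiguration;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DeadLetterRetrier {

    // Upper bound per run, so one scheduled invocation cannot loop forever.
    private static final int MAX_SEQUENCES_PER_RUN = 100;

    private final EventProcessingConfiguration eventProcessingConfiguration;

    public DeadLetterRetrier(EventProcessingConfiguration eventProcessingConfiguration) {
        this.eventProcessingConfiguration = eventProcessingConfiguration;
    }

    @Scheduled(fixedDelay = 60_000)
    public void retryDeadLetters() {
        eventProcessingConfiguration
                .sequencedDeadLetterProcessor("my-processing-group") // hypothetical group name
                .ifPresent(processor -> {
                    int processed = 0;
                    // processAny() returns true when a sequence was processed successfully,
                    // so keep draining instead of retrying a single sequence per run.
                    while (processed < MAX_SEQUENCES_PER_RUN && processor.processAny()) {
                        processed++;
                    }
                });
    }
}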

I hope this helps!

Thank you Mitchell for sharing the insight, this is helpful.

Hey @Morlack, do you mind further explaining what you mean by

DLQs are not really meant to run into this scalability issue, since they shouldn’t have too many messages in them. This is why we limit the queue to 1024 sequences (again, by default).

please? :pray:

Thanks!

Of course! I mean that you should put monitoring on the DLQ size (by exposing a Micrometer metric through Spring Boot, for example), so you can spot and resolve issues quickly. If you just let messages accumulate endlessly, there is no point in having the DLQ; use a LoggingErrorHandler instead.
If a database or an external system is down, at a certain point there is no use in queueing further messages; every additional one will fail as well. The DLQ is meant for situations where some messages have errors and others do not.
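A minimal sketch of such a metric, assuming the SequencedDeadLetterQueue can be looked up by processing group and exposes its size via size() (Axon 4.6+); the group and metric names are just placeholders:

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.axonframework.config.EventProcessingConfiguration;
import org.springframework.stereotype.Component;

@Component
public class DeadLetterQueueMetrics {

    // Registers a gauge so the DLQ size shows up under /actuator/metrics
    // and can be scraped or alerted on.
    public DeadLetterQueueMetrics(EventProcessingConfiguration eventProcessingConfiguration,
                                  MeterRegistry meterRegistry) {
        eventProcessingConfiguration
                .deadLetterQueue("my-processing-group") // hypothetical group name
                .ifPresent(dlq -> Gauge
                        .builder("dead_letter_queue.size", dlq, queue -> queue.size())
                        .description("Number of dead letters for my-processing-group")
                        .register(meterRegistry));
    }
}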

This is what I mean by “they should not have too many messages in them”. You should resolve issues quickly, and if 1024 events fail within a time period shorter than you can react to, there are bigger forces at play.

I understand! I knew the usage of DLQs has implications, but that’s an investigation ticket in the backlog, and it was easier to just ask you here! haha
Thanks again!