Dead-letter queue and transaction

Hi,

We want to use the dead-letter queue for dealing with situations like:

  • RuntimeException during processing
  • Exception at commit time: a value is too large for a database column

I added the dead-letter queue to our application according to the Axon reference documentation.
The DLQ doesn't work: event handling keeps retrying as before.
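
Roughly, the registration looks like this (a simplified, JPA-based sketch; the processing group name here is just illustrative):

    @Autowired
    public void configure(EventProcessingConfigurer configurer) {
        configurer.registerDeadLetterQueue(
                "product_name",
                config -> JpaSequencedDeadLetterQueue.builder()
                        .processingGroup("product_name")
                        .entityManagerProvider(config.getComponent(EntityManagerProvider.class))
                        .transactionManager(config.getComponent(TransactionManager.class))
                        .serializer(config.serializer())
                        .build());
    }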
Then I examined the AxonIQ dead-letter-queue-workshop sample project and noticed
that the event handler has a REQUIRES_NEW transaction.
If the workshop event handler's transaction is changed to the default (REQUIRED), it doesn't work either: it keeps retrying.
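
For clarity, the handler in question has this shape (simplified from the workshop sample; the repository call is just illustrative):

    @Transactional(propagation = Propagation.REQUIRES_NEW) // present in the workshop sample
    @EventHandler
    public void on(ProductNameChangedEvent event) {
        // Updates the projection; this may throw a RuntimeException during handling,
        // or fail only at commit time (e.g. a value too large for a column).
        repository.save(toEntity(event));
    }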
I suspect that the DLQ and event handling run in the same transaction and the DLQ data is not persisted,
since the transaction is rolled back.
Is a REQUIRES_NEW transaction mandatory on an event handler to make the DLQ work?

Yudong

Hi @yudong! Welcome to the forum.

The dead-letter queue needs to be able to persist, in a single transaction, both the dead-letter entry and the progressed tracking token of your Event Processor.
If it is incapable of doing both in a single transaction, your Event Processor:

  1. cannot proceed to the following event because the token wasn’t persisted, or
  2. cannot retry the failed event, since the dead letter entry wasn’t persisted.

So, as you’ve noticed, the same transaction wraps the dead letter storage procedure.
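
Conceptually (a sketch, not the Framework's actual code), the processor must be able to do both of these atomically:

    // Sketch: both writes have to commit together, otherwise the processor is stuck.
    transactionManager.executeInTransaction(() -> {
        deadLetterQueue.enqueue(sequenceIdentifier, deadLetter); // park the failed event
        tokenStore.storeToken(token, processorName, segment);    // progress past it
    });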
Due to this, your second point:

Exception at commit time: a value is too large for a database column

is not really an option.
If committing to your database fails, execution has already moved past the @EventHandler-annotated method.

I suspect that dlq and eventhandling are running in same transaction and dlq data is not persisted,
since the transaction is rolled back.

This is indeed correct. As stated earlier, for event handling to move on to the following event, Axon Framework must both update your token and insert the dead letter. Otherwise, the DLQ support cannot function correctly.

Hi Steven,

Thanks for your clarification.

Just a few questions:

  • Is dead-letter replay not batched, even though the ProcessingGroup is configured for batching?
  • If the event handler has a REQUIRES_NEW transaction and the ProcessingGroup is batched, will batching still work, given that every event is then handled in a separate transaction?
  • According to the reference docs, a DeadLetter parameter can be added to the handler method. When I add DeadLetter<EventMessage> deadLetter to the event handler method, an exception is thrown (see the sketch below for the signature I used): Unable to resolve parameter 1 (DeadLetter) in handler …
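
For completeness, this is the handler signature I tried (the event name comes from the workshop sample):

    @EventHandler
    public void on(ProductNameChangedEvent event,
                   DeadLetter<EventMessage<ProductNameChangedEvent>> deadLetter) {
        // My understanding from the docs: 'deadLetter' is null on regular delivery
        // and non-null when the event is retried from the DLQ.
    }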

For your last point, which version are you using? It was introduced in 4.7.0 if I’m not mistaken.

I am using 4.6.4. According to the release notes, the resolver was indeed introduced in 4.7.0. I was reading the latest reference docs, and since the DeadLetter class was available, I thought it was introduced together with the DLQ.

Any idea if this will be added to a 4.6.x version?

Very unlikely. Any reason it's hard to move to 4.7?

4.7 is linked to Spring Boot 3, which in turn no longer supports ActiveMQ. Our services use ActiveMQ.

No, you can use 4.7 with Spring Boot 2.

This is, indeed, an incorrect assumption on your part, @yudong.
We've made Axon Framework 4.7 exactly so that you can choose between Spring Boot 2 and Spring Boot 3. Similarly, you can choose between Hibernate 5 and Hibernate 6+, and between Javax and Jakarta.

We would not make such a massive breaking change for users within a minor release. Ever.
For a major release things are different though.

Dead letter processing is done per sequence.
The batching configured on your Event Processor has zero impact here since a batch of events can contain any number of sequences.

As you may have read in the documentation, the Dead Letter Queue is, in essence, a queue of sequences.
This ensures the event handling order, which is paramount within event-driven systems, is maintained.

Lastly, the transaction is placed around the entire sequence by the DeadLetteringEventHandlerInvoker, to be exact.
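
To illustrate retries with a sketch ('config' here is the Axon Configuration, and the group name comes from the workshop sample), a retry always covers one whole enqueued sequence:

    config.eventProcessingConfiguration()
          .sequencedDeadLetterProcessor("product_name")
          .ifPresent(SequencedDeadLetterProcessor::processAny); // retries a single sequence, in order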

What do you mean exactly with “will batch still work?” What do you expect to work here exactly? The transaction scope of the batch, perhaps?

A certain ProcessingGroup has batching enabled, and its event handler relies on the UnitOfWork being of type BatchingUnitOfWork. During DLQ replay, the type is a plain UnitOfWork. That is the reason for this question.
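
To illustrate (simplified; the branching is our own optimization, not something from the Framework):

    @EventHandler
    public void on(ProductNameChangedEvent event) {
        UnitOfWork<?> uow = CurrentUnitOfWork.get();
        if (uow instanceof BatchingUnitOfWork) {
            // batch-aware path, e.g. deferring flushes until the batch ends
        } else {
            // plain path; during DLQ replay we land here
        }
    }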

The transaction scope, exactly. I assume the transaction scope is the complete batch. If an event handler must have REQUIRES_NEW (to make the DLQ work), then each event runs within its own transaction, which negates the batch transaction.

Then I assumed incorrectly.
When I set Axon to 4.7, I got some library errors that reminded me of the Spring Boot 3 migration.

Gotcha, thanks for clarifying, @yudong!
And, agreed: REQUIRES_NEW would force new transactions, negating the batching logic.

In all honesty, when working with the DLQ on my pet project, I did not set any transactional logic aside from the default behavior given by the Framework. So, I’ve never had to set REQUIRES_NEW at all.

Furthermore, we briefly investigated why the sample project you refer to uses REQUIRES_NEW, but we are not 100% sure at this stage.
This doesn't mean I'm unwilling to dive into this further, by the way; I'm just stating the current state of our investigation on the subject.

That said, I’d like to better understand your statement here in your original comment:

So, here are my questions.

You mention the two scenarios you want to use the DLQ for.
Does the undesired behavior occur for both?

Furthermore, can you specify the retry behavior you’re seeing?
So, are you getting any log statements from the Framework?
Or from your own code?
Asking this as I’d like to make as few assumptions as possible at this stage.

Hi Steven,
The expected behaviour is what the DLQ is intended for: the failing event and subsequent events of the same aggregate are set aside in the DLQ.
I used the Axon dead-letter-queue-workshop project (branch solution step6) as verification.
Two test cases:

  • A: RuntimeException during event handling
  • B: value too big for a column, causing an error at commit time. I suppose this represents other commit-time errors, such as DB constraint violations.

Test results with different settings for the event handler:

  • the sample code as-is with REQUIRES_NEW: both test cases work
  • event handler without a transaction annotation (as in your pet project):
    • A: works
    • B: keeps retrying:
2023-03-08 09:18:18.171  WARN 6036 --- [product_name]-0] o.a.e.TrackingEventProcessor             : Releasing claim on token and preparing for retry in 8s
2023-03-08 09:18:18.171  INFO 6036 --- [product_name]-0] o.a.e.TrackingEventProcessor             : Released claim
2023-03-08 09:18:26.174  INFO 6036 --- [product_name]-0] o.a.e.TrackingEventProcessor             : Fetched token: IndexTrackingToken{globalIndex=33} for segment: Segment[0/0]
ProductNameChangedEvent 1 012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679
2023-03-08 09:18:26.182  WARN 6036 --- [product_name]-0] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 22001, SQLState: 22001
2023-03-08 09:18:26.182 ERROR 6036 --- [product_name]-0] o.h.engine.jdbc.spi.SqlExceptionHelper   : Value too long for column "NAME VARCHAR(100)": "'012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679012345679' (108)"; SQL statement:
update product_name_entity set description=?, name=? where id=? [22001-200]
  • event handler with @Transactional: fails for both test cases with a similar error as above

Hi guys,

We see exactly the same problem (Axon SE 4.7.0, MongoDB as TokenStore, DLQ store, projection store).

As Yudong describes, either you leave out the @Transactional on the event handler, and then everything (the token, the DLQ entry, the projection entry) is persisted (the latter is not wanted), or you add it, and then everything gets rolled back with the surrounding transaction and the system behaves as if there were no DLQ.

Adding REQUIRES_NEW would help, since then the token and DLQ update are in a separate transaction from the update of the projection (a MongoDB collection in our case).

So: is there another option?

Thanks!


Hi Steven,

I’m using the MongoSequencedDeadLetterQueue (see my post in this thread).

Seems like Yudong is right, and it's logical from my POV. Exceptions that bubble up from the event handler would not only roll back the projection update, but also the entry for the DLQ and the update of the tracking token (since all of that is handled in one transaction).

That’s why REQUIRES_NEW helps, since only the second transaction (which updates the projection) will be rolled back, but not the surrounding transaction for the token store and DLQ update.

I thought about NESTED propagation, but that does not seem to be well supported by the Spring Data components - and MongoDB transactions do not have the notion of savepoints (which would be the precondition for this to work).

So it looks like REQUIRES_NEW is the only propagation type that makes this work, but what are the consequences?

What would happen if the transaction for the event handler is successful, but the one for the token store and DLQ fails (the projection record/document would have been updated by then)? That would basically require some kind of compensation from my POV (which is not nice at all).

Anyway, maybe you have some ideas…

Thanks
Markus


Thanks for going into more detail, @yudong.
Similarly, thanks for chipping in, @_axxelia_Markus.


First commenter

First, let me dive into your points, @yudong:

Although I find your wording slightly pushy here, @yudong, I think you confirm that the behavior is as expected for Scenario A.
I would expand Scenario A to cover all non-database-related exceptions.

So, let me rephrase what I stated in my first reply:

To further ensure all this happens within a single transaction (so as not to end up with separate/distributed transactions), you would put your tokens, projections, and dead letters in the same database.

As you have noticed, issues with the database, like “a value is too large for a database column,” will cause that transaction to roll back.
Although this means the support differs depending on the type of exception, it is desirable behavior.

To clarify, you want this behavior because the TokenStore update is done in the same transaction.
Now, if we inserted a DLQ item in a different transaction while allowing the current one to roll back (as with REQUIRES_NEW), the processor would keep retrying the event and putting it in the DLQ, creating duplicates.

Although using @Transactional(propagation = REQUIRES_NEW) works, the downsides are that the batching behavior differs and that you need two connections from the pool.


Second commenter

Now, let me jump to you, @_axxelia_Markus:

Please note that your assumption here again depends on the type of exception.
The statement is fully correct when the exception is database-related and thus impacts the transaction.
When it is not, the DLQ works as intended.

I trust the consequences of this are clearer when looking at the “First commenter” section.


Concluding

During the design of the Dead-Letter Queue feature, we felt it was a necessary evil to use the active transaction.
Simply because you want to (1) progress the token and (2) insert a dead letter.
If you cannot guarantee both, we enter the problem sphere of distributed transactions.

Nonetheless, I want to point out that we take your comments to heart.
If you have any recommendations on how Axon Framework might support (as @yudong describes them) both scenarios A and B while still using a single transaction for the DLQ and token inserts, we are all ears.

After all, it is an open-source project: your contributions are paramount to making Axon Framework the product it should be.
So, please provide your feedback or your pull requests, @yudong and @_axxelia_Markus.

In the meantime, I can assure you we will take note of this.
Perhaps the team here will devise a nicer solution for both.

If you have any follow-up comments, be sure to keep replying.
Small personal note: I will be off on vacation for a rather extended period. I have nudged my team members to watch this thread, though.
