Cascaded Deletes (again)

rhubarb · August 12, 2022, 6:16pm

I’m starting a new topic because I can’t reply to Cascade-Deleting related Aggregates, which was imported from Google Groups.

We have a similar case to the one outlined there - so I’ll use their terms.

We have Project (Aggregate) and Image (child but Aggregate) with a command-side projection in postgres to look up the Images from the Projects and vice-versa.

In that discussion Steve suggests a separate EventHandler component.
My follow-on questions are:

Why is it better to have a Component than a Saga for this? (That answer was from 2019, maybe opinions have changed?)
Would it make sense to have a CommandHandler component to coordinate the cascading delete, or should it really be an EventHandler (or Saga)? I ask because the examples for set-based validation (Set Based Consistency Validation) show similar use cases, handled by CommandHandlers.
What is the appropriate time/place to markDeleted on the parent aggregate? We’re thinking: in the delete-cascade component, after the last child delete command has been issued?
What should we do with the child-deletion events? Should we add an additional EventHandler so that we can check that the last one was deleted before marking the parent deleted? If not, we lose transactionality.

thanks
Chris

Morlack · August 16, 2022, 7:27am

Hello @rhubarb!

Let me go through your questions one by one.

Why is it better to have a Component than a Saga for this?

As Steven outlined in his answer, the lifecycle of such a Saga would be very vague. It’s not really a business transaction with multiple steps, which sagas are often used for, but it’s a single technical transaction.
The component would be a single event processor, listening to Project and Image events to keep its lookup table up to date. Now, when a ProjectDeleted event is encountered, it can query its own table, fire the commands and then delete those rows so it’s consistent again.
Now, when we try to do this in a Saga, when does it start? In order to keep all the data, the Saga would need to start when a Project is created, then kept up to date, and eventually ended. It’s certainly possible, but due to performance reasons it’s always good to keep sagas data-wise as lean as possible.
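A minimal sketch of such a component, written in plain Java rather than against Axon Framework so it stays self-contained. The class name `ProjectImageLookup`, its method names and the `Consumer<String>` that stands in for a command dispatcher are all assumptions for illustration; in a real application the methods would be event handlers and the table would live in the database.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Plain-Java stand-in for the event-processing component and its lookup table.
// All names here are hypothetical; none of this is Axon Framework API.
class ProjectImageLookup {
    // projectId -> ids of the images that belong to it (the component's own table)
    private final Map<String, Set<String>> imagesByProject = new HashMap<>();
    // Stand-in for a command dispatcher; receives the id of each image to delete
    private final Consumer<String> deleteImageCommandSender;

    ProjectImageLookup(Consumer<String> deleteImageCommandSender) {
        this.deleteImageCommandSender = deleteImageCommandSender;
    }

    void onImageCreated(String projectId, String imageId) {
        imagesByProject.computeIfAbsent(projectId, id -> new LinkedHashSet<>()).add(imageId);
    }

    void onImageDeleted(String projectId, String imageId) {
        Set<String> images = imagesByProject.get(projectId);
        if (images != null) {
            images.remove(imageId);
        }
    }

    void onProjectDeleted(String projectId) {
        // Drop the project's rows, then fire one delete command per known image
        Set<String> images = imagesByProject.remove(projectId);
        if (images == null) {
            return;
        }
        for (String imageId : images) {
            deleteImageCommandSender.accept(imageId);
        }
    }

    int trackedImages(String projectId) {
        Set<String> images = imagesByProject.get(projectId);
        return images == null ? 0 : images.size();
    }
}
```

Unlike a Saga, this component has no per-project lifecycle to start or end; it is one long-lived projection that happens to send commands.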

Would it make sense to have a CommandHandler component to coordinate the cascading delete or should it really be an EventHandler (or Saga).

I have seen this exact approach recently, for this same use case. The Project command handler fired commands at the images, which then fired commands at the annotations, and then to the labels. However, with an event processor you can uncouple this huge transaction. Project Delete events lead to images delete commands, and each image delete event leads to annotation delete commands, etc. It’s much more friendly for the performance of your system.

What is the appropriate time/place to markDeleted on the parent aggregate?
We can be very pragmatic about this; the only reason to call this, is to no longer accept commands. So, you should call this the moment you don’t want to accept commands to that aggregate any longer. Reading this, I think that this is the moment the delete command on the parent aggregate has succeeded.

What should we do with the child-deletion events?
I think I answered this in the earlier questions, but expanding on that I think that the child-deletion events should mark the child as deleted, and if needed an event processor can cascade the delete down the hierarchy.

Final notes
Reading your questions, I wonder what the reason was for splitting these two aggregates into a hierarchy in the first place. The aggregate is your transaction boundary. Only in this boundary can you guarantee transactionality. Trying to build this outside of this boundary is hard, and considered an anti-pattern. Outside this boundary, you should rely on eventual consistency, as outlined with the event processor approach.

Please consider modeling the child aggregate as an @Entity inside of the @Aggregate. The model will be almost the same, but much more friendly to work with technically. I think a separate child aggregate is warranted if one of these applies:

The child has a separate lifespan and process from its parent and has a clear ending of its own. For example, a Customer can have an Order. An Order is aware of the customer and cannot exist without it, but it has a separate process and lifespan. However, deleting a Customer would not involve deleting the Order.
There are performance requirements that, without the split, a single aggregate instance would be unable to handle. This level of performance requirement, however, is very rare.

I hope this helps you!

Mitchell

rhubarb · August 16, 2022, 9:32am

Thanks for the detailed response Mitchell… It helps us a lot.
Just two notes:
1: On the topic of the child-deletion events (ImageDeleted), I was really asking if the component should be listening to them (as well as the ProjectDeleted event) to somehow track progress and ensure all of the children are deleted before the parent. It seems unnecessarily complicated, but it also seems that we could end up in a situation where one of the child deletions failed.

2: To answer your last question, we split these on the advice of AxonIQ. Our types are Asset (parent) and Version (child), but keeping with the Project/Image example: we have many 1000s of Projects and each one can have 100s or 1000s of Images over time. Most of the commands will be performed against Images, and we want different users to be able to update different Images in the same Project without interfering with one another.
Making Images components would mean that the Project is locked for each Image update. Hence the parent/child aggregate pattern.
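The progress tracking asked about in note 1 could look roughly like the sketch below. It is plain Java, not Axon Framework code, and the class `CascadeProgressTracker` and its methods are hypothetical names. It remembers which child deletions are still outstanding after a ProjectDeleted event and reports when the last ImageDeleted event arrives, so a failed child deletion shows up as an entry that never leaves the pending set.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker for outstanding child deletions per project.
class CascadeProgressTracker {
    private final Map<String, Set<String>> pendingByProject = new HashMap<>();

    // Returns true if there is nothing to wait for (the project had no images).
    boolean onProjectDeleted(String projectId, Collection<String> imageIds) {
        if (imageIds.isEmpty()) {
            return true;
        }
        pendingByProject.put(projectId, new HashSet<>(imageIds));
        return false;
    }

    // Returns true only when this event removed the last pending image.
    boolean onImageDeleted(String projectId, String imageId) {
        Set<String> pending = pendingByProject.get(projectId);
        if (pending == null) {
            return false; // no cascade in progress for this project
        }
        pending.remove(imageId);
        if (pending.isEmpty()) {
            pendingByProject.remove(projectId);
            return true;
        }
        return false;
    }

    // Images whose deletion has not been confirmed yet, e.g. for retries or alerts.
    Set<String> pendingImages(String projectId) {
        Set<String> pending = pendingByProject.get(projectId);
        return pending == null
                ? Collections.emptySet()
                : Collections.unmodifiableSet(new HashSet<>(pending));
    }
}
```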

thanks again

Jakob_Hatzl · August 22, 2022, 1:03pm

Hi @rhubarb and @Morlack!

I initially asked the question about cascade-deleting in the Google group more than 2 years ago and have come a long way since then (including working with @Morlack on a similar cascading use-case in a 1-on-1 session during AxonIQ consultancy, which - I think, if I guess it right - he also indirectly references in his post).

Coincidentally, I’m currently refactoring some of the cascade deletion patterns I created back then, so this is all quite fresh in my mind. There are two things I’d like to mention in this discussion:

Regarding the last question of @Morlack (separate child aggregates vs aggregate members): for us it was also the performance requirement, as 1 project could contain 100s of images and, moreover, 1 image could contain 10,000s of annotations, which would add up to a single aggregate having a huge number of events pretty fast. Since you state, @Morlack, that this level of performance requirement is very rare, can this be a sign of bad design/modeling of the domain? And to the whole Axon community: how do you handle large parent-child relationships with Axon Framework differently? Are there other patterns for modeling large parent-child relationships properly in DDD?

The second thing is about propagating commands down a tree using a command → event → command propagation: We encountered some serious issues with that pattern.

Every time a command is sent from within an existing unit of work, the cleanup phase of that command is attached to the UoW and (if routed to the same application instance) the parent unit of work is only finished when all child UoWs are finished - this blocked the first parent for the whole process. This hit us fully, because we were using only a single application instance and CommandGateway#sendAndWait for propagating - but even if you’re using multiple application instances, you cannot know which id-‘segments’ are processed on which instance. Because of this behaviour we ran out of database connections in the Axon Framework db connection pool pretty fast when processing a large tree of objects. To be honest, I think I never fully understood everything that went on in the depths of the framework regarding that part - we have since refactored that whole cascading use-case and now avoid the pattern altogether.
There is a 10k limit on the command queue (at least we experienced this with Axon Server Standard Edition), so your commands will be rejected if the queue in Axon Server grows beyond that limit (most probably because the processing side is not fast enough). As a workaround we throttled command sending as suggested here: Perfomance tuning initial load from large context - #3 by allardbz
Be aware that when sending a large number of commands, the receiving side must be fast enough to process them as well, otherwise you’ll get command timeouts (defaults to 5min I think). However, I’m not sure if the commands really get cancelled if already fetched by the processing side, or if the timeout is only reported back to the sending side from Axon Server (which I think is true).
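One way to throttle sending, sketched in plain Java under stated assumptions: the `Function<String, CompletableFuture<Void>>` stands in for an asynchronous command dispatcher that completes its future when the command has been handled, and `ThrottledCommandSender` and `maxInFlight` are hypothetical names. This is not the approach from the linked post, just an illustration of bounding the number of unacknowledged commands with a `java.util.concurrent.Semaphore`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper that caps how many commands are in flight at once.
class ThrottledCommandSender {
    private final Semaphore permits;
    // Stand-in for an asynchronous dispatcher: target id -> future completed after handling
    private final Function<String, CompletableFuture<Void>> dispatcher;

    ThrottledCommandSender(int maxInFlight, Function<String, CompletableFuture<Void>> dispatcher) {
        this.permits = new Semaphore(maxInFlight);
        this.dispatcher = dispatcher;
    }

    // Blocks the caller only while maxInFlight commands are still unacknowledged.
    CompletableFuture<Void> sendAll(List<String> targetIds) {
        CompletableFuture<?>[] results = new CompletableFuture<?>[targetIds.size()];
        int index = 0;
        for (String targetId : targetIds) {
            permits.acquireUninterruptibly();
            CompletableFuture<Void> result;
            try {
                result = dispatcher.apply(targetId);
            } catch (RuntimeException e) {
                permits.release(); // dispatch failed synchronously, give the permit back
                throw e;
            }
            // Release the permit when the command succeeds or fails
            results[index++] = result.whenComplete((ok, error) -> permits.release());
        }
        // Completes when every command has completed; fails if any of them failed
        return CompletableFuture.allOf(results);
    }
}
```

The sending thread still waits when the limit is reached, so a sender like this belongs on a dedicated thread rather than inside a unit of work that holds a database connection.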

I’m currently in the middle of outlining our approach for cascade deleting child aggregates and if I come to a final conclusion would be happy to share it here. Roughly I plan to validate creating child aggregates against a command model (check if the parent exists) and once an aggregate is deleted, mark it and all children as deleted in this same command model to block creating more children of the deleted parent or any child below. Then I can simply collect a flat list of all child entities and send a (cascade-)delete command to each one.
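A rough sketch of that command model in plain Java. The class `HierarchyCommandModel` and its methods are hypothetical names, and an in-memory map stands in for whatever table the subscribing event handlers maintain. It rejects children of unknown or deleted parents, and marking a node as deleted marks the whole subtree and returns the flat list of descendants that should each receive a (cascade-)delete command.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical command-side lookup model for a parent-child hierarchy.
class HierarchyCommandModel {
    private final Map<String, List<String>> childrenOf = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    void registerRoot(String id) {
        childrenOf.putIfAbsent(id, new ArrayList<>());
    }

    // Validation before creating a child: the parent must exist and must not be deleted.
    void registerChild(String parentId, String childId) {
        if (!childrenOf.containsKey(parentId)) {
            throw new IllegalStateException("Unknown parent: " + parentId);
        }
        if (deleted.contains(parentId)) {
            throw new IllegalStateException("Parent is deleted: " + parentId);
        }
        childrenOf.get(parentId).add(childId);
        childrenOf.put(childId, new ArrayList<>());
    }

    // Marks the node and all descendants as deleted and returns the descendants
    // breadth-first, i.e. the flat list of targets for the delete commands.
    List<String> markDeleted(String id) {
        List<String> descendants = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(id);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!deleted.add(current)) {
                continue; // already marked by an earlier call
            }
            if (!current.equals(id)) {
                descendants.add(current);
            }
            queue.addAll(childrenOf.getOrDefault(current, Collections.emptyList()));
        }
        return descendants;
    }

    boolean isDeleted(String id) {
        return deleted.contains(id);
    }
}
```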

@rhubarb, I am very curious about how you finally solved the cascade-deletion and would be happy if you would share it.

Best Regards,
Jakob

allardbz · September 13, 2022, 8:15am

Hi Jakob,

there are different ways to go about this topic. Purely technical, and conceptual.

Let me start with a technical approach: when sending a command from an Event handler, unless you do sendAndWait(), nothing is waiting for the command to finish. It may be that there is a blocking implementation somewhere of a Command Bus, such as the SimpleCommandBus or even the SpringHttpConnector if you don’t give it an Executor to handle calls asynchronously. So the problems you’re encountering are most likely related to some sub-optimal configuration. Note that when using Axon Server, this will work asynchronously out of the box. You’d only have to avoid using sendAndWait().

But then there is also the conceptual approach. Cascade delete is not a typical business term. You could wonder if it even has a place in an application. But let’s assume there is some business concept that if x happens to a y, then something should also happen to a group of entities that are referred to by y.
But unless we understand what x, y, and the entities related to y are, we cannot give any detailed advice. The most important thing is to log what actually happens. Are those “things that happen to related entities” really things that happen to an aggregate, or are these just view model changes to reflect that x happened to y?
I would recommend taking a step back and trying to describe what happens on a business level. What events make sense on that level? In the end, those are the only events you’d want to see in your event streams.

Hope this helps…

Jakob_Hatzl · September 13, 2022, 10:40am

Hi Allard,

as pointed out, we hit the sendAndWait() in combination with a single application instance, which led to blocking the UoW all the way down the cascading tree. With the knowledge of today I would also clearly count that under sub-optimal configuration (or design), as you pointed out. We’ve switched that to send() since then and also switched from nested event-cmd-event to a flat approach, which performs better.

On the conceptual level it would of course be more suitable to have all children (in our case “Images” and “Annotations” etc.) within the parent (in our case a “Project”) but at the time of designing our application and with the knowledge we had back then we identified a technical limitation to that for our special case, since we’re potentially having nested parent-child relationships with rather large cardinality (e.g. 1 to 100s to 10s of thousands).

On a technical level we identified the need to keep our related entities as separate aggregates, because of the aforementioned cardinality of parent-child relationships, where modeling children as AggregateMembers would have become impractical for the following reasons (of course based only on our probably limited knowledge of the framework and DDD):

single parent aggregate instances would have got 100s of thousands of events, making event sourcing without snapshotting nearly impossible and snapshots rather large
every command issued to any of the children would have locked the whole parent, making concurrent access on different child levels impossible

Thus we resorted to keeping aggregates separate and working with subscribing event handlers that build command validation models for set-based validation and deletion on parent-child relationships.

One thing that I find worth reconsidering is your suggestion that we might not need to reflect the deletion of children in the event stream, but could simply adjust the view projection differently in response to deleting the parent. We have considered that, but since we have separate aggregates, we would need to introduce an additional check on the command side for every child entity to see whether the parent is deleted, to avoid access to a child whose parent is already deleted (because without a deleted event the child aggregate would technically not be marked deleted in framework terms).

Anyhow - thanks for taking the time to respond in such detail, much appreciated!

Best Regards,
Jakob

danstoofox · November 1, 2022, 9:59pm

@allardbz do you have any insight on the last statement @Jakob_Hatzl mentioned?

I like the idea of only updating the query side without the need to issue many commands. How could we deal with aggregates that are no longer supposed to exist but aren’t marked as deleted?
Are there any meaningful optimizations Axon performs after an aggregate is deleted?

We also have a situation in our app where a single command triggers the creation of many related aggregates (to prevent the parent from getting too large and allowing concurrent access to children). Currently, we issue many create commands which create the related aggregates. I guess we can’t get around this, right?

allardbz · November 2, 2022, 1:31pm

Hi Daniel,

considering I have a complete lack of domain context here, I will just give my perspective, but leave it up to you to figure out how useful it is.

To me, cascade-delete is a technical concern. For example, when I cancel an Order, I really don’t care how that Order is structured, and whether each order line needs to be canceled separately or not. The thing that most likely happened is only “order canceled”. Nothing more, nothing less.

It sounds like there is an invariant between aggregates here. When aggregate A did X, then Aggregate B shouldn’t do Y. The fact that X is called “cancel” or “delete” (<-- probably not a business term!) doesn’t change the way we should deal with it. Note that “marking an aggregate as deleted” in the framework doesn’t do anything except throw an exception when attempting to interact with the aggregate.

If that invariant is important, then you’ll need to make sure each aggregate is aware of any relevant state changes. This could either be by issuing a query that contains the current state of the parent aggregate or, alternatively by sending a command to change the state when a certain event occurs.
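Both options can be illustrated with a small plain-Java sketch. `ImageAggregate`, `handleParentDeleted` and the `Predicate<String>` that stands in for a query about the parent’s state are hypothetical names and not Axon Framework API; the point is only where the knowledge about the parent comes from.

```java
import java.util.function.Predicate;

// Hypothetical child aggregate that enforces "no changes once the parent is deleted".
class ImageAggregate {
    private final String projectId;
    private boolean parentDeleted; // state used by option 2
    private String caption = "";

    ImageAggregate(String projectId) {
        this.projectId = projectId;
    }

    // Option 2: a command sent by an event handler when the parent's deletion event occurs
    void handleParentDeleted() {
        parentDeleted = true;
    }

    // Option 1: query the parent's current state while handling the command
    void handleRename(String newCaption, Predicate<String> projectIsActive) {
        if (parentDeleted || !projectIsActive.test(projectId)) {
            throw new IllegalStateException("Project " + projectId + " is deleted");
        }
        caption = newCaption;
    }

    String caption() {
        return caption;
    }
}
```

The query variant needs no extra commands but depends on how up to date the queried model is; the command variant keeps the decision inside the aggregate at the cost of one command per child.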

Hope this clarifies a bit. I had to do some digging, as this discussion was a while ago.

marinsimina · December 5, 2022, 11:43pm

Hello @Morlack ,

Can you provide some info about how many commands/events a single aggregate is able to handle? And maybe what levers are available to control the performance of a single aggregate? Do you have any courses that discuss this issue?

Thanks!