Changing the aggregate model

Hi,

I want to be able to split an aggregate and its associated events in two, so that I end up with two aggregate roots and two events. I have tried to do this via upcasting: splitting the event into two events is easy enough, but I have not worked out how to indicate that an upcast event belongs to a different aggregate.

Can anyone suggest a process for modifying the aggregate model please?

Thanks,
Dan

Hi Daniel,

I’m working through the reverse process at the moment. Upcasters are great for modifying the shape of events, but they’re not much help when the aggregate boundaries need to change. The event store loads the event stream for a given aggregate based on the database schema/columns (i.e. the aggregate identifier), and the upcasters are only applied once the stream has been loaded, which is too late in the process.

My strategy is as follows:

  • Put the system into maintenance mode
  • Run a tool to modify the event data to make the required changes
  • Start up the new version of the application

The data migration tool itself has a few tasks:

  • Discover all related events for the aggregates to be joined
  • Load all of the original aggregate event streams into memory
  • Re-order the events into a new sequence. In our case, this means [all events of aggregate A, all events of aggregate B, all events of aggregate C]. We fortunately don’t have any dependencies between the streams (as would be expected, given that they’re currently independent aggregates)
  • Re-write the serialized data. We store the payload in JSON format, so we need to add/modify the new aggregate ID on every event. The migration tool reads the payload into key/value pairs, adds the missing keys, then re-serializes the payload (a sketch of this and the next step follows the list)
  • Re-write the aggregate ID and type – all previously independent aggregate streams are now “Aggregate A” events
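
To make the payload/ID rewrite concrete, here is a minimal sketch in Java using plain JDBC and Jackson. The table and column names (domain_event_entry, aggregate_identifier, sequence_number, global_index, payload) and the "aggregateId" payload key are only illustrative assumptions; adapt them to your actual event store schema and event format.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MergeAggregateStreams {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Re-points every event of a source aggregate at the target aggregate:
    // adds the target ID to the JSON payload, rewrites the aggregate
    // identifier column and assigns a fresh, contiguous sequence number.
    // Returns the next free sequence number so calls can be chained.
    static long mergeInto(Connection db, String sourceAggregateId,
                          String targetAggregateId, long nextSequenceNumber) throws Exception {
        String select = "SELECT global_index, payload FROM domain_event_entry "
                + "WHERE aggregate_identifier = ? ORDER BY sequence_number";
        String update = "UPDATE domain_event_entry SET payload = ?, "
                + "aggregate_identifier = ?, sequence_number = ? WHERE global_index = ?";

        try (PreparedStatement read = db.prepareStatement(select);
             PreparedStatement write = db.prepareStatement(update)) {
            read.setString(1, sourceAggregateId);
            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    // Read the JSON payload into a mutable tree and add the missing key.
                    ObjectNode payload = (ObjectNode) MAPPER.readTree(rs.getString("payload"));
                    payload.put("aggregateId", targetAggregateId);

                    write.setString(1, MAPPER.writeValueAsString(payload));
                    write.setString(2, targetAggregateId);
                    write.setLong(3, nextSequenceNumber++);
                    write.setLong(4, rs.getLong("global_index"));
                    write.executeUpdate();
                }
            }
        }
        return nextSequenceNumber;
    }
}

Calling mergeInto once per source stream, in the desired order and with a running sequence number, gives you the [A, B, C] ordering described above.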

The tool uses a temporary table to track the work to be performed: one row per merged aggregate with information on the source aggregates, plus a timestamp/completion column.
We only have a couple million events, which can be processed in a reasonable amount of time. Since we have a small ‘active’ data set, we could also prioritize migration of ‘recent’ data and bring the system online sooner, but we don’t have an issue with our expected downtime window.

As to your case, I’d approach it the same way, except the migration will be literally the reverse:

  • For each aggregate, determine what the split aggregate IDs are going to be
  • Duplicate each event of the original aggregate, and change the aggregate ID on the copy to the appropriate split aggregate
  • Rewrite the serialized data to reflect the newly assigned ID (see the sketch below)
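
Here is a minimal sketch of that, under the same assumptions as the merge sketch above (JDBC event store, JSON payloads rewritten with Jackson, illustrative table/column names and "aggregateId" payload key). It copies every event of the original aggregate into each of the split streams; depending on your domain you may instead want to route each event to only one of the split aggregates, which is just a filter on the inner loop.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.UUID;

public class SplitAggregateStream {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Copies every event of the original aggregate into one new stream per
    // split aggregate ID, re-pointing the payload and identifier column at the
    // split aggregate. The original stream can be deleted once the copies are verified.
    static void split(Connection db, String originalAggregateId,
                      List<String> splitAggregateIds) throws Exception {
        String select = "SELECT payload, payload_type, time_stamp FROM domain_event_entry "
                + "WHERE aggregate_identifier = ? ORDER BY sequence_number";
        String insert = "INSERT INTO domain_event_entry "
                + "(event_identifier, aggregate_identifier, sequence_number, payload, payload_type, time_stamp) "
                + "VALUES (?, ?, ?, ?, ?, ?)";

        try (PreparedStatement read = db.prepareStatement(select);
             PreparedStatement write = db.prepareStatement(insert)) {
            read.setString(1, originalAggregateId);
            try (ResultSet rs = read.executeQuery()) {
                long sequenceNumber = 0;
                while (rs.next()) {
                    for (String splitId : splitAggregateIds) {
                        // Duplicate the payload and point the copy at the split aggregate.
                        ObjectNode payload = (ObjectNode) MAPPER.readTree(rs.getString("payload"));
                        payload.put("aggregateId", splitId);

                        write.setString(1, UUID.randomUUID().toString()); // fresh event identifier
                        write.setString(2, splitId);
                        write.setLong(3, sequenceNumber);
                        write.setString(4, MAPPER.writeValueAsString(payload));
                        write.setString(5, rs.getString("payload_type"));
                        write.setString(6, rs.getString("time_stamp"));
                        write.executeUpdate();
                    }
                    sequenceNumber++;
                }
            }
        }
    }
}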

You can get as sophisticated as you need to in the data migration tool to deal with any special migrations required beyond that…

~Patrick