Changing the aggregate model


I want to be able to split an aggregate and its associated events into two, so that I end up with two aggregate roots and two events. I have tried to do this via upcasting and can split the event into two events easily enough, but I have not worked out how to indicate that an upcast event belongs to a different aggregate.

Can anyone suggest a process for modifying the aggregate model please?


Hi Daniel,

I’m working through the reverse process at the moment. Upcasters are great for modifying the shape of events, but they’re not much help when the aggregate boundaries need to change. The event stream for a given aggregate is loaded by the event store based on the database schema/columns, and the upcasters are only applied once the stream has been loaded – too late in the process.

My strategy is as follows:

  • Put the system into maintenance mode
  • Run a tool to modify the event data to make the required changes
  • Start up the new version of the application

The data migration tool itself has a few tasks:

  • Discover all related events for the aggregates to be joined
  • Load all of the original aggregate event streams into memory
  • Re-order the events into a new sequence. In our case, this means [all events of aggregate A, all events of aggregate B, all events of aggregate C]. We fortunately don’t have any dependencies between the streams (as would be expected, given that they’re currently independent aggregates)
  • Re-write the serialized data. We store the payload in JSON format and we need to modify/add the new aggregate ID to every event. The migration tool reads the payload into key/value pairs and adds the missing keys, then re-serializes the payload
  • Re-write the aggregate ID and type – all previously independent aggregate streams are now “Aggregate A” events
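The tasks above can be sketched roughly as follows. This is a minimal illustration, not our actual tool: it assumes a simplified event row shape (aggregate_id, sequence_number, type, JSON payload) and an “aggregateId” payload key, both of which are hypothetical – your event store’s columns and payload format will differ.

```python
import json

def merge_streams(new_aggregate_id, streams):
    """Concatenate the source streams in order and rewrite each event so
    it belongs to the merged aggregate.

    `streams` is an ordered list of event lists (e.g. [A's events, B's
    events, C's events]); each event is a dict with 'aggregate_id',
    'sequence_number', 'type', and a serialized JSON 'payload'.
    """
    merged = []
    sequence = 0
    for stream in streams:
        for event in sorted(stream, key=lambda e: e["sequence_number"]):
            # Read the payload into key/value pairs and add the missing
            # aggregate ID key, then re-serialize it.
            payload = json.loads(event["payload"])
            payload.setdefault("aggregateId", new_aggregate_id)
            merged.append({
                "aggregate_id": new_aggregate_id,  # stream now owned by the merged aggregate
                "sequence_number": sequence,       # new contiguous sequence across all sources
                "type": event["type"],
                "payload": json.dumps(payload),
            })
            sequence += 1
    return merged

# Example: two previously independent streams merged into aggregate "A".
stream_a = [{"aggregate_id": "a", "sequence_number": 0, "type": "Created", "payload": "{}"}]
stream_b = [{"aggregate_id": "b", "sequence_number": 0, "type": "Renamed", "payload": "{}"}]
merged = merge_streams("A", [stream_a, stream_b])
```

The important part is that the sequence numbers are re-assigned across the concatenated stream – the event store will reject duplicate (aggregate ID, sequence number) pairs otherwise.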

The tool uses a temporary table to track the work to be performed: one row per merged aggregate with information on the source aggregates, plus a timestamp/completion column.
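A tracking table along those lines might look like this – a hypothetical sketch using SQLite for brevity; the column names are illustrative, not our actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per merged aggregate: which source aggregates feed it, and
# when (if) the migration of that aggregate completed.
conn.execute("""
    CREATE TABLE merge_work (
        target_aggregate_id  TEXT PRIMARY KEY,
        source_aggregate_ids TEXT NOT NULL,   -- comma-separated source IDs
        completed_at         TEXT             -- NULL until this merge is done
    )
""")
conn.execute(
    "INSERT INTO merge_work (target_aggregate_id, source_aggregate_ids) VALUES (?, ?)",
    ("A", "a,b,c"),
)

# After the tool finishes rewriting aggregate A's events:
conn.execute(
    "UPDATE merge_work SET completed_at = datetime('now') WHERE target_aggregate_id = ?",
    ("A",),
)

# Rows with a NULL completed_at are still pending, so the tool can resume
# after an interruption without redoing finished work.
remaining = conn.execute(
    "SELECT COUNT(*) FROM merge_work WHERE completed_at IS NULL"
).fetchone()[0]
```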
We only have a couple of million events, which can be processed in a reasonable amount of time. Since we have a small ‘active’ data set, we could also prioritize migration of ‘recent’ data and bring the system online sooner, but we don’t have an issue with our expected downtime window.

As to your case, I’d approach it the same way, except the migration will be literally the reverse:

  • For each aggregate, determine what the split aggregate IDs are going to be
  • Duplicate each event of the original aggregate, changing the aggregate ID on the copy to that of the split aggregate
  • Rewrite the serialized data to reflect the newly assigned ID
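The split direction could be sketched like this – again a hypothetical illustration using the same simplified event row and “aggregateId” payload key as assumptions, with the original stream duplicated once per split aggregate:

```python
import json

def split_stream(events, split_ids):
    """Duplicate the original aggregate's stream once per split aggregate,
    rewriting the aggregate ID both in the store columns and in the
    serialized JSON payload."""
    streams = {}
    for new_id in split_ids:
        streams[new_id] = []
        for event in sorted(events, key=lambda e: e["sequence_number"]):
            payload = json.loads(event["payload"])
            payload["aggregateId"] = new_id  # reflect the newly assigned ID
            streams[new_id].append(dict(
                event,
                aggregate_id=new_id,
                payload=json.dumps(payload),
            ))
    return streams

# Example: one aggregate "A" split into "A1" and "A2".
events = [
    {"aggregate_id": "A", "sequence_number": 0, "type": "Created", "payload": "{}"},
    {"aggregate_id": "A", "sequence_number": 1, "type": "Renamed", "payload": "{}"},
]
streams = split_stream(events, ["A1", "A2"])
```

In practice you may want a routing rule that sends each event to only one of the split aggregates rather than copying everything to both – that depends entirely on how your domain divides.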

You can get as sophisticated as you need to in the data migration tool to deal with any special migrations required beyond that…