Opinions on how to set up a read model?

hey all,

we recently started discussing how we want to model the read-side of our application. Right now we are mostly using a normalized relational schema and we apply views on top of that to return data in a format that one of our clients understands/expects.
Another approach might be to persist the data in the expected format right away. This may lead to data duplication if you have to support multiple clients that all want (slightly) different data. However, it seems we would “just” trade disk/memory space for performance and easier data access. So instead of writing into a normalized schema and applying views on top of it while reading the data back out, we could save the final/expected data format/structure in the database directly. Since we read far more often than we write, we expect this to improve performance.
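
To make that a bit more concrete, here is a rough sketch of what we mean by “saving the final shape” - all names are made up for illustration, the real data is of course more involved:

public final class PersonOverview {

    private final String personId;
    private final String displayName;
    private final int openOrders; // already aggregated/denormalized for this client

    public PersonOverview(String personId, String displayName, int openOrders) {
        this.personId = personId;
        this.displayName = displayName;
        this.openOrders = openOrders;
    }

    public String getPersonId() { return personId; }
    public String getDisplayName() { return displayName; }
    public int getOpenOrders() { return openOrders; }
}

interface PersonOverviewStore {
    void save(PersonOverview overview);       // written whenever a relevant event arrives
    PersonOverview findById(String personId); // a single key lookup, no joins or views
}

The write side would keep its normalized schema, but the read side would only ever see documents shaped like this.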

Right now, we are thinking about different ways to save this specialized data. The following ideas are floating around here:

  1. Use something like a ‘ReadModelUpdatedEvent’ which contains the entire data structure of an aggregate. Each event handler is then in charge of selecting the correct subset of that data and persisting it into the database. Since we are mostly working with JSON data, we can use Jackson’s @JsonView annotation for that (we could even apply those annotations to the ‘ReadModelUpdatedEvent’ itself - see the sketch after this list). The problem we see with this is that we are basically sending and persisting the current version of an aggregate into our event store every time we change a small part of it. Some of us fear that this will grow into a problem as the application continues to be used and we add more and more of those big events.
  2. The ‘ReadModelUpdatedEvent’ only contains the ID of an aggregate. Inside an event handler, we inject and use a Repository to load the current version of that aggregate. Again we have all the data, and the event handler can decide on its own what to persist and how. The problem with this is that the returned aggregate is actually not the current version - it’s still at the previous version, because the current event handling has not completed yet. We tried working around that by using the current unit of work, but it failed. In the end we need some way to access the latest and greatest version of an aggregate, including whatever changes the command that is currently being processed introduces. We couldn’t find one, so we came up with option 1) - writing the current state of the aggregate into an event and only working on that event using the normal event handling services.
  3. Instead of having one big and generic ‘ReadModelUpdatedEvent’ we could invest in more fine-grained events like ‘ReadModelOverviewUpdatedEvent’, which signals that just part of the data has changed and thus only carries part of the complete aggregate. That way the event is not as big and won’t trouble our event store as much. However, this might be problematic if we have a lot of clients which all require (slightly) different data. In that case we would basically send one event per client, which might result in a lot of boilerplate/repeated code.
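
For option 1, a minimal sketch of the @JsonView idea could look like this (the view and field names are made up, the real event would of course carry more data):

import com.fasterxml.jackson.annotation.JsonView;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonViewSketch {

    // Hypothetical client-specific views.
    static class OverviewClient {}
    static class DetailClient extends OverviewClient {}

    // Trimmed-down stand-in for the ReadModelUpdatedEvent from option 1.
    static class ReadModelUpdatedEvent {
        @JsonView(OverviewClient.class) public String personId;
        @JsonView(OverviewClient.class) public String name;
        @JsonView(DetailClient.class)   public String address; // only the detail client sees this

        ReadModelUpdatedEvent(String personId, String name, String address) {
            this.personId = personId;
            this.name = name;
            this.address = address;
        }
    }

    public static void main(String[] args) throws JsonProcessingException {
        ReadModelUpdatedEvent event =
                new ReadModelUpdatedEvent("42", "Jane Doe", "Some Street 1");
        ObjectMapper mapper = new ObjectMapper();

        // Each handler serializes only the fields of its own view and would
        // persist that JSON into its client-specific read model.
        String overviewJson = mapper
                .writerWithView(OverviewClient.class)
                .writeValueAsString(event); // personId and name only

        String detailJson = mapper
                .writerWithView(DetailClient.class)
                .writeValueAsString(event); // includes address as well

        System.out.println(overviewJson);
        System.out.println(detailJson);
    }
}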

Has anyone on this list worked on something similar? Or thought about similar problems/solutions?

Greets!

Hi Sebastian,

exposing aggregate state in an event is very dangerous. It effectively ties your models together, causing many of the benefits of using CQRS to be lost. In fact, the domain model on the command side should be completely agnostic of any read models or UIs.

Make sure your events have business meaning: AccountCreated, PurchaseConfirmed, PaymentReceived, etc. In your event handlers, you map these events to updates in a read model. Make sure these events do not reflect the internal structure of the aggregate. A product owner should be able to understand them.
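
As a rough sketch (all names invented, and in Axon the handling methods would typically carry the @EventHandler annotation), that mapping could look like this:

// Business-meaningful events: they say what happened, not how the aggregate is structured.
final class AccountCreated {
    final String accountId;
    final String ownerName;
    AccountCreated(String accountId, String ownerName) {
        this.accountId = accountId;
        this.ownerName = ownerName;
    }
}

final class PaymentReceived {
    final String accountId;
    final long amountInCents;
    PaymentReceived(String accountId, long amountInCents) {
        this.accountId = accountId;
        this.amountInCents = amountInCents;
    }
}

// The projection translates those events into whatever shape the read model needs.
class AccountSummaryProjection {

    private final AccountSummaryStore store; // placeholder for your query-side storage

    AccountSummaryProjection(AccountSummaryStore store) {
        this.store = store;
    }

    public void on(AccountCreated event) {
        store.insert(event.accountId, event.ownerName, 0L);
    }

    public void on(PaymentReceived event) {
        store.addToBalance(event.accountId, event.amountInCents);
    }
}

interface AccountSummaryStore {
    void insert(String accountId, String ownerName, long balanceInCents);
    void addToBalance(String accountId, long amountInCents);
}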

Your read model should be optimized to provide the information required by its users. In some cases, I even store the plain JSON in a query store, ready to send to the UI on request. It’s very fast, but it makes updating slightly more expensive.
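
Such a query store can be very simple. Something along these lines, with the map standing in for whatever database or cache you actually use:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keeps pre-rendered JSON per view and aggregate id; all names are illustrative.
class JsonQueryStore {

    private final Map<String, String> documents = new ConcurrentHashMap<>();

    // Called from event handlers whenever a view needs to be refreshed.
    public void put(String viewName, String aggregateId, String json) {
        documents.put(viewName + ":" + aggregateId, json);
    }

    // Called from the HTTP layer: the stored JSON is returned as-is, no mapping.
    public String get(String viewName, String aggregateId) {
        return documents.get(viewName + ":" + aggregateId);
    }
}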

On the command side, your model should be optimized to make the correct decisions (i.e. which events to apply) based on incoming commands. Quite often, you don’t really need a lot of structure for that. In fact, an aggregate shouldn’t even have methods that expose its state.

Cheers,

Allard

Hi Allard,

thanks for the help. Just to clarify: I actually want to store plain JSON in a query store. However I’m unsure how to achieve that. We have multiple clients who want to access the same entities but require a different data structure, or just require some parts of the same data. So the idea was to send the current state of an aggregate to every interested event handler. That handler can then decide on its own what to persist.

I’ve created a small gist at https://gist.github.com/sebhoss/26e8d3341455f0145fe4 that shows how our system is currently set up. In general our commands & events can be understood by our product owners. However I’m not sure about “Make sure these events do not reflect the internal structure of the aggregate”. In my small example, I’m exposing the exact structure of the aggregate in the event. How would you rewrite my example to accomplish your goal? How are you normally setting up your query store?

In my example, I’m sending an additional ‘PersonChangedEvent’ which contains the current state of the ‘Person’ aggregate once the ‘SetPersonNameCommand’ is triggered. My event handlers can all just listen to that single event (per aggregate) and update their relevant read models accordingly. I could live without the extra event, but then I have to handle the regular ‘PersonNameSetEvent’ in my ‘PersonService’, load the current state of the aggregate inside the handler, update it and write it back to disk. Then I have to repeat that for basically every possible business command/event in that aggregate. Isn’t it simpler to just move a copy of the current state of an aggregate to the event handlers in an additional event and let them do their work however they want?

Greets & thanks again!

Hi Sebastian,

with “do not expose internal state of the aggregate”, I mean that you should base the contents of the events on what’s functionally relevant, not on how the aggregate happens to be structured. When you send a “ChangeNameCommand”, you expect a “NameChangedEvent”. That event would only contain the ID of the person and the new name. In some cases, you might want to send the old state as well. In other cases, where you “add” or “subtract” from a state (like the number of available items in an inventory), you might want to have an “ItemsAddedToInventory” event that contains the number of items added, as well as the currently available number of items.
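
Sketched in code (the names are just examples), those events could look like this:

final class NameChangedEvent {
    final String personId;
    final String newName;
    final String oldName; // optional: include the previous value when it is useful

    NameChangedEvent(String personId, String newName, String oldName) {
        this.personId = personId;
        this.newName = newName;
        this.oldName = oldName;
    }
}

final class ItemsAddedToInventoryEvent {
    final String inventoryId;
    final int itemsAdded;        // the delta caused by the command
    final int itemsNowAvailable; // the resulting total, so handlers need no extra lookup

    ItemsAddedToInventoryEvent(String inventoryId, int itemsAdded, int itemsNowAvailable) {
        this.inventoryId = inventoryId;
        this.itemsAdded = itemsAdded;
        this.itemsNowAvailable = itemsNowAvailable;
    }
}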

On your query side, you could consider storing multiple views of the same data for the different clients. Each view is updated based on the events relevant for that view. Another option is to use a filtering mechanism that removes irrelevant information when a client only needs summary data.
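
For example, two client-specific projections could react to the same NameChangedEvent from the sketch above, each updating only its own store (the store interfaces are placeholders for your actual persistence):

class PersonOverviewProjection {
    private final OverviewStore store;
    PersonOverviewProjection(OverviewStore store) { this.store = store; }

    public void on(NameChangedEvent event) {
        // The overview client only cares about the display name.
        store.updateDisplayName(event.personId, event.newName);
    }
}

class PersonAuditProjection {
    private final AuditStore store;
    PersonAuditProjection(AuditStore store) { this.store = store; }

    public void on(NameChangedEvent event) {
        // The audit client also wants to know what the name was before.
        store.recordNameChange(event.personId, event.oldName, event.newName);
    }
}

interface OverviewStore {
    void updateDisplayName(String personId, String newName);
}

interface AuditStore {
    void recordNameChange(String personId, String oldName, String newName);
}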

Hope this helps.
Cheers,

Allard