CQRS Pattern - How are command and query separated? How to correctly implement CQRS?

uberSpotz · October 21, 2022, 6:51am

Hello

My question is how the segregation of command / write side and query / read side is meant to be implemented.

I ask because I have found a tutorial series which bifurcates the services into QueryDomainAServiceImpl and CommandDomainAServiceImpl.

On the other hand, I have read Richardson’s book Microservices Patterns and the Axon documentation entry on CQRS.

Neither speaks of a bifurcation of services, as far as I can see.

Richardson’s description is similar to the one in the Medium article “CQRS, microservices and event-driven architecture with Axon” (by George, Unil Ci Software Engineering).

Richardson begins explaining the CQRS pattern in section 7.2 on page 228 of his book. He compares it with the API composition pattern and measures both against an example of a “non-trivial query”, findAvailableRestaurants(). (He already mentions it in Chapter 2.) I cannot make head or tail of Richardson’s explanation of what exactly his command query segregation looks like. At best I can guess, or I will have to analyze the code of his example codebase.

He does not, as in the tutorial I found, bifurcate the service into a “CommandDomainServiceImpl” and a “QueryDomainServiceImpl”. Instead he apparently uses two databases per service (?)

The accompanying diagram in the Medium article is also very apt, and so is a diagram in a blog post by Vlad Mihalcea (The best way to use the Spring Transactional annotation) that depicts the read and write side both as replicas.

Richardson goes a similar but slightly different way.

In its most pronounced form, CQRS for Richardson roughly amounts to an optimistic replication strategy with an eventual consistency model, implemented in standalone read-model services that each subscribe to the domain events published by all the aggregates/services their query model needs. Richardson emphasizes that these read-model services (he calls them read modules) are predestined for “non-trivial queries” because each “read module” at a minimum consists of its own replica data store holding data from different services, instead of relying on a distributed transaction. He further says that a “read module” can leverage additional data stores with special abilities, such as geodata, to, say, find the nearest food store that has a certain dish available for delivery at a certain time. He draws the comparison that traditional services would also have to commit distributed transactions across additional databases such as Elasticsearch for text search (although that database is obviously capable of much more).

He does not yet describe in greater detail what a single service then looks like. Each normal domain service in his CQRS interpretation, such as in his diagram on page 233, also has an additional data store for query transactions. But I don’t know whether it is a replica of the first database. I don’t know where the differences lie if you use a separate data store for queries. What does the first data store persist then? He doesn’t really describe it. I would probably have to analyze his “food to go” microservice, but I have been weakened by a bad fever for two weeks now and I still feel weak.

Would you have a description for me of how I should build up my services and read models in order to realize CQRS correctly?

I will analyze his Food To Go Store example as soon as I have more energy again. But maybe even this won’t lead me to a valid/generalizable conclusion, so I think it would not be in vain at all if you could kindly describe to me “the correct way of implementing CQRS.”

What about a separate database for queries for each service? If so, what events do I persist in both databases? Let’s speak of the “write side” and the “read side”, both at service level and at application level, where the read side is some kind of replica and where there are, at application level, possibly multiple replicas or at least “projections”. I think that is what it is called when you create a replica but filter the events or information you need to keep from the events or information that you drop.

On the data-access level, are these all EventStores?

And if you would implement standalone services with maybe multiple databases, how would you subscribe to the change-events of multiple services?

After all, the services would normally not relay the events of AggregateLifecycle#apply to a replica event store. I say normally. How could I configure event handlers and event processors such that some third service would also be informed about the change events it is interested in?

I think I will let this post stand for the time being as an exposition with some question marks hidden here and there, and hope that it develops.

Thank you very much!

Yours sincerely
überSpotz

Gerard · October 21, 2022, 7:25am

Without going into the specifics, I think CQRS should look like this. You have some kind of storage to check whether a command is valid or not, which should contain as little information as possible.

For example, for a shop it could contain only the item ids and the current number of items in stock. That would be enough to determine whether someone can buy an item or not. Based on the validation of the command, something async needs to happen. In most cases this would be the creation of an event.

The query side doesn’t care about the commands, only about the events that happened and makes it possible to query them. If the events also contain customer information, you could for example query the previous purchases of a customer.
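
A minimal sketch of the shop example in plain Java may make the split concrete. All names here (BuyItem, ItemPurchased, StockCommandModel, PurchaseHistoryProjection) are hypothetical and this is not Axon API; it only assumes that the command model keeps item ids and stock counts, and that the event carries the customer id so the query side can answer "previous purchases of a customer".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical message types for the shop example (not Axon API).
record BuyItem(String itemId, String customerId) {}
record ItemPurchased(String itemId, String customerId) {}

// Command model: keeps only what is needed to accept or reject a command.
class StockCommandModel {
    private final Map<String, Integer> stock = new HashMap<>();

    void restock(String itemId, int amount) {
        stock.merge(itemId, amount, Integer::sum);
    }

    // Validates the command; on success updates the stock and returns the event.
    Optional<ItemPurchased> handle(BuyItem command) {
        int available = stock.getOrDefault(command.itemId(), 0);
        if (available <= 0) {
            return Optional.empty(); // rejected: nothing left to sell
        }
        stock.put(command.itemId(), available - 1);
        return Optional.of(new ItemPurchased(command.itemId(), command.customerId()));
    }
}

// Query model: never sees commands, only events.
class PurchaseHistoryProjection {
    private final Map<String, List<String>> itemsByCustomer = new HashMap<>();

    void on(ItemPurchased event) {
        itemsByCustomer.computeIfAbsent(event.customerId(), id -> new ArrayList<>())
                       .add(event.itemId());
    }

    List<String> purchasesOf(String customerId) {
        return itemsByCustomer.getOrDefault(customerId, List.of());
    }
}
```

Note that the command model never stores the customer id; it only travels in the event, where the query side picks it up.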

In addition, with Axon the command side is also event sourced. This doesn’t really have anything to do with CQRS, but it does mean you have a single source of truth. Without this it’s likely that the command model and the query model become inconsistent.

For example, there is some glitch and the event is sent before the command model was updated. Now someone tries to buy the product, and according to the command side there is still one left, while on the query side there are none left.

I hope this clarifies things a bit.

uberSpotz · October 21, 2022, 9:38am

Hello Gerard Klijs,

thanks for your kind answer.

So that I don’t misunderstand you: What is saved on the write side, and how? Is the write side not saved as an enriched domain event in an event store? And the query side, how is it a replica with eventual consistency only, as in your example? Can you maybe explain how this works technically, please? What is persisted, and how, on the write side (event store?) and how is it “teleported” to the read-side replica? (Also an event store? Subscribing to the same events as the write side?) Commands are never persisted, correct? Speaking of data access and persistence, is something other than the event store used to persist aggregate state in event sourcing + CQRS? The repository (ORM/JPA)?

Yours sincerely
überSpotz

Gerard · October 21, 2022, 11:26am

At least for Axon, the command model and the read model share the same ‘eventSource’. The command model fetches the events it needs to build up the aggregate state (optionally using a snapshot to make this quicker). The query model reads all events, starting from the beginning, and then receives new events as they are created. It then uses the events to update the persisted model and, optionally, to emit updates to active subscription queries.
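
As a rough illustration of "one shared event source, two ways of reading it", here is a sketch in plain Java. SharedEventStore, StoredEvent, readAllFrom and the two models are made-up names for this example and only loosely mirror what Axon provides; snapshots and subscription queries are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stored event: the aggregate it belongs to plus a payload (an amount here).
record StoredEvent(String aggregateId, int amount) {}

// One shared, append-only store (a stand-in for the real event store).
class SharedEventStore {
    private final List<StoredEvent> events = new ArrayList<>();

    void append(StoredEvent event) { events.add(event); }

    // Command side: only the events of one aggregate.
    List<StoredEvent> readEvents(String aggregateId) {
        return events.stream()
                     .filter(e -> e.aggregateId().equals(aggregateId))
                     .collect(Collectors.toList());
    }

    // Query side: every event from a given position onwards.
    List<StoredEvent> readAllFrom(int position) {
        return new ArrayList<>(events.subList(position, events.size()));
    }
}

// Query side: remembers how far it has read and updates its own model.
class TotalDepositsProjection {
    private int position = 0;
    private int total = 0;

    void catchUp(SharedEventStore store) {
        for (StoredEvent event : store.readAllFrom(position)) {
            total += event.amount();
            position++;
        }
    }

    int total() { return total; }
}

class AccountCommandModel {
    // Rebuilds the state of one account by replaying only its own events.
    static int balanceOf(String accountId, SharedEventStore store) {
        return store.readEvents(accountId).stream().mapToInt(StoredEvent::amount).sum();
    }
}
```

Both sides read the same stored events; they differ only in which events they ask for and in what they keep from them.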

uberSpotz · October 21, 2022, 12:41pm

Yes ok. Thanks for making something implicit quite explicit. I had a Snapshotter to configure on my todo list anyway

I think I can interpret the last post as making explicit the implicit assumption that, at the data-access level, we are talking about event stores.

But I have problems understanding the following sentences and it kind of reads like a contradiction.

together with

Two different models (command- and read- model) that share the same event-store but with two different sets of information with different size:

I think before it makes sense for me to ask further questions it would make sense to kind of clarify these propositions. I mean, it is, without further information at least, kind of contradictory, am I wrong?

Because if you are speaking about different command and query models, it would imply two different event stores/databases, wouldn’t it?

Please don’t feel offended if maybe it’s just me getting something wrong!

Thanks for your help.

Best wishes!

milendyankov · October 21, 2022, 1:36pm

Hey @uberSpotz

Let me try to approach this from a slightly different angle. When you interact with a Thing, you typically want to do one of two things:

- create/change/delete the Thing
- ask questions about one Thing or many Things (which may include everything else somehow related to a Thing)

For the former, you use commands. For the latter, queries.

On the command side, the Thing is usually an aggregate. A group of objects is responsible for processing the commands according to business rules. It must validate any request (command) and make some decisions before making changes.

Oversimplifying, the Thing lifecycle looks like this:

1. create a new instance of Thing
2. load the existing state of Thing
3. process command and make a decision
4. apply changes and store the new state of Thing
5. notify other components
6. destroy the instance of Thing

A command processed in step 3 contains all the data provided by the caller. Some parts of that data are essential for the Thing’s (not only current but also future) validation and decision-making logic. Thus they must be part of its state - stored in step 4 and loaded in step 2 the next time a command arrives.

There are two ways of storing the data needed by the Thing:

- store the current state (state-stored aggregates)
- store every change (event-sourced aggregates)

Typically there is an aggregate repository responsible for storing and loading aggregates. Depending on which approach is used, the repository for Thing is backed by either some DB to store state or an EventStore to store events (changes). Respectively, reading the state in step 2 is either setting the fields of Thing with values from the DB or calling methods on Thing to rebuild the state from every past change recorded in the EventStore.
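
To make the difference between the two repository styles tangible, here is a small plain-Java sketch. The Thing class with a single name field and both repository classes are invented for this example (they are not Axon's Repository API); in-memory maps stand in for the DB and the EventStore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregate: a Thing whose only state is a name.
class Thing {
    String name = "";

    // Applies one recorded change to the in-memory state.
    void on(String renamedTo) { this.name = renamedTo; }
}

// Variant 1: state-stored. Only the current state is kept.
class StateStoredThingRepository {
    private final Map<String, String> currentNames = new HashMap<>();

    void save(String id, Thing thing) { currentNames.put(id, thing.name); }

    Thing load(String id) {
        Thing thing = new Thing();
        thing.name = currentNames.getOrDefault(id, ""); // set fields from the stored row
        return thing;
    }
}

// Variant 2: event-sourced. Every change is kept; state is rebuilt by replay.
class EventSourcedThingRepository {
    private final Map<String, List<String>> renameEvents = new HashMap<>();

    void append(String id, String renamedTo) {
        renameEvents.computeIfAbsent(id, key -> new ArrayList<>()).add(renamedTo);
    }

    Thing load(String id) {
        Thing thing = new Thing();
        for (String renamedTo : renameEvents.getOrDefault(id, List.of())) {
            thing.on(renamedTo); // call the handler once per past change
        }
        return thing;
    }

    int historySize(String id) {
        return renameEvents.getOrDefault(id, List.of()).size();
    }
}
```

Both variants load the same current state; only the event-sourced one still knows how the Thing got there.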

Other parts of the data found in the command are only needed for querying purposes. So the Thing needs to pass the relevant data to the query side. That is step 5 above. Most often, it will pass the entire data found in the command to the query side, but that is not a requirement. And here is the tricky part - CQRS does not specify how to do that. Thus there are multiple ways, often based on the specific characteristics of given storages, message busses, etc. The most tool-agnostic approach is sending events to which the query side can subscribe.

When the query side receives an event, it updates one or more projections related to the Thing. Those can store data in one or many databases, LDAP servers, files, 3rd party systems (like CRMs or ERPs), etc. Since that data will only be used for querying purposes, it makes perfect sense to store it in the way that is most convenient for the consumer.

Now, when using event-sourced aggregates, you’ll realize that Thing needs to emit events twice: once (in step 4) to store the changes in the EventStore, and a second time (in step 5) to notify the projections. The obvious optimization is to make those two react to the same event so the Thing only has to emit it once. That is why you’ll often hear people saying that EventSourcing is perfect for CQRS. That is also why steps 4 and 5 are, in fact, a single step in AxonFramework when you use event-sourced aggregates.
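
The "emit once" idea can be sketched as follows in plain Java. StoringEventBus and ThingRenamed are hypothetical names, and the apply method here is only loosely named after the apply call mentioned earlier in this thread, not a reproduction of it. The sketch notifies projections synchronously for brevity, whereas a real setup would usually deliver events to the query side asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event for the Thing example.
record ThingRenamed(String thingId, String newName) {}

// Stores each event and forwards the very same event to the subscribed projections.
class StoringEventBus {
    private final List<ThingRenamed> stored = new ArrayList<>();
    private final List<Consumer<ThingRenamed>> subscribers = new ArrayList<>();

    void subscribe(Consumer<ThingRenamed> projection) {
        subscribers.add(projection);
    }

    // The aggregate calls this once: step 4 (store) and step 5 (notify) in one go.
    void apply(ThingRenamed event) {
        stored.add(event);
        subscribers.forEach(subscriber -> subscriber.accept(event));
    }

    List<ThingRenamed> storedEvents() {
        return List.copyOf(stored);
    }
}
```

A projection would register itself with subscribe(...), and the command side would later rebuild the Thing from storedEvents().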

Gerard · October 21, 2022, 2:18pm

I don’t really see the contradiction. There could be a lot of information, typically from the command message, which is not relevant for the command to succeed or not. This information will become part of the event, such that it can be used on the query side, but doesn’t necessarily need to be kept in the command model.

The nice thing when the aggregates are event-sourced is that when we need that information in the command model at some point in time, we can easily do so.

uberSpotz · October 22, 2022, 8:34am

Dear @Gerard,
do I understand you correctly that in your interpretation of CQRS there is (per service? per aggregate?) one read model and one command model, the latter being the single source of truth? These models differ from each other. However, you say that both models “share the same ‘eventSource’.” By event source I guess you mean the event store, because neither a command/event bus nor a command gateway can persist domain events. Have I misunderstood something?

I really appreciate your help, @Gerard and @milendyankov !

Yours sincerely
überSpotz

Gerard · October 22, 2022, 9:29am

In the case of Axon with Axon Server, both the command side and the query side indeed use the same events stored in Axon Server. The command side uses the event store abstraction, so it can easily rebuild specific aggregates, while the query side uses the event stream abstraction, receiving the events as they arrive.

However there are different options, for example using a relational database as event store, and Kafka for event bus, with Axon.

I’m not sure what you mean with the service or aggregate distinctions. It’s probably wise with a service to either embrace CQRS fully or not at all.

uberSpotz · October 25, 2022, 5:56am

Dear @Gerard ,
may I come back once more to that “contradiction” or “non-contradiction”? I could imagine that you publish your command model with AggregateLifecycle#apply under a different id than the event model?

And as such you could, for example, get two different models from one event store by

    public List<Object> listEventsForAccount(String accountNumber) {
        return eventStore.readEvents(accountNumber).asStream().map( s -> s.getPayload()).collect(Collectors.toList());
    }

where the accountNumber is different for either the command- or the query model?

On the other hand, there is only one

Best wishes!!

yours sincerely,
überSpotz

Gerard · October 25, 2022, 6:31am

You could, and you should embrace that flexibility. Depending on the queries you want to handle, you could have a lot of different query models. Some might need all the events from a certain aggregate, but there could also be models based on only a few events of the aggregate, or models that combine multiple aggregates.
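
A small sketch of three such query models fed from one event stream, in plain Java. The event types and projection classes are invented for this example (an account aggregate and a customer aggregate are assumed), and nothing here is Axon API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical events from two different aggregates (accounts and customers).
interface DomainEvent {}
record MoneyDeposited(String accountId, int amount) implements DomainEvent {}
record MoneyWithdrawn(String accountId, int amount) implements DomainEvent {}
record CustomerRenamed(String customerId, String name) implements DomainEvent {}

// Query model 1: needs every event of the account aggregate.
class BalanceProjection {
    final Map<String, Integer> balances = new HashMap<>();

    void on(DomainEvent event) {
        if (event instanceof MoneyDeposited d) balances.merge(d.accountId(), d.amount(), Integer::sum);
        if (event instanceof MoneyWithdrawn w) balances.merge(w.accountId(), -w.amount(), Integer::sum);
    }
}

// Query model 2: based on a single event type, ignores everything else.
class DepositCountProjection {
    int deposits = 0;

    void on(DomainEvent event) {
        if (event instanceof MoneyDeposited) deposits++;
    }
}

// Query model 3: combines events from both aggregates into one feed.
class ActivityFeedProjection {
    final List<String> lines = new ArrayList<>();

    void on(DomainEvent event) {
        if (event instanceof MoneyDeposited d) lines.add("deposit of " + d.amount() + " on " + d.accountId());
        if (event instanceof CustomerRenamed c) lines.add("customer " + c.customerId() + " is now " + c.name());
    }
}
```

Each projection receives the same events and simply ignores the ones it has no use for, so adding a new query model does not touch the command side.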