[axonframework] Custom event sourcing with additional event exploration features

Hi Christophe,

A lot of interesting questions. First, a few observations:

  1. Exploring the event log directly from the source is not something I would recommend. For starters, you will have to adapt your event store for reading OR limit the usability of that feature. What if you want to search through the events? Surely, you can’t do a “like” on a blob of serialized Java objects and/or JSON. You will also have to apply any future upcasters in the application that reads the events. To me, this is a very big smell.

As such, I would recommend the standard approach, which is to use a projection (i.e. a read-side model). I think one of your best bets would be to index your events using a full-text search engine like ElasticSearch (which is backed by Lucene) and to index fields like the aggregate identifier and the sequence number (a sketch follows after these observations).

  2. Re-writing the event log should, in almost all circumstances (except perhaps for regulatory requirements), be a big no-no. The event log is the absolute source of truth. Messing with it can have very dire consequences, many of which may only reveal themselves when it is too late.

  3. Something I would also like to stress (and which is implicit in my two previous observations) is that the pure, unprojected events are not meant, by design, to be used as part of the read model. The entire premise of CQRS is the separation between read and write models. As such, you should not design your requirements for the event store based on how the data will be used, but rather on the latency, durability, scaling and storage requirements of your WRITE side (i.e. the C in CQRS).

  4. Although I am by no measure an expert in your domain, which means this observation could be entirely wrong, it seems strange to me that the project would be THE aggregate root. Since the write side needs to re-create the aggregate every time a command is processed, having very large aggregate roots is not recommended. Even though most of the entities in your application are probably completely meaningless without the “Project”, which is a big hint that they should be part of the “Project” aggregate according to canonical DDD, modelling purity must be balanced with effectiveness.
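To make the projection idea in observation 1 concrete, here is a minimal sketch of an event listener feeding a search index. The annotations follow the Axon 2 API (the import paths differ in later versions); the SearchIndex interface and the event type are invented for illustration:

    import java.util.HashMap;
    import java.util.Map;
    import org.axonframework.eventhandling.annotation.EventHandler;
    import org.axonframework.eventhandling.annotation.SequenceNumber;

    // Invented abstraction over a full-text engine such as ElasticSearch.
    interface SearchIndex {
        void index(String indexName, Map<String, Object> document);
    }

    // Invented event payload, for illustration only.
    class ProjectRenamedEvent {
        private final String projectId;
        private final String newName;

        ProjectRenamedEvent(String projectId, String newName) {
            this.projectId = projectId;
            this.newName = newName;
        }

        String getProjectId() { return projectId; }
        String getNewName() { return newName; }
    }

    // Read-side projection: each event is flattened into a searchable document
    // keyed by aggregate identifier and sequence number, so events can be found
    // with full-text queries instead of "like"-ing a serialized blob.
    public class EventIndexingProjection {

        private final SearchIndex searchIndex;

        public EventIndexingProjection(SearchIndex searchIndex) {
            this.searchIndex = searchIndex;
        }

        @EventHandler
        public void on(ProjectRenamedEvent event, @SequenceNumber long sequenceNumber) {
            Map<String, Object> doc = new HashMap<String, Object>();
            doc.put("aggregateId", event.getProjectId());
            doc.put("sequenceNumber", sequenceNumber);
            doc.put("eventType", "ProjectRenamed");
            doc.put("newName", event.getNewName()); // free text, searchable
            searchIndex.index("events", doc);
        }
    }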

And to answer a few of your questions:

  1. It is entirely possible to replay events, as long as your event store supports it. Afaik, the JPA and Mongo implementations support it.

  2. Depending on your scaling requirements, which seem very small, I would probably recommend the Mongo event store, which works out of the box with no configuration and supports replay (a rough wiring sketch follows below).
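For illustration, the wiring can be as small as this. Class and package names are from the Axon 2.x Mongo module; verify names and defaults against the version you actually use:

    import com.mongodb.MongoClient;
    import org.axonframework.eventstore.mongo.DefaultMongoTemplate;
    import org.axonframework.eventstore.mongo.MongoEventStore;

    // Rough sketch: the Mongo event store with its defaults (in Axon 2.x, the
    // "axonframework" database with "domainevents" / "snapshotevents"
    // collections). No further configuration is required.
    public class EventStoreConfiguration {

        public MongoEventStore mongoEventStore() {
            MongoClient mongo = new MongoClient("localhost", 27017);
            return new MongoEventStore(new DefaultMongoTemplate(mongo));
        }
    }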

And some questions:

  1. What do you mean by “recreating” aggregate roots?

Hope this was helpful.

Hi Nicolas,

Thanks a lot for this valuable answer! It seems we missed a lot of core DDD concepts.
We totally forgot about the potential need to search events, so yes, having them in a read model as well makes perfect sense.

About the aggregate root, your comment seemed strange to me, so I re-read some DDD resources, and I think it’s clearer now. If I understand correctly, ARs must be small. And if, as in my case, everything is related to a project, it’s still possible to have child entities that are also ARs but are not directly referenced in the project AR, right? The separation must be based on the “transactional atomicity” criterion.
For instance, let’s say a project is composed of systems, and those systems are composed of elements. As long as I don’t need to update a project and a system at the same time, they can both be ARs. Right?
Two questions came to mind as my understanding of lightweight ARs grew:

  • If we decide that “projects” and “elements” are separate ARs but we want to limit the number of elements per project, is it OK to handle the “addElement” command in the project AR, do the validation there, and launch a new command that will be caught by the “element” constructor?
  • As ARs are used to validate changes to the model and not in the views, does it make sense not to store some data in the AR when no validation must be done on it (like a textual description a user could write), and to just transmit the text in the event so it can be used in the views?

Concerning the “recreation” of ARs, the use case is when a user made some big mistakes in the past (say, two weeks ago) and wants to recover the project in the state it was in two weeks earlier. This recovered project should be a new one: a kind of copy of the project, using its state from two weeks ago. I must admit we haven’t thought yet about exactly how to do that…

But I think all of this is more related to DDD than to Axon, so thanks again for your really valuable answer; we’ve moved forward a lot!

Christophe Porté

Hi Christophe,

Aggregate design can be quite complex, but it looks like you’re on the right track. The primary question when designing aggregates is: “is this an atomic transaction?” If not, the two modifications may be split across two different aggregates. There can be other reasons to keep multiple entities in one aggregate, but make sure aggregates don’t become too contended: if your system only has a few aggregates, all user activity will revolve around them, which may cause poor system performance.

If you decide to split Project and Elements into separate Aggregates (note that the term AggregateRoot refers to the entity which is at the root of the aggregate), there is no atomic way to guarantee that at most ‘x’ elements are added to a project. In such a case, validation is typically done by querying the application for the number of elements in a project, as sketched below. Alternatively, you could use a “reservation” procedure and Sagas to guarantee it. But beware: that’s a lot of complexity for something that’s very unlikely to happen anyway.
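A minimal sketch of that query-based validation, assuming Axon’s CommandGateway; the ElementCountRepository read model and the command type are invented for illustration:

    import org.axonframework.commandhandling.gateway.CommandGateway;

    // Invented read model that keeps a per-project element count.
    interface ElementCountRepository {
        int countFor(String projectId);
    }

    // Invented command payload.
    class CreateElementCommand {
        final String projectId;
        final String elementName;

        CreateElementCommand(String projectId, String elementName) {
            this.projectId = projectId;
            this.elementName = elementName;
        }
    }

    public class ElementService {

        private static final int MAX_ELEMENTS = 100;

        private final CommandGateway commandGateway;
        private final ElementCountRepository elementCounts;

        public ElementService(CommandGateway commandGateway,
                              ElementCountRepository elementCounts) {
            this.commandGateway = commandGateway;
            this.elementCounts = elementCounts;
        }

        public void addElement(String projectId, String elementName) {
            // Query-side check: not atomic, so two concurrent calls could
            // briefly exceed the limit, which is often acceptable.
            if (elementCounts.countFor(projectId) >= MAX_ELEMENTS) {
                throw new IllegalStateException(
                        "Project already has the maximum number of elements");
            }
            commandGateway.sendAndWait(new CreateElementCommand(projectId, elementName));
        }
    }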

Cheers,

Allard

Hi Christophe and Allard,

I thought a bit about your “rewind history” feature. Maybe Allard or someone on the DDDD mailing list has a better solution; I personally have never seen this question raised in regard to DDDD or event sourcing.

Say your aggregate receives a few commands and produces events S = { E_0, E_1, …, E_n }, where n is the current sequence number of the aggregate and S is the event stream. You could, in theory, return to the state at sequence i by omitting events { E_(i+1), …, E_n } when recreating the aggregate. I see two major problems with this approach: 1) the event stream, which is supposed to be append-only, gets rewritten, which might have unknown consequences, including permanent data loss if an error occurs; and 2) all the event handlers on the C and Q sides would need a way to “undo” their changes, since any event might have altered other aggregates, sagas, Q-side databases, or even external systems. I don’t see these problems as impossible to solve, but you would need to apply very drastic constraints to your system to support this kind of “rewinding”. I would definitely not go down this road.

Another approach would be to create a new aggregate by reading the event stream of the original aggregate and transforming it. This means that for every event, you would need to implement a transformer that makes the event suitable for the newly created aggregate. This is necessary because it is probably impossible, in most cases, to use the original events verbatim, especially if they contain identifiers specific to the original aggregate. It seems like a lot of work and a possible maintenance disaster.
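For illustration, the per-event transformer that this approach implies might look like the following; all type names are invented:

    import java.util.UUID;

    // One transformer per event type: rewrite any identifiers that point at
    // the original aggregate so the copy gets an identity of its own.
    interface EventTransformer<T> {
        Object transform(T originalEvent, String newAggregateId);
    }

    // Invented event payload.
    class ElementAddedEvent {
        final String projectId;
        final String elementId;
        final String elementName;

        ElementAddedEvent(String projectId, String elementId, String elementName) {
            this.projectId = projectId;
            this.elementId = elementId;
            this.elementName = elementName;
        }
    }

    class ElementAddedTransformer implements EventTransformer<ElementAddedEvent> {
        public Object transform(ElementAddedEvent original, String newAggregateId) {
            // A fresh element identifier is generated too; otherwise the copy
            // would share element identities with the original project.
            return new ElementAddedEvent(newAggregateId,
                    UUID.randomUUID().toString(), original.elementName);
        }
    }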

A third, maybe more feasible approach would be to maintain a LIFO stack of relevant state mutations on the Q side, together with a table of the compensating actions needed to “undo” each mutation. For example, let’s say you have a text editor. Relevant state-mutating actions A_0, A_1, …, A_n were applied. Going back to state i would require compensating actions C(A_n), C(A_n-1), …, C(A_i), which are in fact “undo” commands. In that scenario, you wouldn’t need to rewrite the event history, and using commands for compensating actions seems much more transparent to me. If you needed to, you would be able to fork the stack and later rewind to the original state. All these undo/redo operations, which are commands like any other, generate a trace of events which is, in itself, meaningful and valuable.
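A small, self-contained sketch of that idea (plain Java, no Axon types; the names are invented):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // LIFO stack of compensating actions maintained on the Q side. Each
    // compensation C(A_i) is an ordinary command that reverses mutation A_i,
    // so rewinding appends new events instead of rewriting history.
    public class UndoStack {

        public interface CompensatingCommand {
            void execute();
        }

        private final Deque<CompensatingCommand> compensations =
                new ArrayDeque<CompensatingCommand>();

        // Called whenever a relevant state mutation A_i is applied.
        public void record(CompensatingCommand compensation) {
            compensations.push(compensation);
        }

        // Going back to state i executes C(A_n), C(A_n-1), ..., C(A_i) in
        // LIFO order, leaving actions A_0 ... A_(i-1) applied.
        public void rewindTo(int targetSize) {
            while (compensations.size() > targetSize) {
                compensations.pop().execute();
            }
        }
    }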

I have to be honest: none of these solutions seems good enough to me. I might have missed an easier, cleaner and more practical solution.

Best regards,

Hi all,

I’ve been “digesting” this one for a while as well. I actually saw a similar situation a long time ago, and back then I wasn’t too sure what the best approach would be. Since I haven’t had the need to implement this myself yet, I haven’t had the chance to validate my ideas.

But this is how I’d approach it:
I wouldn’t rely directly on the event store for the historic part. Instead, I would use a query model that is capable of displaying the history of aggregates. When a “fork” has to be made, create a copy of the aggregate (as Christophe suggested), but do so based on information from the query model. The history of the new aggregate at that point is nothing more than a single “CreatedCopyOfSomeOtherAggregate” event. So you’re not really “rewinding” history, but creating a copy of the state that an aggregate had at a specific moment in time.
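A sketch of what that could look like with an event-sourced aggregate, using Axon 2-style annotations; the event type and its fields are invented, and the copied state would come from the query model rather than from replaying the original stream:

    import org.axonframework.eventsourcing.annotation.AbstractAnnotatedAggregateRoot;
    import org.axonframework.eventsourcing.annotation.AggregateIdentifier;
    import org.axonframework.eventsourcing.annotation.EventSourcingHandler;

    // Invented creation event carrying the copied state.
    class CreatedCopyOfSomeOtherAggregateEvent {
        final String newProjectId;
        final String sourceProjectId;
        final String name;

        CreatedCopyOfSomeOtherAggregateEvent(String newProjectId,
                                             String sourceProjectId, String name) {
            this.newProjectId = newProjectId;
            this.sourceProjectId = sourceProjectId;
            this.name = name;
        }
    }

    public class Project extends AbstractAnnotatedAggregateRoot {

        @AggregateIdentifier
        private String projectId;

        private String name;

        Project() {
            // required by Axon for event-sourced reconstruction
        }

        // The copied state (here just a name) is read from the history query
        // model, not from replaying the original aggregate's event stream.
        public Project(String newProjectId, String sourceProjectId, String name) {
            apply(new CreatedCopyOfSomeOtherAggregateEvent(newProjectId,
                    sourceProjectId, name));
        }

        @EventSourcingHandler
        void on(CreatedCopyOfSomeOtherAggregateEvent event) {
            this.projectId = event.newProjectId;
            this.name = event.name;
        }
    }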

The fact that event sourcing is used should be abstracted away from this. The advantage that event sourcing brings is that you are able to build up new models using past data; I wouldn’t use it directly. I have come across one exception, where a query model was built up on the fly from the event store when a query arrived. The reason was that the query model was only rarely used and the query didn’t need to execute that quickly, so instead of storing everything beforehand, they chose to build it up on the fly. Theoretically, you could do something similar when building up your history view, as sketched below. For the UI, this should all be invisible, as it is just an implementation detail.
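In the same spirit, an on-the-fly history view could simply replay the aggregate’s stream when the query arrives. EventStore, DomainEventStream and DomainEventMessage follow the Axon 2 API; HistoryEntry is invented:

    import java.util.ArrayList;
    import java.util.List;
    import org.axonframework.domain.DomainEventMessage;
    import org.axonframework.domain.DomainEventStream;
    import org.axonframework.eventstore.EventStore;

    public class OnTheFlyHistoryQuery {

        // Invented row type for the history view.
        public static class HistoryEntry {
            final long sequenceNumber;
            final String eventType;

            HistoryEntry(long sequenceNumber, String eventType) {
                this.sequenceNumber = sequenceNumber;
                this.eventType = eventType;
            }
        }

        private final EventStore eventStore;

        public OnTheFlyHistoryQuery(EventStore eventStore) {
            this.eventStore = eventStore;
        }

        // Nothing is stored beforehand: the view is rebuilt from the event
        // store each time, which is fine for a rarely used, slow-ish query.
        public List<HistoryEntry> historyFor(String projectId) {
            List<HistoryEntry> history = new ArrayList<HistoryEntry>();
            DomainEventStream stream = eventStore.readEvents("Project", projectId);
            while (stream.hasNext()) {
                DomainEventMessage<?> message = stream.next();
                history.add(new HistoryEntry(message.getSequenceNumber(),
                        message.getPayloadType().getSimpleName()));
            }
            return history;
        }
    }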

Cheers,

Allard

Hi,

Thanks Allard and Nicolas for your inputs!

I guess we will indeed use an event containing all the copied data. We have other use cases where a similar implementation is needed (duplicating a project, and importing a project from an XML or Excel document).

Have a nice week,

Christophe