Many times, questions like “how do I wait for my event handler/saga step to finish” or “how do I know my data has finished projecting” have been asked on this forum and others. Some of them are answerable with SubscriptionQueries, but that only works in very limited scenarios, and the general question remains unanswered (other than the ubiquitous “embrace the eventually consistent nature”).
I’ve long been dissatisfied with this, and a few months ago, I had an idea on how to fix it, not just for Axon, but for distributed systems in general. I’d like to introduce you to structured cooperation, a concept that builds on structured concurrency, and deals with many of the issues that plague distributed systems, and by extension, Axon.
I would love to get some feedback from the community at large, and discuss if, and how, this would make sense to incorporate into Axon - I honestly think it would benefit greatly. I already implemented a simplified, bolted-on version in the community edition months ago that I can share, but I want to start by opening the conversation and seeing what the community thinks.
I can’t say I don’t agree with you here, @Gabriel_Shanahan! If anything, there’s an effort to be made to simplify this for sure. With AF5 knocking on the door, making a shift becomes the opportune moment.
I have not read the article you’ve shared, but will soonish. Once I’ve done so, I might come back to discuss things in more detail.
For now, awesome work that you’re looking into this.
“Soonish” became today. As pointed out, we are working on AF5. Given our time frame, a renewed saga solution will not make it into 5.0.0. But, within the life span of AF5, I do expect a replacement or updated version.
How that’s technically dealt with is a concern of the Axon Framework team, obviously. But, it does clarify why my response took a bit; there are other more pressing tasks at hand, since we moved the “Saga Revisioning” to a later point in our development cycle.
Nonetheless, I am trying to prep myself “for that day,” and thus figured reading your blog series was required, @Gabriel_Shanahan.
Off the top of my head, I have roughly three concerns with what you are suggesting to do in Scoop, being:
1. It seems to have its own event store/queue next to potentially another event store. This sounds like a distributed transaction by nature, which is another can of worms, I am afraid.
2. You intend to expose all registered message handlers to, essentially, all services present. My first gut feeling with this is “you are breaking location transparency” in this way. Which, at least how I view it right now, is the cornerstone of a messaging-based system: the freedom to separate your components as they do not depend on one another.
3. How many layers deep will a saga wait in Scoop for messages to be handled? What if the message it dispatched led to another saga that dispatches a message, and yet another, and yet another, and yet…I think you see where I am going.
So, perhaps you have some reply to my concerns. Maybe I missed something in your description. Or maybe I am generalizing.
If anything, though: I think the argument is compelling. I am just not sure whether it is too restrictive a form in its current state. So, let’s discuss!
Hey Steven, no worries, glad you got around to reading the articles!
I’m not proposing to incorporate Scoop into Axon. Scoop is just a demonstration of the principle, so people can look at (and play around with) actual working code. What I’m suggesting is that you implement the semantics of structured cooperation directly in Axon, i.e. from scratch, not using Scoop.
Structured cooperation is a rule - don’t advance to the next step of a handler until all handlers of all messages (events) emitted in a previous step are done - and you can implement that rule in any distributed-systems framework (Axon, Temporal, or anything else).
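To make the rule a bit more tangible, here is a minimal in-memory sketch of it in plain Java. Everything in it (`StructuredCooperationSketch`, `Emitter`, `Step`, the topic names) is made up for illustration and is not Axon's or Scoop's API; a real implementation would persist its progress and run across services rather than on a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory sketch; none of these types exist in Axon or Scoop.
class StructuredCooperationSketch {

    /** Lets a step emit messages; they are dispatched once the step returns. */
    interface Emitter { void emit(String topic); }

    /** One step of a multi-step handler. */
    interface Step { void run(Emitter emitter); }

    private final Map<String, List<List<Step>>> handlersByTopic = new ConcurrentHashMap<>();
    private final List<String> log = new CopyOnWriteArrayList<>();

    void register(String topic, List<Step> steps) {
        handlersByTopic.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(steps);
    }

    /** Starts every handler of the topic. The returned future completes only when
     *  all of them, and everything they triggered transitively, have finished. */
    CompletableFuture<Void> publish(String topic) {
        List<CompletableFuture<Void>> running = new ArrayList<>();
        for (List<Step> steps : handlersByTopic.getOrDefault(topic, List.of())) {
            running.add(CompletableFuture.runAsync(() -> runHandler(steps)));
        }
        return CompletableFuture.allOf(running.toArray(new CompletableFuture<?>[0]));
    }

    private void runHandler(List<Step> steps) {
        for (Step step : steps) {
            List<String> emitted = new ArrayList<>();
            step.run(emitted::add);
            List<CompletableFuture<Void>> children = new ArrayList<>();
            for (String topic : emitted) {
                children.add(publish(topic));
            }
            // The rule: do not start the next step until every handler of every
            // message emitted in this step is done.
            CompletableFuture.allOf(children.toArray(new CompletableFuture<?>[0])).join();
        }
    }

    static List<String> demo() {
        StructuredCooperationSketch bus = new StructuredCooperationSketch();
        bus.register("order-placed", List.<Step>of(
            emitter -> { bus.log.add("saga:step1"); emitter.emit("payment-requested"); },
            emitter -> bus.log.add("saga:step2")));
        bus.register("payment-requested", List.<Step>of(
            emitter -> { pause(50); bus.log.add("payment:done"); }));
        bus.publish("order-placed").join();
        return List.copyOf(bus.log);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [saga:step1, payment:done, saga:step2]
    }
}
```

The second saga step only runs after the (slow) payment handler has finished, even though the saga never references that handler directly.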
If by the event store/queue you mean the message_event table - that’s all it is, a table. You can (and must!) write to it in the same transaction that persists the event to the event stream. Since semantically it’s basically just a searchable log, you also have the option of emulating its semantics using some other technology than a database (although I’m not sure why one would want to do that).
You’re not required to expose every handler to every service - you’re required to decide if you want to. Do you want absolute synchronization and consistency guarantees? Then by definition, you must - you cannot synchronize and make consistency guarantees in a system that does not know its own extent (“you cannot make statements about a room if you can’t see the entire room”). If you don’t need that, you can choose a less restrictive way to implement EventLoopStrategy - one such example is “only services that write to the message_event log within a certain time limit will be considered”. Then you sacrifice consistency, but gain greater availability and independence of components.
The reality is that in most real systems, you don’t always split things into multiple services (which is essentially what a handler is in Axon) because you actually want them to be functionally decoupled. Often, you just want them to be, e.g., horizontally scalable or managed/deployed by different teams, but wish they still behaved as if they were a single service, because of how much easier that is to reason about, debug and maintain. In a distributed system, you will usually have subsystems that you want to couple tightly, and subsystems that you do not.
With traditional approaches, it’s basically an all-or-nothing choice - either it’s all clumped into one service, and you get consistency and synchronization guarantees, ease of debugging, reasoning, etc., or it’s multiple services, and then you’re in distributed-systems territory, you lose all that, and you need to deal with it yourself in some ad hoc way. With structured cooperation, different implementations of EventLoopStrategy give you different guarantees, and you can pick, in the definition of each handler, whichever suits the use case you need to solve.
I would expect an Axon implementation to expose a similar abstraction that would give the user a similar ability to influence the behavior.
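As a rough illustration of that trade-off, here is a sketch of two such strategies in plain Java. Note that `WaitStrategy`, `Entry`, `Kind` and the `SEEN`/`DONE` entries are names I’m making up here - this is not Scoop’s actual EventLoopStrategy interface or its message_event schema, just the shape of the idea: a strict strategy that needs to know the whole topology, and a lenient one that only considers handlers that showed up in the log within a time window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not Scoop's EventLoopStrategy and not an Axon API.
class WaitStrategySketch {

    enum Kind { SEEN, DONE }

    /** One row of an assumed message_event-style log for a single emitted message. */
    record Entry(String handler, Kind kind, Instant at) {}

    /** Decides whether the emitter of a message may move on to its next step. */
    interface WaitStrategy {
        boolean mayResume(List<Entry> entries, Instant emittedAt, Instant now);
    }

    static Set<String> handlersWith(List<Entry> entries, Kind kind) {
        return entries.stream()
            .filter(e -> e.kind() == kind)
            .map(Entry::handler)
            .collect(Collectors.toSet());
    }

    /** Strict: every handler in the known topology must be done.
     *  Consistent, but requires knowing the full extent of the system. */
    static WaitStrategy allKnownHandlers(Set<String> topology) {
        return (entries, emittedAt, now) ->
            handlersWith(entries, Kind.DONE).containsAll(topology);
    }

    /** Lenient: only handlers that wrote SEEN within the window are considered.
     *  More independent, but late handlers are silently not waited for. */
    static WaitStrategy seenWithin(Duration window) {
        return (entries, emittedAt, now) -> {
            Instant cutoff = emittedAt.plus(window);
            if (now.isBefore(cutoff)) {
                return false; // still collecting participants
            }
            Set<String> participants = entries.stream()
                .filter(e -> e.kind() == Kind.SEEN && !e.at().isAfter(cutoff))
                .map(Entry::handler)
                .collect(Collectors.toSet());
            return handlersWith(entries, Kind.DONE).containsAll(participants);
        };
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        List<Entry> entries = List.of(
            new Entry("billing", Kind.SEEN, t0.plusSeconds(1)),
            new Entry("billing", Kind.DONE, t0.plusSeconds(2)));
        Instant now = t0.plusSeconds(10);

        // "shipping" never reported in, so the strict strategy keeps waiting.
        System.out.println(allKnownHandlers(Set.of("billing", "shipping"))
            .mayResume(entries, t0, now)); // false
        // The lenient strategy only considers "billing", which is done.
        System.out.println(seenWithin(Duration.ofSeconds(5))
            .mayResume(entries, t0, now)); // true
    }
}
```

The same log produces different answers depending on the strategy, which is the knob I’d want each handler definition to be able to turn.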
It waits as many layers deep as there are. In normal code, if a caller calls a function, and that function calls a function, and that function calls a function, and so on, the caller just waits until it all finishes - that’s a fundamental property of code that we rely on, which allows you to treat things as black boxes (and which the old-school goto broke - this is what the last article is about). The same thing happens in structured cooperation - in e.g. Kotlin, if a coroutineScope spawns a coroutineScope that spawns a coroutineScope etc., then the parent just waits for all the children, period. However, naturally, you must implement timeouts (deadlines in Scoop) and other mechanisms, as you would in normal concurrent scenarios. You could also, at least in theory, implement an EventLoopStrategy that would only consider things that are n levels deep, but I think you really don’t want to do that.
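For illustration, here is a loose analogy in plain Java, using `CompletableFuture` rather than Kotlin coroutines or anything from Scoop: each level waits for the level below it, so the top-level caller implicitly waits for the whole tree, and a deadline bounds how long it is willing to do so. The class and method names are made up, and the deadline here only stops the waiting; a real implementation would also have to cancel or roll back whatever is still running.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; an analogy only, not how Scoop or Axon implement waiting.
class NestedWaitSketch {

    /** Does some work, then (unless it is the leaf) starts a child and waits for it. */
    static CompletableFuture<String> level(int depth, long workMillis) {
        return CompletableFuture.supplyAsync(() -> {
            pause(workMillis);
            if (depth == 0) {
                return "leaf";
            }
            // The parent does not care how deep the child's own tree goes.
            return "level" + depth + ">" + level(depth - 1, workMillis).join();
        });
    }

    /** Waits for the whole tree, but never longer than the deadline. */
    static String runWithDeadline(int depth, long workMillis, long deadlineMillis) {
        try {
            return level(depth, workMillis).get(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "deadline exceeded";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithDeadline(2, 10, 2000)); // level2>level1>leaf
        System.out.println(runWithDeadline(2, 200, 50));  // deadline exceeded
    }
}
```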
As for the implementation, I think that this should not replace current sagas/event handlers, as that would be a huge breaking change - rather, it should be a new kind of primitive (e.g. CooperatingSaga and CooperatingHandler) that users have the option of using. In the same way, structured concurrency in Kotlin didn’t replace threads and the primitives of “normal” concurrency; it built an entirely new tool next to them.