I’m studying the possibility of using Axon to implement a feature where changes in the (distributed) system state can be recorded and then replayed according to the original time scale.
By “original time scale” I mean the way frames recorded on a video tape are replayed at the speed at which they were recorded (hence the “frame” analogy, see below). This feature can be used for investigation (of previous incidents), simulation or training purposes. For example, think of a multiplayer game with a backend system and frontends (which are not necessarily dummy views and can also constitute part of the distributed system state).
“Frame”
The first obvious issue is the conceptual meaning of “event” in CQRS/ES. For the sake of clarity, I’ll use a different term, “frame”, by analogy with frames in video playback (as in “frames per second”). The term “event” is then used strictly in its CQRS/ES meaning.
In the context of CQRS/ES, the event log is only used to restore the state of aggregates from the associated event store. My case is clearly different in that respect: instead of restoring the state of aggregates immediately (as soon as possible) from their event log, I want to take the sequence of frames from a frame store (analogous to the “event store”), which resides on a playback server, and apply them progressively to aggregates in chronological order using the original time scale.
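To make “original time scale” concrete, here is a minimal sketch in plain Java (no Axon classes involved). The names Frame, PlaybackSchedule and the speed parameter are my own assumptions for illustration, not anything Axon provides. It only computes when each frame should be applied relative to the start of playback.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical wrapper: a recorded payload plus the instant it was recorded. */
final class Frame {
    final Instant recordedAt;
    final Object payload;

    Frame(Instant recordedAt, Object payload) {
        this.recordedAt = recordedAt;
        this.payload = payload;
    }
}

/** Hypothetical helper: computes playback offsets from recording timestamps. */
final class PlaybackSchedule {

    /**
     * Returns one offset per frame, in chronological order of recording.
     * speed = 1.0 keeps the original time scale; 2.0 plays twice as fast.
     */
    static List<Duration> offsets(List<Frame> frames, double speed) {
        if (speed <= 0) {
            throw new IllegalArgumentException("speed must be positive");
        }
        // Sort a copy so the caller's list is left untouched.
        List<Frame> sorted = new ArrayList<>(frames);
        sorted.sort(Comparator.comparing((Frame f) -> f.recordedAt));

        List<Duration> result = new ArrayList<>();
        if (sorted.isEmpty()) {
            return result;
        }
        Instant first = sorted.get(0).recordedAt;
        for (Frame f : sorted) {
            // Gap between this frame and the first one, scaled by the playback speed.
            long originalMillis = Duration.between(first, f.recordedAt).toMillis();
            result.add(Duration.ofMillis(Math.round(originalMillis / speed)));
        }
        return result;
    }
}
```

A playback server could hand each offset to something like ScheduledExecutorService.schedule(...) so that the frame is applied only when its delay has elapsed, rather than as fast as possible.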
I’m not saying that “frames” cannot be implemented by events (and mean the same thing). All I’m trying to do is to separate the meanings. Although events seem like the immediate candidate to model “frames”, I already have (in my head) possible implementation scenarios where “frames” may well be some special cases (special by additional metadata or completely different classes wrapping the original events).
External systems = outside question
I’m aware of the problems related to replaying anything on a production system and accidentally exchanging messages with real external systems. Let’s assume the playback is done on another (isolated) instance of the system with all Gateways closed for outgoing messages, and that incoming messages from external systems are also simulated via this playback.
Processing order = outside question
Another problem is the frame processing order in such a distributed environment. Depending on the implementation of the system, the order may be affected by differences in performance, latency and thread scheduling, which are generally nondeterministic. As a result, the recording may capture frames in one order while each individual node may have processed them in a different one. When such a recording is used as the source of “frames” in an isolated system, the order reflects the recorded order rather than the original real-time order.
Any suggestions?
All I want for now is a few ideas and the right directions to dig into and try.
Is there existing infrastructure in Axon for this case?
What I can see are classes like ReplayingCluster.
However, ReplayingCluster is only intended to be used to rebuild view models (not state of aggregates):
https://groups.google.com/d/msg/axonframework/Azkao_xY0hE/ap-DFfwwaVEJ
“One thing to take special care with is to never replay your events on the EventBus. You’re very likely to have handlers there that don’t support replying, such as saga’s. Replaying them would cause commands to be generating, changing your application’s state, instead of rebuilding it.”
“In my design, a single Event Handler is responsible for updating one or more related tables. If I want to rebuild these tables, I clear them and replay all events from the event store on that single handler.”
In my case I’m thinking of a central playback server which feeds “frames” at their original relative times (delayed, not immediately).
So, using some sort of “frame bus” through which the playback is fed to its consumers sounds like the right solution despite all the warnings.
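Roughly what I have in mind for such a “frame bus”, as a minimal sketch in plain Java. FrameBus is a hypothetical class of my own, not an Axon component. The point is only that it is a separate channel from the EventBus, so handlers that must not see replayed input (sagas, for example) never subscribe to it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical bus that carries only replayed frames, kept apart from the event bus. */
final class FrameBus<F> {

    private final List<Consumer<? super F>> subscribers = new CopyOnWriteArrayList<>();

    /** Registers a consumer that is known to tolerate replayed input. */
    void subscribe(Consumer<? super F> consumer) {
        subscribers.add(consumer);
    }

    /** Delivers one frame to every subscriber, in subscription order. */
    void publish(F frame) {
        for (Consumer<? super F> subscriber : subscribers) {
            subscriber.accept(frame);
        }
    }
}
```

The playback server would call publish(...) for each frame when its scheduled time arrives; when and in what order that happens is decided by the server, not by the bus.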