I’m playing around with the event store transformation API and comparing different strategies in terms of the additional disk space required during a transformation and the duration of the transformations.
Trying to find a “sweet spot” for the scenario of replacing events, I’m considering setting the start and end of the transformation range as close as possible to the affected events, to avoid the overhead of running through segments that don’t need to be touched. On the other hand, once a segment is scanned, I want to apply as many replacements in it as possible.
Imagine an example in which I want to replace 17 events distributed among 10 segments out of 300 segments in total. One extreme would be to run 17 event store transformations, each targeting a single event by its global index. This results in 17 copies of segment files, with some segments copied multiple times (higher disk space requirement, lower duration). The other extreme would be to run a single transformation through the entire event store, which costs a lot more time but in the end creates only 10 copies of segment files (lower disk space requirement, higher duration).
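To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It rests on my own assumption (not confirmed anywhere) that each transformation run scans only the segments in its range and writes one new copy of every segment it changes. The function name and the segment numbers are made up for illustration.

```python
def compare_extremes(event_segments, total_segments):
    """Compare the two extremes, given one segment number per event to replace.

    Assumption (mine, not confirmed): every transformation run scans only the
    segments in its range and writes one new copy of each segment it changes.
    """
    touched = len(set(event_segments))  # distinct segments holding events
    return {
        "one_run_per_event": {
            "runs": len(event_segments),
            "segments_scanned": len(event_segments),
            "segment_copies": len(event_segments),
        },
        "one_run_over_everything": {
            "runs": 1,
            "segments_scanned": total_segments,
            "segment_copies": touched,
        },
    }

# Made-up distribution: 17 events spread over 10 of 300 segments.
event_segments = [3, 3, 3, 17, 17, 42, 42, 88, 120, 120,
                  151, 199, 199, 230, 271, 271, 290]
result = compare_extremes(event_segments, total_segments=300)
for name, cost in result.items():
    print(name, cost)
```

Under that assumption, the first extreme writes 17 segment copies while scanning 17 segments, and the second writes 10 copies while scanning all 300.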
If the above considerations are correct, it would be helpful to know on the client side which segment an event is in. Is it possible to query for this information? Or, the other way around, is it possible to find out on the client side the global index of the first and the last event inside a segment?
With this information available, I would be able to create a transformation plan that makes sure the number of transformations matches the number of segment files “touched”.
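The planning step I have in mind could look roughly like the Python sketch below. It assumes I can somehow obtain a sorted list with the first global index of every segment, which is exactly the information I am asking about. `build_plan`, `segment_starts` and the sample numbers are hypothetical and not part of any existing API.

```python
from bisect import bisect_right
from collections import defaultdict

def build_plan(segment_starts, event_indexes):
    """Group the global indexes of events to replace by the segment they fall in.

    segment_starts: sorted first global index of each segment (hypothetical input).
    Returns one entry per touched segment with the index range to transform.
    """
    groups = defaultdict(list)
    for idx in sorted(event_indexes):
        # The last segment whose first index is <= idx contains the event.
        pos = bisect_right(segment_starts, idx) - 1
        if pos < 0:
            raise ValueError(f"index {idx} is before the first segment")
        groups[segment_starts[pos]].append(idx)
    return [
        {"segment_start": start, "from_index": idxs[0],
         "to_index": idxs[-1], "events": idxs}
        for start, idxs in sorted(groups.items())
    ]

# Made-up example: four segments of 1000 events each, five events to replace.
segment_starts = [0, 1000, 2000, 3000]
plan = build_plan(segment_starts, [5, 1200, 1001, 3500, 999])
for entry in plan:
    print(entry)
```

Each entry would then become one transformation covering only the range from `from_index` to `to_index`, so the number of transformations equals the number of touched segments (three in this example).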
Does it make sense to think about this? What do you think?