Before getting into it, I should say that I’m fairly new to both subjects: Event Sourcing (ES) and the GDPR.
I ran into some conference talks about ES while looking for a solution to a problem I was working on. They got me so interested in ES that I kept digging further and further into the subject, and everything I found made me even more interested, until I ran into the discussions about GDPR art. 17 (the right to erasure). Apparently everyone was trying to find the right solution around 2018, but since then it seems to have gone rather quiet.
Solutions I found so far:
Pseudonymisation of the Personally Identifiable Information (PII)
Pros:
- Being able to delete the data without touching the events in the Event Store.
- We can easily substitute/anonymise the PII instead of deleting it.
- Being able to restrict access to the data that requires more security (people can see the event but not the PII).
Cons:
- No single source of truth.
- What happens when additional data is later considered PII? How do we handle that? Those fields are inside the events (instead of in the separate datastore).
- We need to completely replay all events for all projections to be sure all the data is gone (unless the projections join against the separate datastore instead of copying the PII into the projection itself).
- Not being able to replay all the events once data has been deleted from the separate datastore (for example, when we want to build new projections from scratch).
- Not being able to recover past state.
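A minimal sketch of the pseudonymisation approach. It assumes that the events only carry a random pseudonym and that the PII itself lives in a separate, mutable store. `PiiStore`, `register_customer` and `customer_projection` are hypothetical names used for illustration, and in-memory structures stand in for real databases.

```python
import uuid
from typing import Dict, List, Optional

class PiiStore:
    """Hypothetical separate store mapping a pseudonym to the personal data."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def put(self, pii: dict) -> str:
        pseudonym = str(uuid.uuid4())  # random, so it reveals nothing about the person
        self._data[pseudonym] = pii
        return pseudonym

    def get(self, pseudonym: str) -> Optional[dict]:
        return self._data.get(pseudonym)

    def forget(self, pseudonym: str) -> None:
        # Erasure only touches this store; the event store stays as it is.
        self._data.pop(pseudonym, None)

event_store: List[dict] = []  # append-only; holds pseudonyms, never PII
pii_store = PiiStore()

def register_customer(name: str, email: str) -> str:
    ref = pii_store.put({"name": name, "email": email})
    event_store.append({"type": "CustomerRegistered", "customer_ref": ref})
    return ref

def customer_projection() -> List[str]:
    # Join with the PII store at read time, so erased data drops out here too.
    names = []
    for event in event_store:
        if event["type"] == "CustomerRegistered":
            pii = pii_store.get(event["customer_ref"])
            names.append(pii["name"] if pii else "<erased>")
    return names

alice = register_customer("Alice", "alice@example.com")
register_customer("Bob", "bob@example.com")
pii_store.forget(alice)
print(customer_projection())  # ['<erased>', 'Bob']
```

The projection joins against the PII store at read time, which is the workaround mentioned in the third con. A projection that copied the name into its own table would still need a full rebuild after `forget`.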
Crypto-shredding
Pros:
- We encrypt the PII right away, which adds an extra layer of security if we keep the encryption key in a different store.
- The least intensive task (just throw away the key).
Cons:
- Not all countries/laws agree that encrypting the data and throwing away the key is the same as deleting it. The reasoning is that the key could be recovered or brute-forced at a later date (think quantum computing), so the data is not really gone.
- What happens when additional data is later considered PII? How do we handle that? Those fields are not encrypted in the event, so we can’t just throw away the key.
- We need to completely replay all events for all projections to be sure all the data is gone (unless we also store the data encrypted in the projections, but then we would have to share the decryption key as well…)
- Not being able to replay all the events once the decryption key has been removed (for example, when we want to build new projections).
- Not being able to recover past state.
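A minimal crypto-shredding sketch. It assumes one key per data subject, held in a key store that is kept apart from the events. `key_store`, `encrypt_pii`, `decrypt_pii` and `shred` are hypothetical names. The HMAC-based cipher is a toy stand-in, chosen only so that the example runs on the Python standard library. It is unauthenticated and should not be used for real data.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, Optional

# Hypothetical key store, kept separately from the event store:
# one key per data subject.
key_store: Dict[str, bytes] = {}

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher (HMAC-SHA256 in counter mode), for illustration only.
    # A real system should use a vetted AEAD such as AES-GCM from a crypto library.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt_pii(subject_id: str, pii: dict) -> dict:
    if subject_id not in key_store:
        key_store[subject_id] = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)  # fresh nonce per encryption
    plaintext = json.dumps(pii).encode()
    stream = _keystream(key_store[subject_id], nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Only this blob goes into the event; the key never does.
    return {"subject_id": subject_id, "nonce": nonce.hex(), "ciphertext": ciphertext.hex()}

def decrypt_pii(blob: dict) -> Optional[dict]:
    key = key_store.get(blob["subject_id"])
    if key is None:
        return None  # key has been shredded, so the PII can no longer be read
    ciphertext = bytes.fromhex(blob["ciphertext"])
    stream = _keystream(key, bytes.fromhex(blob["nonce"]), len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))

def shred(subject_id: str) -> None:
    # "Erasure": throw away the key; the events themselves are untouched.
    key_store.pop(subject_id, None)

event = {"type": "CustomerRegistered", "pii": encrypt_pii("customer-42", {"name": "Alice"})}
print(decrypt_pii(event["pii"]))  # {'name': 'Alice'}
shred("customer-42")
print(decrypt_pii(event["pii"]))  # None
```

This also makes the replay con concrete. After `shred`, any projection rebuilt from these events only sees `None` where the PII used to be.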
Delete or modify the events directly in the event store
Pros:
- None
Cons:
- Events are no longer immutable, so we can no longer cache them.
- Not being able to replay all the events once data and/or events have been removed (for example, when we want to build new projections).
- Not being able to recover past state.
Replay the events into a new stream, filtering out the PII, then delete the old stream once this has completed.
Pros:
- We can keep the events immutable (compared to just deleting or modifying them).
Cons:
- Not being able to replay all the events once data and/or events have been removed (for example, when we want to build new projections).
- Not being able to recover past state.
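A minimal sketch of this copy-and-replace approach. It assumes an in-memory dict of streams and a fixed set of field names that count as PII. `streams`, `PII_FIELDS`, `scrub` and `copy_and_replace` are hypothetical names.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]

# Hypothetical in-memory event store: stream name -> list of events.
streams: Dict[str, List[Event]] = {}

# Assumption: the set of fields that count as PII is known up front.
PII_FIELDS = {"name", "email"}

def scrub(event: Event) -> Event:
    # Return a new event without the PII fields; the original is not mutated.
    return {k: v for k, v in event.items() if k not in PII_FIELDS}

def copy_and_replace(old: str, new: str,
                     transform: Callable[[Event], Event] = scrub) -> None:
    if new in streams:
        raise ValueError(f"stream {new!r} already exists")
    # Replay every event of the old stream through the transform.
    streams[new] = [transform(event) for event in streams[old]]
    # Delete the old stream only after the copy has completed.
    del streams[old]

streams["customer-42"] = [
    {"type": "CustomerRegistered", "name": "Alice", "email": "alice@example.com"},
    {"type": "OrderPlaced", "amount": 30},
]
copy_and_replace("customer-42", "customer-42-v2")
print(streams)
# {'customer-42-v2': [{'type': 'CustomerRegistered'}, {'type': 'OrderPlaced', 'amount': 30}]}
```

Each event is copied into a new dict rather than edited in place. That is what keeps the individual events immutable, even though the stream as a whole is still rewritten.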
The thing that made me fall in love with CQRS/ES is that you can replay all the events up to a certain point in time and get back the exact state as it was at that moment.
We could create new projections and fill them with data by replaying all the events.
We could generate reports as if they were run on a certain date and time.
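At its core, point-in-time replay means folding the events in order and stopping at the chosen moment. A minimal sketch, assuming a hypothetical `Event` type that has a timestamp, an account and an amount:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable

@dataclass(frozen=True)
class Event:
    occurred_at: datetime
    account: str
    amount: int  # positive = deposit, negative = withdrawal

def balances_as_of(events: Iterable[Event], as_of: datetime) -> Dict[str, int]:
    # Replay in chronological order and stop at the first event after `as_of`.
    state: Dict[str, int] = {}
    for event in sorted(events, key=lambda e: e.occurred_at):
        if event.occurred_at > as_of:
            break
        state[event.account] = state.get(event.account, 0) + event.amount
    return state

events = [
    Event(datetime(2024, 1, 1), "A", 100),
    Event(datetime(2024, 2, 1), "A", -30),
    Event(datetime(2024, 3, 1), "B", 50),
]
print(balances_as_of(events, datetime(2024, 2, 15)))  # {'A': 70}
print(balances_as_of(events, datetime(2024, 12, 31)))  # {'A': 70, 'B': 50}
```

Each of the solutions listed earlier removes or alters some of the input to a function like `balances_as_of`. Once erasure has happened, a report for a past date can no longer be reproduced exactly.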
Sadly, every solution above affects all three of these, and because of that I’m not really sure whether an Event Store is still as valuable as it seemed when I started reading about it. I really want it to be as valuable as before… but it feels like art. 17 of the GDPR has gutted the best parts of using CQRS/ES.
Perhaps it’s still the best solution as things stand, but I really wonder how other people are dealing with this loss. Do you still generate reports on the past? What do you do with the streams that are affected by this? Would you just leave them out as if they were never there in the first place? Or would you still accept them in reports, so that you could say something like "5% is unknown due to GDPR art. 17"? I would really like to get some feedback on how people are handling the effects of the solutions above.