CQRS/ES and GDPR

Before going into the subject, I must say that I’m fairly new to both subjects.
I ran into some conference talks about ES while trying to find a solution for a problem I was working on. That got me so interested in ES that I kept digging further and further into the subject. Everything I found made me even more interested, until I ran into the GDPR art. 17 discussions. Apparently everyone was trying to find the right solution around 2018, but since then it seems to have gone rather quiet.

Solutions I found so far:

Pseudonymisation of the Personally Identifiable Information.
Pros:

  • Being able to delete the data without touching the events in the Event Store.
  • We can easily substitute/anonymise the PII data instead of deleting it.
  • Being able to regulate the data that requires more security (people can see the event but not the PII data).

Cons:

  • No single source of truth.
  • What happens when additional data is later considered PII? How will we handle this? Those fields are inside the event (instead of in the additional datastore).
  • Need to completely replay all events for all projections to be sure all the data is gone (unless we join the additional datastore on the projection and don’t put this data directly into the projection).
  • Not being able to replay all the events once data has been deleted from the additional datastore (for example, when we want to build new projections from scratch).
  • Not being able to recover the state in the past.
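
To make the trade-offs above a bit more concrete, here is a minimal sketch of the pseudonymisation idea, assuming an in-memory list as the event store and a hypothetical `PiiStore` class as the additional datastore. All names here are made up for illustration and none of them come from a real framework. The event only carries a random token, and the projection joins the PII at read time.

```python
import uuid


class PiiStore:
    """Hypothetical side store that keeps PII outside the event store."""

    def __init__(self):
        self._records = {}

    def put(self, pii):
        token = str(uuid.uuid4())  # random reference, carries no PII itself
        self._records[token] = dict(pii)
        return token

    def get(self, token):
        return self._records.get(token)  # None once the record is erased

    def erase(self, token):
        self._records.pop(token, None)


event_store = []  # append-only list standing in for the real event store
pii_store = PiiStore()


def register_customer(customer_id, name, email):
    token = pii_store.put({"name": name, "email": email})
    # The event only stores the reference, never the PII itself.
    event_store.append(
        {"type": "CustomerRegistered", "customer_id": customer_id, "pii_ref": token}
    )
    return token


def project_customers():
    """Rebuild a read model by replaying events and joining PII at read time."""
    view = {}
    for event in event_store:
        if event["type"] == "CustomerRegistered":
            pii = pii_store.get(event["pii_ref"])
            view[event["customer_id"]] = pii or {"name": "<erased>", "email": "<erased>"}
    return view


token = register_customer("c-1", "Alice", "alice@example.com")
before = project_customers()
pii_store.erase(token)  # art. 17 request: events stay untouched
after = project_customers()
```

Note that the replay still works after erasure, but only with placeholder values, which is exactly the "can’t recover the past state" con from the list.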

Crypto-trashing (crypto-shredding)
Pros:

  • We encrypt the PII data right away, adding an additional security layer, if we keep the encryption key in a different store.
  • The least intensive task (just throw away the key).

Cons:

  • Not all countries/laws agree that encrypting + throwing away the key is the same as deleting data. The reasoning is that the key could be recovered or brute-forced at a later date (think of quantum computing), so the data is not really gone.
  • What happens when additional data is later considered PII? How will we handle this? Those fields are not encrypted in the event, so we can’t just throw away the key.
  • Need to completely replay all events for all projections to be sure all the data is gone (unless we store the data encrypted in the projections too, but then we would have to share the decryption key too…)
  • Not being able to replay all the events once the decryption key has been removed (for example, when we want to build new projections).
  • Not being able to recover the state in the past.
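
A rough sketch of the mechanics described above, assuming one key per data subject kept in a separate (here in-memory) key store. Everything in it is hypothetical, and the "cipher" is a toy keystream built from `hashlib` only so the sketch has no dependencies. It is not secure, and real code would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    """Toy cipher: XOR with a SHA-256 counter-mode keystream. NOT for real use."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


key_store = {}    # hypothetical key store: subject id -> key, kept apart from events
event_store = []  # append-only list standing in for the real event store


def append_event(subject_id, event_type, email):
    key = key_store.setdefault(subject_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)  # fresh nonce per encrypted value
    event_store.append({
        "type": event_type,
        "subject_id": subject_id,
        "nonce": nonce,
        "email_enc": _keystream_xor(key, nonce, email.encode("utf-8")),
    })


def read_email(event):
    key = key_store.get(event["subject_id"])
    if key is None:
        return None  # key was shredded, value is unreadable
    return _keystream_xor(key, event["nonce"], event["email_enc"]).decode("utf-8")


def forget(subject_id):
    """Art. 17 request: drop the key, leave the events as they are."""
    key_store.pop(subject_id, None)


append_event("c-1", "EmailChanged", "alice@example.com")
readable_before = read_email(event_store[0])
forget("c-1")
readable_after = read_email(event_store[0])
```

The sketch also shows the second con: only the field that was encrypted up front (`email_enc`) is covered by throwing away the key, while `subject_id` and anything else stored in plain text stays readable.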

Delete or modify the events directly in the event store
Pros:

  • None

Cons:

  • Events can no longer be immutable; because of this we can no longer cache them.
  • Not being able to replay all the events once data and/or events have been removed (for example, when we want to build new projections).
  • Not being able to recover the state in the past.

Replay the events and filter out the PII data in the new stream. Delete the old stream after completion.
Pros:

  • We can keep the events immutable (compared to just deleting or modifying).

Cons:

  • Not being able to replay all the events once data and/or events have been removed (for example, when we want to build new projections).
  • Not being able to recover the state in the past.
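
As an illustration of this last option, a small sketch that copies a stream into a new one while dropping the PII fields, then deletes the old stream. The stream names, the dict-of-lists "event store" and the `PII_FIELDS` set are assumptions made up for this example.

```python
PII_FIELDS = {"name", "email"}  # assumption: fields currently classified as PII


def copy_without_pii(old_stream, pii_fields=PII_FIELDS):
    """Build a new stream from the old one, leaving the PII fields out."""
    new_stream = []
    for event in old_stream:
        payload = {k: v for k, v in event["payload"].items() if k not in pii_fields}
        # A new event object is created; the original event is never mutated.
        new_stream.append({"type": event["type"], "payload": payload})
    return new_stream


# Dict of stream name -> list of events, standing in for the real event store.
streams = {
    "customer-c-1": [
        {"type": "CustomerRegistered",
         "payload": {"name": "Alice", "email": "alice@example.com", "plan": "basic"}},
        {"type": "PlanUpgraded", "payload": {"plan": "pro"}},
    ]
}

streams["customer-c-1-v2"] = copy_without_pii(streams["customer-c-1"])
del streams["customer-c-1"]  # the old stream goes only after the copy is complete
```

One nice property compared to crypto-shredding: if another field gets classified as PII later, it only means adding it to `PII_FIELDS` and running the copy again.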

The thing that made me fall in love with CQRS/ES is that you are able to replay all the events up to a certain point in time and get the exact state as it was back then.
We could create new projections and fill the data by replaying all the events.
We could generate reports as if they were run on a certain date and time.
Sadly enough, all 3 situations are affected by all the solutions above, and because of that I’m not really sure if using an Event Store is still as valuable as it looked when I started reading about it. I really want it to be as valuable as before… But it feels like art. 17 of the GDPR gutted the best parts of using CQRS/ES.

Perhaps it’s still the best solution as we speak, but I really wonder how other people are dealing with this loss. Do you still generate reports for dates in the past? What do you do with the streams that are affected by this? Would you just leave them out as if they were never there in the first place? Would you still include them in reports, so you could say something like "5% is unknown due to GDPR art. 17"? I would really like to get some feedback on how people are handling the effects of the solutions above.

This post seems to be a perfect reply to the topic I created: GDPR compliance & Axon Framework. Maybe a moderator can merge the threads?

Sure. I can merge the topics if @Daxyhr is OK with that.

Sure, if that helps. I still hope to find people that can give insights on how they handle these cons. Perhaps it’s easier if I rephrase my question, looking at AxonIQ’s solution (I base my assumptions purely on the freely available information).

Let’s say you encrypted fields with the GDPR module. You are filling your event store with encrypted fields and everything seems to be OK. People request to be forgotten, you remove the encryption key. Everything still seems fine. Until the government decides that field XY, which was not encrypted, is all of a sudden also considered to be personally identifiable information. This has already happened before with things like IP addresses, which were not considered to be personally identifiable information at first, but ended up being classified as such anyway.

What would you do in that situation?

Hi Daxyhr,

there are a few things to realize when using crypto-shredding. First of all, it is not a one-that-addresses-all-problems kind of solution.
Event Stores are not designed to remove data from them. That doesn’t mean it isn’t technically possible (unless really using write-once media). In case of Axon Server, it is technically possible to alter data, it’s just something you wouldn’t want to do continuously and at runtime.

That’s where crypto-shredding comes in. It makes removing a very specific subset of data from the event store very cheap. That means you can do it instantly and on request of the user. The GDPR actually accommodates crypto-shredding in several articles. The important thing to realize is that while it’s shredded information, it is still personal information. If you have a data breach, you will still need to notify the authorities. In this case, you cannot identify the individual anymore (because the information is shredded), so you don’t have to notify that individual. That’s how the law states it.
As a long-term solution, this may not be the most reliable way of erasure. Opinions vary, since reverse engineering a single value will take a significant amount of time (more than a human lifetime), even when using hundreds of quantum computers. But anyway, this is where you can -once in a while- go over the stored events to replace all the crypto-shredded values by their removed values (which may mean actually removing, or calling all customers John Doe, for example).

I do want to note, since this post focuses on article 17, that there are many other articles in GDPR too. Event Sourcing in general, and the Data Protection module specifically, provide features that are important to many of those other articles as well, such as being able to trace what has been done with values, which systems have access to these values, etc. Encrypting these values allows you to selectively provide access to these values by services, without limiting these services from passing the values around. Especially in a microservices environment, this is extremely valuable. The ability to erase is almost a convenient side-effect :wink:


First of all, I agree with @allardbz. Event Sourcing helps a lot in implementing GDPR-compliant applications, as the application state is built upon events and you get a complete and correct audit log for free.

Unfortunately there’s no free lunch and article 17 is the most challenging GDPR part.

In my opinion this article is challenging for every software system and to be honest: Event Sourcing is not more appropriate than CRUD based solutions, but is it less appropriate? Not in general.

First of all, we could have an explicit event which indicates that art. 17 has been applied. Based on this information we could update all our projections, and we would have explicitly modelled that the personal data is anonymized or removed.
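
A minimal sketch of what I mean, with made-up event and field names (`PersonalDataErased`, `erased`) that are not taken from any framework. The projection treats the erasure as just another event and anonymizes its own copy of the data.

```python
def apply(view, event):
    """Apply a single event to a simple customer read model."""
    kind = event["type"]
    cid = event["customer_id"]
    if kind == "CustomerRegistered":
        view[cid] = {"name": event["name"], "orders": 0, "erased": False}
    elif kind == "OrderPlaced":
        view[cid]["orders"] += 1
    elif kind == "PersonalDataErased":
        # Art. 17 has been applied: keep the non-personal facts, drop the rest.
        view[cid]["name"] = "John Doe"
        view[cid]["erased"] = True
    return view


events = [
    {"type": "CustomerRegistered", "customer_id": "c-1", "name": "Alice"},
    {"type": "OrderPlaced", "customer_id": "c-1"},
    {"type": "PersonalDataErased", "customer_id": "c-1"},
]

view = {}
for event in events:
    apply(view, event)
```

This only takes care of the projections. The `CustomerRegistered` event in the list still contains the name, so the stored events need one of the other approaches.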

The challenging part is the event store. Do we know up front which information is considered to be “personal information”? I don’t think so. With the crypto-shredding approach we place bets on what needs to be encrypted.

I fear we need a solution to modify or delete existing events although these operations have their own drawbacks.