Event sourcing and GDPR compliance

Hi,
I stumbled upon a blog post that got me thinking about a potential deal-breaker for event sourcing. I quote one statement from the author (the full post is here:
https://www.alexhudson.com/2017/10/14/software-architecture-failing/)

"Event Sourcing says you have an immutable log of events, and use that log to create an eventually-consistent view of your application – rather than saving state in an RDBS or something.
This is a classic “we didn’t consider business requirements” type technical choice. I’ve seen two different start-ups now, who hold personal data about customers in their “immutable log”. “How are you planning to handle GDPR requirements and removal of data?” – turns out the answer is often “Er – we haven’t thought about that.” Cue a sad face when I tell them that if they don’t modify their immutable log they’re automatically out of compliance."

What *can* we do when we have customer data in our immutable log and the customer then wants to be forgotten (as is fully within their rights under GDPR)?

Hi Viggo,

One of my colleagues here at AxonIQ is actually implementing a module to cover the GDPR requirement.
Although the specifics are not 100% clear to me, I believe the setup encrypts those parts of an event’s contents that are covered by that requirement.
As long as the data is not ‘set to be forgotten’, the key to decrypt the data exists.
If it is set to be removed, then the key to decrypt the data is dropped.
In such a setup we thus keep our event history, but the personal data inside it is no longer readable.

Frans, if I’m explaining this wrong, please chip in.

Anyhow, Viggo, things are under way to cover this issue :)

Cheers,

Steven

Dear Viggo,

What my colleague Steven has written is correct, but let me elaborate a bit.

Firstly, yes, there is a tension between the event sourcing concept of an immutable event log and the obligation to erase personal data under GDPR art. 17 and other legal frameworks. To erase data from the event stream, I think there are two main options:

  1. Change or delete events anyway, either by doing this directly in the event store or by replaying all events, filtering them and storing them in a new event store. This is possible, but may be operationally difficult and goes against the immutability principle.

  2. The solution Steven referred to. Encrypt personal data fields with a key that is specific to the aggregate or the data subject, store the keys separately (outside the event store), and delete the key when you need to delete the personal data. The encrypted data will be useless without the key. This approach is known as crypto-shredding. Gemalto recently published a blog post on this idea in the context of GDPR: https://blog.gemalto.com/security/2017/08/16/deeper-dive-into-gdpr-right-to-be-forgotten/
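
To make option 2 concrete, here is a minimal sketch of the key-deletion idea (often called crypto-shredding) in plain Java. It is an illustration under stated assumptions, not the Axon GDPR module: the class name CryptoShredder, its method names, and the in-memory map standing in for a separate key store are all hypothetical, and AES-GCM is simply one reasonable cipher choice.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of crypto-shredding; not the Axon GDPR module's API. */
final class CryptoShredder {
    private static final int IV_BYTES = 12;  // nonce size commonly used with GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Assumption: stands in for a key store kept outside the event store.
    private final Map<String, SecretKey> keys = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Encrypts one personal-data field with the data subject's key (created on first use). */
    byte[] encrypt(String subjectId, String plaintext) {
        SecretKey key = keys.computeIfAbsent(subjectId, id -> newKey());
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Store the IV in front of the ciphertext so decryption can find it.
            byte[] stored = Arrays.copyOf(iv, iv.length + ciphertext.length);
            System.arraycopy(ciphertext, 0, stored, iv.length, ciphertext.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the plaintext, or empty if the subject's key has been erased. */
    Optional<String> decrypt(String subjectId, byte[] stored) {
        SecretKey key = keys.get(subjectId);
        if (key == null) {
            return Optional.empty();
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return Optional.of(new String(plaintext, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** "Forgets" a data subject: the events stay, but their personal data becomes unreadable. */
    void forget(String subjectId) {
        keys.remove(subjectId);
    }

    private static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CryptoShredder shredder = new CryptoShredder();
        byte[] stored = shredder.encrypt("person-1", "Alice");
        System.out.println(shredder.decrypt("person-1", stored).isPresent()); // true
        shredder.forget("person-1");
        System.out.println(shredder.decrypt("person-1", stored).isPresent()); // false
    }
}
```

In a real deployment the key store would need its own durability and backup policy: a key that survives in a backup is not really deleted.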

We know many organizations that have deployed event sourcing in production, and it turns out that views on how strictly immutable the event store must be vary widely. To some, option (1) may be acceptable, whereas to others it isn’t.
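
For those who find option (1) acceptable, the "replay, filter and store in a new event store" variant might look roughly like the sketch below. Everything in it is hypothetical: StoredEvent is a simplified stand-in for whatever your event store persists, and the lists stand in for the old and new stores. It redacts payloads rather than dropping events, so the event sequence stays intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of option (1): copy the stream while redacting one data subject. */
final class EventStoreRewrite {

    /** Illustrative stored event; a real event store keeps far more metadata. */
    static final class StoredEvent {
        final String subjectId; // assumed to be an opaque identifier, not personal data itself
        final String type;
        final String payload;

        StoredEvent(String subjectId, String type, String payload) {
            this.subjectId = subjectId;
            this.type = type;
            this.payload = payload;
        }
    }

    static final String ERASED = "[erased]";

    /** Returns a new stream in the original order; the source list is left untouched. */
    static List<StoredEvent> copyRedacting(List<StoredEvent> source, String erasedSubjectId) {
        List<StoredEvent> target = new ArrayList<>(source.size());
        for (StoredEvent event : source) {
            if (event.subjectId.equals(erasedSubjectId)) {
                // Keep the event itself so the sequence stays intact, but drop its payload.
                target.add(new StoredEvent(event.subjectId, event.type, ERASED));
            } else {
                target.add(event);
            }
        }
        return target;
    }
}
```

The operational difficulty mentioned above is mostly outside this sketch: switching readers over to the new store, rebuilding projections, and making sure the old store and its backups are really gone.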

To make it easy for Axon Framework users to implement option (2), we have developed a (commercial) Axon GDPR module that will be released shortly. In essence, it lets you configure the encryption with a few annotations. For instance, you could write something like:

class PersonRegisteredEvent {

    @DataSubjectId private String id;   // identifies the encryption key
    @PersonalData private String name;  // encrypted with that key

}

In this case, ‘name’ will be encrypted automatically with a key identified by ‘id’. By erasing the key identified by ‘id’, ‘name’ will be effectively deleted. We will be sending out some announcements on this module in the coming weeks. If you need info sooner, feel free to reach out to me directly.

Let me also give a somewhat broader perspective. The blog you’re referring to is very negative on several topics. Generally, I think architecture principles like CQRS and event sourcing aren’t good or bad by themselves – it’s up to architects to determine when and how to apply them in a good way.

What we’re seeing is that many organizations choosing event sourcing aren’t doing so because of a technical push from development, but precisely because there is a business need to keep historical data for analytics and compliance purposes.

This also holds true for GDPR, which is about much more than just the ‘right to be forgotten’. It broadly regulates under what conditions organizations can process personal data. There always has to be some legal ground, one of them being consent. About this, the GDPR specifically says: “Where processing is based on consent, the controller shall be able to demonstrate that the data subject has consented to processing of his or her personal data.”

Now, suppose you have your typical CRM system with records of customers and prospects that you’re sending email campaigns to. For all records in that system, there should be a legal basis for processing, which may be because you’re doing business with these persons (the customers) or because they have consented to receiving your emails (prospects). If that system is just CRUD and you simply create and update those records, you have no record of that consent.

There are of course multiple ways of solving this, but one especially elegant one is to use event sourcing with meaningful business events. A record could be added to the email list because there was a “ContactRegisteredForMailingListEvent”, for instance. Your event stream will then be the audit trail you need for compliance purposes.
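
As a rough illustration of the event stream acting as a consent audit trail, the sketch below replays mailing-list events to answer "does this contact currently have valid consent, and since when?". ContactRegisteredForMailingListEvent is the event named above; the unsubscribe event, the base class, and the ConsentAudit helper are hypothetical names made up for this example.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: deriving demonstrable consent from business events. */
final class ConsentAudit {

    /** Minimal base type for the illustration. */
    abstract static class MailingListEvent {
        final String contactId;
        final Instant occurredAt;

        MailingListEvent(String contactId, Instant occurredAt) {
            this.contactId = contactId;
            this.occurredAt = occurredAt;
        }
    }

    static final class ContactRegisteredForMailingListEvent extends MailingListEvent {
        ContactRegisteredForMailingListEvent(String contactId, Instant occurredAt) {
            super(contactId, occurredAt);
        }
    }

    /** Assumed counterpart event; not mentioned in the discussion above. */
    static final class ContactUnsubscribedFromMailingListEvent extends MailingListEvent {
        ContactUnsubscribedFromMailingListEvent(String contactId, Instant occurredAt) {
            super(contactId, occurredAt);
        }
    }

    /** Replays the stream in order; returns when the currently valid consent was given, if any. */
    static Optional<Instant> consentGivenAt(List<MailingListEvent> stream, String contactId) {
        Optional<Instant> consent = Optional.empty();
        for (MailingListEvent event : stream) {
            if (!event.contactId.equals(contactId)) {
                continue;
            }
            if (event instanceof ContactRegisteredForMailingListEvent) {
                consent = Optional.of(event.occurredAt);
            } else if (event instanceof ContactUnsubscribedFromMailingListEvent) {
                consent = Optional.empty();
            }
        }
        return consent;
    }
}
```

A CRUD record could only tell you that the contact is on the list right now; the replay also tells you which event put them there and when.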

So, although GDPR art. 17 is a bit of a challenge, in general event sourcing is an excellent fit for compliance frameworks like GDPR.

Hope this is useful,