Why not Cassandra or MongoDB with absolute consistency as event store?

matthewadams12 · September 28, 2021, 2:52pm

Hi all,

We’re using Axon Framework without Axon Server, currently with Postgres as our event storage. The article at Why would I need a specialized Event Store? regarding Cassandra states

To guarantee sequence ordering in Cassandra you have to utilize its lightweight transaction feature which has considerable performance cost and, as described in the documentation, should be used sparingly.

and, regarding MongoDB, it states

MongoDB has recently announced support for multi-document transactions which somewhat mutes this point but this new feature currently comes with some limitations. We also have no easy method for pushing new events to clients so that we have optimal performance for processing events.

Finally, we run into trouble when generating sequence numbers for our events due to the eventual consistency in the cluster. MongoDB by design has no cluster-wide ACID transactions which would be required in order to achieve the sequencing guarantee.

Totally naive question, but I’ll ask nonetheless, because I think things have changed since those articles were written. If we were to set our Cassandra consistency level to be ALL or our MongoDB tag-based write concern to a tag value that included all nodes, would that provide enough isolation & consistency to use these database types as an EventStore’s EventStorageEngine?

Further, if we were to use Postgres in a clustered manner, it seems like all of the same issues enumerated above with respect to the consistency-versus-availability tradeoff would apply, according to the CAP theorem.

As I see it, the performance impacts that you’d suffer in a clustered database exist regardless of implementation, and, if you need a clustered database, then you’re forced to deal with that issue. Fortunately, the problem is slightly lessened by virtue of the fact that the event store has append-only semantics, which means that, as long as you can decide which of a set of two concurrent events is considered to happen before the other, you’re ok. If you’re using a granularity of time down to the nanosecond, practically speaking, there is effectively a zero percent chance of truly coincident events (that is, events that occur within the same nanosecond). Your only issue, then, is to ensure that the processing of subsequent events on an aggregate waits until the prior event processing is complete.

I’d appreciate any input here.

Thanks,
Matthew

milendyankov · September 28, 2021, 4:58pm

Hi @matthewadams12

Thank you for this question. I’m sure it’ll spark some discussion. I moved it from #axonframework to #architectural-concepts as it is unrelated to the framework.

Jamie_Isaacs · September 28, 2021, 7:38pm

@matthewadams12 would using Cassandra or MongoDB support optimistic concurrency? Something tells me that would not be possible and would require additional tooling.

matthewadams12 · September 28, 2021, 9:24pm

looking now @Jaime_Isaacs…

Steven_van_Beelen · October 12, 2021, 11:56am

From Cassandra perspective specifically, the main reason there’s nothing like an “Axon Cassandra Extension” is because of this:

Cassandra treats each new row as an upsert: if the new row has the same primary key as that of an existing row, Cassandra processes it as an update to the existing row.

Thus, when several application instances accidentally load the same Aggregate, nothing holds it back from appending the same event from multiple nodes.
With “same,” I mean the aggregate identifier and sequence number combination, by the way.
This scenario thus breaks the desired format of a “single source of truth,” as the chance exists an event might be overwritten by nodes in the system that have concurrently loaded the same aggregate instance.

Now granted, I am by no means a Cassandra expert.
So, there might just be ways around this predicament.
I am just stating the reasoning why AxonIQ doesn’t provide something like this.

And who knows, maybe there are workarounds for this too.
Making an Event Store through workarounds does not sound ideal to me, though.

That’s my two cents; I hope it helps.

matthewadams12 · October 12, 2021, 2:02pm

Fair enough, Steven. I suppose, then, my question should be, then, “What’s the recommended PostgreSQL HA configuration to use with a JdbcEventStorage?” I think the possibilities are as described at PostgreSQL: Documentation: 14: Chapter 27. High Availability, Load Balancing, and Replication. WDYT?

Bert_Laverman · October 12, 2021, 2:53pm

Matthew,
might I add something that appears to be missing in the above discussion? Axon Server is more than just the Event store, it also provides a messaging solution.

To fully support an Axon Framework-based application consisting of more than a single deployable unit (aka a monolith), you generally need multiple components:

You need something to provide the service (handler) discovery
You need something to provide pub-sub for event delivery
You need bi-directional transport for commands and queries, and that preferably supports streaming results for subscription queries
You need an Event Store that supports append-only behavior with optimization for replaying events, events on aggregate id, and events starting at a certain timestamp.

Axon Server is a purpose-built solution for all of this, which makes it a tough act to beat.

Granted, an RDBMS like PostgreSQL is capable of performing everything you want for an Event Store, but it is not optimized for it. As a generic DB, it needs to maintain multiple indexes to support the different search patterns on events optimally, and that will eventually (large/huge numbers of events/aggregates) start slowing down storage. (need to update/rebalance the indexes) Auto-scaling for PostgreSQL is targeted at supporting complex queries and stored procedures, which is not used in the Axon Event Store use case. HA PostgreSQL means you’ll have multiple instances that each need to be able to fully support your apps. An Axon Server EE cluster uses Raft (consensus-based) to provide HA and store multiple copies of your data, and the messaging work can be shared across the nodes.

If you haven’t looked at Axon Server in detail yet, I suggest you register for a (free) Axon Server online training.

Cheers,
Bert Laverman

Steven_van_Beelen · October 15, 2021, 1:54pm

Interesting jump, @matthewadams12!
I assume you’re looking to optimize your Event Store solution then, correct?
Fair and exciting exertion (I mean, we’re doing this at AxonIQ, so I am slightly biased), although potentially a costly road to take.

What I’d put upfront is that consistency is paramount for the storage solution.
In a distributed data store solution, I’d wager that means an event should become available once the majority of instances stored it.
How that’s best achieved with PostgreSQL is, honestly, a little outside of my knowledge sphere.

To provide some form of a helping hand, I think this video from Allard on Event Store’s might be valuable.
It describes the set of requirements from an Event Store to explain why AxonIQ has built Axon Server as a dedicated Event Store.
Even though that’s the pitch, the requirements might still help you out.

matthewadams12 · October 15, 2021, 3:27pm

I’m not actually looking to optimize my EventStore solution, just make it distributed so that database nodes can fail without interrupting service.

I’ve seen the video & understand.

Steven_van_Beelen · October 15, 2021, 3:57pm

I’m not actually looking to optimize my EventStore solution, just make it distributed so that database nodes can fail without interrupting service.

Sure, I gotcha there. Optimize wasn’t the right word for the problem.
Anyhow, I figured knowing the requirements of an Event Store would help to deduce what’s required from PostgreSQL specifically.

As stated earlier, I am not a PostgreSQL expert, so I’d leave the specifics on that to others here if you feel PostgreSQL is the way to go for your Event Store.