Axon 2 Saga in a blue/green environment with a Kafka event bus

johnd · May 24, 2021, 10:18pm

Hi, for years we have been using Axon 2 Sagas with an event bus that is Kafka-enabled*. But now we are going from one to two servers for blue/green deployments, and I’m at a loss as to whether there is a way to make this work, for one primary reason: how do we make sure we don’t have an instance of a Saga active on two JVMs at the same time? I think this would be a problem for microservices, too.

*By Kafka-enabled, I mean that our domain events are published out to Kafka and a Kafka consumer pulls those events back into Axon. When event durability and sequencing are required, such a design simplifies error handling when a broker fails during event publishing. The command will simply fail with no events having been published, rather than succeeding with events written to the Axon EventStore but not published.

Using the OrderManagementSaga example with Kafka, there could be topics for each of the Aggregates: Order, Shipment, Invoice, etc. and each of those topics would have multiple partitions, split out by key.

Let’s say that an instance of OrderManagementSaga needs to associateWith() events that are keyed by “foo-order” and “foo-shipment”.

With a single server, the SagaManager can easily process events from all Kafka partitions on a single thread. With multiple servers, each getting a “random” share of the Kafka topic-partitions, there is a good chance that the partitions a given Saga needs are spread across servers, so its events are no longer processed single-threaded.

server “green”
consuming from Order topic partition 0 (key: foo-order)
consuming from Shipment topic partition 1 (key: bar-shipment)

server “blue”
consuming from Order topic partition 1 (key: bar-order)
consuming from Shipment topic partition 0 (key: foo-shipment)
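
To make the split concrete, here is a minimal sketch in plain Java (no Axon or Kafka client involved). The two maps simply hard-code the assignment listed above, and the class and method names are made up for illustration. It checks whether all keys a Saga instance is associated with end up on the same server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSplitSketch {

    // Topic-partition -> server, copied from the example assignment above.
    static final Map<String, String> ASSIGNMENT = new LinkedHashMap<>();
    // Event key -> topic-partition, copied from the example above.
    static final Map<String, String> KEY_TO_PARTITION = new LinkedHashMap<>();

    static {
        ASSIGNMENT.put("Order-0", "green");
        ASSIGNMENT.put("Shipment-1", "green");
        ASSIGNMENT.put("Order-1", "blue");
        ASSIGNMENT.put("Shipment-0", "blue");

        KEY_TO_PARTITION.put("foo-order", "Order-0");
        KEY_TO_PARTITION.put("bar-shipment", "Shipment-1");
        KEY_TO_PARTITION.put("bar-order", "Order-1");
        KEY_TO_PARTITION.put("foo-shipment", "Shipment-0");
    }

    /** Returns the server that would receive events carrying the given key. */
    public static String serverFor(String key) {
        return ASSIGNMENT.get(KEY_TO_PARTITION.get(key));
    }

    /** True when every key the Saga instance is associated with is consumed by one server. */
    public static boolean handledOnSingleServer(String... associatedKeys) {
        String first = serverFor(associatedKeys[0]);
        for (String key : associatedKeys) {
            if (!first.equals(serverFor(key))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("foo-order    -> " + serverFor("foo-order"));
        System.out.println("foo-shipment -> " + serverFor("foo-shipment"));
        System.out.println("single server? "
                + handledOnSingleServer("foo-order", "foo-shipment"));
    }
}
```

With this assignment the two keys of the “foo” Saga resolve to different servers, which is exactly the situation where the same Saga instance could be loaded on both JVMs.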

Can anyone give some advice on how to handle this in Axon 2? Thanks for reading.

Steven_van_Beelen · May 25, 2021, 8:51am

Although it might be known, I want to remind you and your team that Axon 2 is, well, very old by now. If possible, I would recommend that your team update to a more recent version of Axon. Axon 4 has already been out for several years as well, meaning Axon 5 might come around the corner sooner or later. The framework introduced many new features that simplify your application, allowing you to stay focused on the business functionality instead of dealing with non-functional requirements.

Regardless, I’ll give it a go, hoping to provide some guidance.

how to make sure we don’t have an instance of a Saga active on two JVMs at the same time

The fact that your team has heavily customized the setup will make it hard to find a fitting solution for the problem. Helping with vanilla Axon 2 is one thing, but diving through your event bus solution is, I assume, out of scope for anybody on this forum.

Nonetheless, I’d wager you should have a look at Axon’s Cluster. It’s the Cluster that is in charge of invoking the Event Handling Components (e.g., Sagas) it has been given during registration. Thus, in a blue/green scenario, both instances of an application will contain the same Cluster, with the same Saga instances in it. If, during start-up of the second instance, you make sure that one of the Clusters has not started yet, you should be safe when it comes to dispersed event handling.
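
As a rough illustration of that idea, the sketch below gates event delivery behind an “active” flag. It is plain Java, not Axon 2 API: `GatedEventHandling`, `activate()` and `offer()` are hypothetical names, and how the two JVMs agree on which one is active (a deployment switch, a lock, or similar) is assumed to be handled elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class ClusterGateSketch {

    /** Hypothetical wrapper around Saga event handling; starts out inactive. */
    public static final class GatedEventHandling {
        private final AtomicBoolean active = new AtomicBoolean(false);
        private final Consumer<String> sagaHandling;

        public GatedEventHandling(Consumer<String> sagaHandling) {
            this.sagaHandling = sagaHandling;
        }

        public void activate() { active.set(true); }

        public void deactivate() { active.set(false); }

        /**
         * Hands the event to the Saga handling only while this instance is active.
         * Returns false when the event was NOT handled, so the caller knows it must
         * not treat the event as processed (e.g. must not commit its offset).
         */
        public boolean offer(String event) {
            if (!active.get()) {
                return false;
            }
            sagaHandling.accept(event);
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> handledByBlue = new ArrayList<>();
        List<String> handledByGreen = new ArrayList<>();
        GatedEventHandling blue = new GatedEventHandling(handledByBlue::add);
        GatedEventHandling green = new GatedEventHandling(handledByGreen::add);

        blue.activate();                 // blue is live, green has just started up
        blue.offer("OrderCreated");
        green.offer("OrderCreated");     // rejected: green is not active yet

        blue.deactivate();               // switch-over: stop blue before starting green
        green.activate();
        green.offer("ShipmentPrepared");

        System.out.println("blue:  " + handledByBlue);
        System.out.println("green: " + handledByGreen);
    }
}
```

The flag only protects a single JVM. The important part of the switch-over is the ordering: the old instance is deactivated before the new one is activated, so no Saga is handled in two places at once.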

Granted, whether this solves the problem at hand does depend on the “Kafka-enabled”-specifics you’ve implemented. I cannot foresee whether the usages of Kafka (which isn’t an Axon specific, of course) have any further impact on this problem.

There is another point that strikes me in your question:

The command will simply fail with no events having been published, rather than succeeding with events written to the Axon EventStore but not published.

Reverting the command is what Axon Framework does as well in such a scenario. Hence, I am uncertain why you present this as a “Kafka feature”; you could have achieved the same by sticking with plain Axon.

In conclusion, I would work with the Cluster here to control whether event handling should start for an Event Handling Component / Saga. It will require some custom work on your end, as Axon 2 does not provide the easy API present in current framework versions to control the event handling process. I hope there’s some window of opportunity for you and your team to start a migration to Axon 4.

Nonetheless, I hope this helps you out @johnd!