Event message ordering when using RabbitMQ as transport

Hi,

I am very curious about your thoughts on event message ordering when using RabbitMQ as transport. I could also use some help filling in gaps and correcting faulty assumptions in my reasoning.

What I would like is for all event messages to arrive at the handlers in the same order in which they were published (which I assume is the same order in which they are stored in the event store). Since there is no well-defined inter-aggregate order, the order within each aggregate is of course good enough.
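To make "in order within each aggregate" concrete, here is a minimal Java sketch of the check a listener could run. It assumes, hypothetically, that each event carries an aggregate identifier and a per-aggregate sequence number; `Event` and `PerAggregateOrder` are illustrative names, not Axon's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event shape: only the fields the ordering check needs.
final class Event {
    final String aggregateId;
    final long sequenceNumber;

    Event(String aggregateId, long sequenceNumber) {
        this.aggregateId = aggregateId;
        this.sequenceNumber = sequenceNumber;
    }
}

final class PerAggregateOrder {
    // True if, for every aggregate, sequence numbers arrive strictly increasing.
    // Events of different aggregates may interleave freely.
    static boolean isOrderedPerAggregate(List<Event> received) {
        Map<String, Long> lastSeen = new HashMap<>();
        for (Event e : received) {
            Long previous = lastSeen.get(e.aggregateId);
            if (previous != null && e.sequenceNumber <= previous) {
                return false; // this aggregate's events went backwards
            }
            lastSeen.put(e.aggregateId, e.sequenceNumber);
        }
        return true;
    }
}
```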

As I see it, there are three different scenarios:

  1. Normal delivery: messages are taken from the queue in batches of size prefetchCount and acked in batches of size txSize. In this case, the events should arrive at the event handlers in the same order as they are stored in the event store, right? If they don’t, do you have any more or less wild guesses as to what the issue could be?
  2. An exception occurs in an EventHandler method. The current transaction (if there is one) is rolled back, and the messages up to and including the failed message are “nacked” by Spring AMQP and put at the end of the queue (according to the Spring AMQP documentation). Or are all of the remaining, uncommitted messages in the prefetched batch “nacked”? Either way, this is a real problem, since throwing an exception will put a lot of messages out of order, which would require some logic on the receiving side to reconstruct the proper order. I realize that this is more of a Spring AMQP question, but I assume that you have given this some thought : ).
  3. The application crashes with a partially processed batch of event messages. In this case, I assume the messages are simply never acked to Rabbit, and according to http://www.rabbitmq.com/semantics.html, they should be back in the queue in publication order.
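The difference between scenario 2 and scenario 3 comes down to where a failed message re-enters the queue. The toy in-memory model below (a plain `Deque`, not RabbitMQ or Spring AMQP) has a single consumer handling one message at a time, with chosen messages failing exactly once; prefetching and transactions are deliberately left out, and `RequeueSimulation` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RequeueSimulation {
    // Handles messages one by one; a message in failOnce throws the first time it is seen.
    // requeueAtHead = true puts the failed message back at the front, otherwise at the back.
    // Returns the order in which messages were handled successfully.
    static List<Integer> run(List<Integer> messages, Set<Integer> failOnce, boolean requeueAtHead) {
        Deque<Integer> queue = new ArrayDeque<>(messages);
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> handled = new ArrayList<>();
        while (!queue.isEmpty()) {
            Integer m = queue.pollFirst();
            if (failOnce.contains(m) && alreadyFailed.add(m)) {
                if (requeueAtHead) {
                    queue.addFirst(m); // retried immediately: order is kept
                } else {
                    queue.addLast(m);  // overtaken by everything behind it
                }
                continue;
            }
            handled.add(m);
        }
        return handled;
    }
}
```

With messages 1 to 5 and message 2 failing once, requeueing at the back yields 1, 3, 4, 5, 2, while requeueing at the head yields 1, 2, 3, 4, 5.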

If my assumption in 1. is true, then we’re in pretty good shape. Now we only need to get rid of Spring AMQP’s bad habit of putting messages at the back of the queue “for performance reasons”, or implement some full-fledged reordering mechanism on the receiving side, which seems like a lot of hassle that clients should not all have to deal with.
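For a sense of what such a receiving-side mechanism involves, here is a rough sketch of a per-aggregate reordering buffer that parks early events until the gap is filled. It assumes, hypothetically, that sequence numbers start at 0 and are gap-free per aggregate; `ReorderingBuffer` is an illustrative class, not part of Axon or Spring AMQP. State lives only in memory, so it is lost on a crash and grows without bound if a missing event never arrives, which is a good part of the hassle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReorderingBuffer {
    // Next sequence number expected per aggregate (assumed to start at 0, without gaps).
    private final Map<String, Long> nextExpected = new HashMap<>();
    // Early arrivals, parked per aggregate and sorted by sequence number.
    private final Map<String, TreeMap<Long, String>> parked = new HashMap<>();

    // Accepts one event and returns the payloads that can now be handled, in order.
    List<String> accept(String aggregateId, long sequenceNumber, String payload) {
        List<String> ready = new ArrayList<>();
        long expected = nextExpected.getOrDefault(aggregateId, 0L);
        if (sequenceNumber < expected) {
            return ready; // already handled (e.g. a redelivery): drop the duplicate
        }
        TreeMap<Long, String> waiting = parked.computeIfAbsent(aggregateId, k -> new TreeMap<>());
        waiting.put(sequenceNumber, payload);
        // Release every parked event that is now contiguous with what was handled before.
        while (!waiting.isEmpty() && waiting.firstKey() == expected) {
            ready.add(waiting.pollFirstEntry().getValue());
            expected++;
        }
        nextExpected.put(aggregateId, expected);
        return ready;
    }
}
```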

So, please do share your collected knowledge and thoughts on the topic : )

Cheers
Sebastian

Hi Sebastian,

you’re lucky, I’ve been reading up on this pretty recently for “the other Axon-based gaming platform” ;-). The short answer is: it looks like we’re in good shape.
The longer response:

  1. Yes, events are published through AMQP in the same order they are stored in the event store, provided that the entire batch of events is generated by the same Aggregate. And as you are a good CQRS citizen, they are ;-).
  2. Yes, AMQP states that messages should be “returned to their originating queue”, but doesn’t specify in what order, so the specification itself gives no ordering guarantee (but see point 3). Spring uses “basic.reject” to reject failed messages. You can configure Spring to tell it whether or not to requeue messages. Exceptions that have “AmqpRejectAndDontRequeueException” as a cause are never requeued.
  3. Correct again: Rabbit does guarantee message ordering by putting failed messages back at the head of the queue, even in the case of “basic.reject”. If you use a single consumer, you will automatically retry the messages until they pass. Beware of poison messages, though. Also, if prefetchCount > txSize, you could have a batch requeued while processing continues with the next batch. This may also cause the ordering to change slightly.
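The prefetch caveat in point 3 can be illustrated with a toy model built from plain Java collections (not the actual Spring AMQP listener container; `PrefetchSimulation` is hypothetical). The consumer fills a local buffer of `prefetch` messages and handles them in transactions of `txSize`; a failed transaction goes back to the head of the broker queue, but the rest of the local buffer is still processed first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrefetchSimulation {
    // Returns the order in which messages were committed; failOnce messages fail exactly once.
    static List<Integer> run(List<Integer> messages, int prefetch, int txSize, Set<Integer> failOnce) {
        Deque<Integer> brokerQueue = new ArrayDeque<>(messages);
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> committed = new ArrayList<>();
        while (!brokerQueue.isEmpty()) {
            // Fill the local prefetch buffer from the head of the broker queue.
            Deque<Integer> buffer = new ArrayDeque<>();
            while (buffer.size() < prefetch && !brokerQueue.isEmpty()) {
                buffer.addLast(brokerQueue.pollFirst());
            }
            while (!buffer.isEmpty()) {
                // Take the next transaction's worth of messages from the local buffer.
                List<Integer> tx = new ArrayList<>();
                while (tx.size() < txSize && !buffer.isEmpty()) {
                    tx.add(buffer.pollFirst());
                }
                boolean failed = false;
                for (Integer m : tx) {
                    if (failOnce.contains(m) && alreadyFailed.add(m)) {
                        failed = true; // the handler throws: the whole transaction rolls back
                        break;
                    }
                }
                if (failed) {
                    // Requeue at the head of the broker queue, keeping the batch's own order.
                    for (int i = tx.size() - 1; i >= 0; i--) {
                        brokerQueue.addFirst(tx.get(i));
                    }
                } else {
                    committed.addAll(tx);
                }
            }
        }
        return committed;
    }
}
```

With six messages, message 1 failing once, `prefetch = 4` and `txSize = 2`, the committed order comes out as 3, 4, 1, 2, 5, 6; with `prefetch = txSize = 2` it stays 1 through 6.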

On one of my projects, I have done some “make it crash and get it back up” tests, and everything seems to work fine, even under pretty high load. I often have about 2000 robots playing a card game on 4 JVMs on my laptop.

Note that I recently added support for active/passive failover in AMQP Clusters. That means that 2 clusters listening to the same queue will, by default, effectively operate in active/passive mode. The first cluster will read messages; the second will attempt to connect to the queue, but fail as long as the first is still connected. Obviously, there is a switch to turn this off.
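This behaviour resembles what an exclusive consumer gives you in RabbitMQ (a `basic.consume` with the exclusive flag is refused while another consumer holds the queue); whether the Axon implementation uses exactly that mechanism is an assumption here. The in-memory stand-in below, with the hypothetical class `ExclusiveQueue`, only models the rule: one cluster holds the queue, and a second one succeeds only after the first disconnects.

```java
final class ExclusiveQueue {
    private String activeConsumer; // null while no cluster holds the queue

    // Models an exclusive consume: granted only when no other consumer is attached.
    synchronized boolean tryConsume(String consumerId) {
        if (activeConsumer == null || activeConsumer.equals(consumerId)) {
            activeConsumer = consumerId;
            return true;
        }
        return false; // the passive cluster is refused and would retry later
    }

    // Models the active cluster going away (crash, shutdown, lost connection).
    synchronized void disconnect(String consumerId) {
        if (consumerId.equals(activeConsumer)) {
            activeConsumer = null;
        }
    }

    synchronized String activeConsumer() {
        return activeConsumer;
    }
}
```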

Hope this helps.

Cheers,

Allard

Wow, that’s the best news I’ve had in a very long time : ).

Only one question remains, and it concerns the case that felt most obvious from the beginning: why do we sometimes get messages out of order on the Rabbit queue?

The scenario is as follows: a request ends up at a SOAP endpoint and a transaction is started. Stuff happens, among which three commands are dispatched to the command bus. Each of them generates one event that is published on the event bus, and from there via SpringAMQPTerminal and a custom serializer onto a Rabbit exchange. We put some logging in the serializer to find out whether the events were published in the correct order, and they indeed are.

What happens now is that two separate listeners, in their own clusters with their own queues, receive the three events in an order different from the publication order (but both in the same different order). One of the listeners only logs the event message, but the other throws an exception and puts the messages back on its queue. Looking at the Rabbit management interface, the messages really are out of order, so it’s not just the consumers that screwed up either.

Any guesses as to what could possibly have gone wrong here? We’re using the SimpleCommandBus, so all dispatching is done synchronously. We’ve also tried starting separate transactions for each command, which seems to lower the frequency of the problem but does not eliminate it.

Hi Sebastian,

I’ll rephrase to make sure I understand correctly: you send 3 commands consecutively on a SimpleCommandBus (no async at all). The events are transported via SpringAMQPTerminal to an exchange (topic or fanout, I suppose). Here the order still seems correct. The receiving end, however, receives the messages in a different order, even if they don’t throw any exceptions.

This is definitely weird. It would mean that Rabbit shuffles the messages somewhere between the Exchange and the Consumer. Could you bind a third queue to your exchange (with no listeners on it) and send the three messages again? Using the management console, you can read out the contents of the queue. Are the messages in the right order?

Reading about message ordering, I found this:
“messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent.” Using the CachingConnectionFactory, it is possible that the three messages are not sent over the same channel. But since you’re sending each message only after the previous one has been committed, it’s unlikely that this causes the difference, though not impossible.
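To see why the channel matters, the sketch below models each channel as its own FIFO and lets an arbitrary broker-side `schedule` decide which channel's next message reaches the queue. It is a hypothetical model of the quoted guarantee (`ChannelInterleaving` is an illustrative name, not RabbitMQ internals): order holds within a channel, but messages published on different channels may be interleaved in any order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ChannelInterleaving {
    // Each inner list holds the messages published on one channel, in publish order.
    // `schedule` lists, step by step, which channel the broker takes its next message from.
    // Returns the resulting queue contents.
    static List<String> enqueue(List<List<String>> channels, List<Integer> schedule) {
        List<Deque<String>> pending = new ArrayList<>();
        for (List<String> channel : channels) {
            pending.add(new ArrayDeque<>(channel)); // FIFO within a channel
        }
        List<String> queue = new ArrayList<>();
        for (int ch : schedule) {
            String next = pending.get(ch).pollFirst(); // null if that channel is drained
            if (next != null) {
                queue.add(next);
            }
        }
        return queue;
    }
}
```

Publishing e1, e2 and e3 on three separate channels can yield e2, e1, e3 under the schedule `[1, 0, 2]`; putting all three on one channel yields e1, e2, e3 under any schedule.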

Is the problem reproducible? Does it occur all the time or once-per-…? Which RabbitMQ version do you use?

Cheers,

Allard

Hi Sebastian,

I hate this kind of stuff, so I have created a little test fixture to try to reproduce the problem, using version 2.8.4 of Rabbit.
What I did was create 100 channels on the same connection and send a message (basicPublish) on each of the channels. The message contents were 0 through 99, to verify ordering. The messages were sent directly (via the amq.direct exchange) to a temporary queue.
Results: every now and then (roughly 1-2% of batches), a batch contains a message that is out of order.
Then I changed the queue to a durable autodelete queue. Test failures still occur, but seem to be less frequent. That could be a coincidence, though.

Finally, I used transactional channels. The results: no failures in 10 runs of 100 executions of sending 100 messages. That’s 100k messages, and not a single one out of order.
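Because message `i` of each batch carries payload `i`, the verification side of a fixture like this can be very small: a batch is in order exactly when every payload equals its position. The helper below is a hypothetical, stdlib-only sketch of that check; publishing and consuming through RabbitMQ are left out.

```java
import java.util.List;

final class OrderingCheck {
    // Returns the index of the first misplaced message, or -1 if the batch is 0, 1, 2, ... in order.
    static int firstOutOfOrder(List<Integer> received) {
        for (int i = 0; i < received.size(); i++) {
            if (received.get(i) != i) { // unboxed comparison of payload and position
                return i;
            }
        }
        return -1;
    }

    // Counts how many batches contained at least one misplaced message.
    static int failingBatches(List<List<Integer>> batches) {
        int failures = 0;
        for (List<Integer> batch : batches) {
            if (firstOutOfOrder(batch) != -1) {
                failures++;
            }
        }
        return failures;
    }
}
```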

And here is the good news: transaction support is already implemented in the AMQPTerminal. In fact, it will even hook into the UnitOfWork to postpone the commit until the unit of work is committed. Just use setTransactional(true) on the terminal. There is a price, though: throughput will go down a little bit.
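Conceptually, tying the publish to the Unit of Work means nothing leaves the application until the unit of work commits. The class below is a hypothetical, stdlib-only illustration of that idea, not Axon's `AMQPTerminal`: events are buffered during the unit of work, released together in their original order on commit, and discarded on rollback. The ordering benefit observed above presumably comes from each broker-side commit being a synchronous round trip, which this toy does not model.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferedTerminal {
    private final List<String> pending = new ArrayList<>();   // events of the current unit of work
    private final List<String> published = new ArrayList<>(); // stand-in for what reached the broker

    // Called for each event while the unit of work runs: nothing is sent yet.
    void publish(String event) {
        pending.add(event);
    }

    // Called when the unit of work commits: all buffered events go out, in original order.
    void commit() {
        published.addAll(pending);
        pending.clear();
    }

    // Called when the unit of work rolls back: buffered events are never sent.
    void rollback() {
        pending.clear();
    }

    List<String> published() {
        return new ArrayList<>(published);
    }
}
```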

Cheers,

Allard

Amazing, and of course transactional=true solved the issue. The performance penalty with durable queues was quite massive, though. I’m thinking about whether we could live without durable queues and implement handling of redelivery in the listeners instead.

But that’s another issue… Thank you very much for the input!

Cheers
Sebastian

Using only 1 channel for publishing, but turning off transactions on that channel, seems to do the trick as well, with better performance, depending on your number of durable queues.
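If several threads publish, one way to keep a single publishing channel safe is to let one thread own it; the RabbitMQ Java client documentation advises against publishing concurrently on a shared channel. The sketch below assumes that design (it is not taken from Axon or Spring AMQP, and `SingleChannelPublisher` is hypothetical): a single-thread executor runs publishes in submission order, and a list stands in for the channel where a real version would call `basicPublish`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SingleChannelPublisher {
    // Stand-in for the one AMQP channel used for publishing (hypothetical).
    private final List<String> channelLog = Collections.synchronizedList(new ArrayList<>());
    // One thread owns the channel, so publishes leave in the order they were submitted.
    private final ExecutorService channelThread = Executors.newSingleThreadExecutor();

    void publish(String message) {
        channelThread.execute(() -> channelLog.add(message));
    }

    // Waits for pending publishes to finish and returns what went over the channel.
    List<String> closeAndGetPublished() {
        channelThread.shutdown();
        try {
            channelThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(channelLog);
    }
}
```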

Thanks for providing the support for setting up separate connection factories for the terminal and consumers btw : ).