Event tracking processor failing on mongodb (cosmos-db azure)

We're experiencing an issue after moving from a MongoDB instance running as a Docker container to Azure's hosted MongoDB offering (Cosmos DB). It seems Azure limits a query's memory usage to 40 MB (error code 16501): https://docs.microsoft.com/en-us/azure/cosmos-db/faq

    2018-06-04 06:33:09.579 - WARN --- [ense-gateway]-0] o.a.e.TrackingEventProcessor Error occurred. Starting retry mode. [-]
    com.mongodb.MongoQueryException: Query failed with error code 16501 and error message 'Query exceeded the maximum allowed memory usage of 40 MB. Please consider adding more filters to reduce the query response size.' on server *********.azure.com:10255
        at com.mongodb.operation.FindOperation$1.call(FindOperation.java:521)
        at com.mongodb.operation.FindOperation$1.call(FindOperation.java:510)
        at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:435)
        at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:408)
        at com.mongodb.operation.FindOperation.execute(FindOperation.java:510)
        at com.mongodb.operation.FindOperation.execute(FindOperation.java:81)
        at com.mongodb.Mongo.execute(Mongo.java:836)
        at com.mongodb.Mongo$2.execute(Mongo.java:823)
        at com.mongodb.OperationIterable.iterator(OperationIterable.java:47)
        at com.mongodb.FindIterableImpl.iterator(FindIterableImpl.java:151)
        at org.axonframework.mongo.eventsourcing.eventstore.AbstractMongoEventStorageStrategy.findTrackedEvents(AbstractMongoEventStorageStrategy.java:170)
        at org.axonframework.mongo.eventsourcing.eventstore.MongoEventStorageEngine.fetchTrackedEvents(MongoEventStorageEngine.java:202)
        at org.axonframework.eventsourcing.eventstore.BatchingEventStorageEngine.lambda$readEventData$1(BatchingEventStorageEngine.java:123)
        at org.axonframework.eventsourcing.eventstore.BatchingEventStorageEngine$EventStreamSpliterator.tryAdvance(BatchingEventStorageEngine.java:161)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
        at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
        at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
        at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
        at org.axonframework.eventsourcing.eventstore.EmbeddedEventStore$EventConsumer.peekPrivateStream(EmbeddedEventStore.java:380)
        at org.axonframework.eventsourcing.eventstore.EmbeddedEventStore$EventConsumer.peek(EmbeddedEventStore.java:341)
        at org.axonframework.eventsourcing.eventstore.EmbeddedEventStore$EventConsumer.hasNextAvailable(EmbeddedEventStore.java:318)
        at org.axonframework.messaging.MessageStream.hasNextAvailable(MessageStream.java:38)
        at org.axonframework.eventhandling.TrackingEventProcessor.checkSegmentCaughtUp(TrackingEventProcessor.java:294)
        at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:246)
        at org.axonframework.eventhandling.TrackingEventProcessor.processingLoop(TrackingEventProcessor.java:209)
        at org.axonframework.eventhandling.TrackingEventProcessor$TrackingSegmentWorker.run(TrackingEventProcessor.java:620)
        at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.run(TrackingEventProcessor.java:715)
        at org.axonframework.eventhandling.TrackingEventProcessor$CountingRunnable.run(TrackingEventProcessor.java:547)
        at java.base/java.lang.Thread.run(Thread.java:844)

The particular tracking event processor that is failing uses an in-memory tracking token to rebuild state from the beginning of the event stream. As mentioned, this was working fine against the dockerized MongoDB. We could consider changing our internal implementation to use a persisted tracking token (which would be far from ideal), but this will more than likely still be a problem whenever we want to do event replays from the start of the event store.

Any suggestions?

Hi Dylan,

I'd guess this doesn't have anything to do with using an InMemoryTokenStore or a MongoTokenStore, but rather, as the exception suggests, with the query being performed.

Thus, I'm guessing the batch size, which defaults to 100, together with the size of your events, is the culprit here.

You might try lowering the batch size to see if that helps.

Otherwise, is it possible to configure Cosmos DB to allow larger query results?

I'm unfamiliar with the specifics there, so I'm hard pressed to give you more concrete suggestions around Cosmos DB.

Anyhow, a persisted TrackingToken as opposed to an in-memory TrackingToken shouldn't make a difference.

FYI, as of Axon 3.2 you can leverage the 'replay API' on a TrackingEventProcessor.

I am assuming you're using an in-memory token to automatically replay on every start-up, correct?

If you'd like, you could replace that with a more direct approach by, for example, triggering a replay on start-up.

And another FYI: we're working on introducing the possibility to initialize the tracking token of a given TrackingEventProcessor.

This issue describes that addition.

Hope this helps, Dylan. If not, please ask some follow-up questions!

Cheers,
Steven

Hi Steven,

Thanks for the response. I've tried decreasing the batch size to 10, with the same error. I also had a look at our event sizes: on average about 65 KB, so simple math does not add up to > 40 MB. You're correct in your assumptions around the in-memory tracking token, and I'm glad you highlighted that it should not be the issue. I'll look into using a replay token instead of the in-memory one. Thanks for the tip!
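For reference, here is that back-of-the-envelope math, using the measured ~65 KB average event size and the default batch size of 100:

```java
public class BatchSizeMath {
    public static void main(String[] args) {
        long avgEventBytes = 65 * 1024;          // ~65 KB average event size (measured)
        long limitBytes = 40L * 1024 * 1024;     // Cosmos DB's 40 MB query memory limit

        long defaultBatch = 100 * avgEventBytes; // default Axon batch size of 100
        long smallBatch = 10 * avgEventBytes;    // the reduced batch size tried above

        System.out.println(defaultBatch);                // 6656000, i.e. ~6.5 MB
        System.out.println(smallBatch);                  // 665600, i.e. ~0.65 MB
        System.out.println(defaultBatch < limitBytes);   // true: well under the limit
    }
}
```

Even the default batch of 100 events should only amount to roughly 6.5 MB, nowhere near 40 MB, which supports the suspicion that the batch size is not the real problem.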

As mentioned, we've been running against a dockerized Mongo instance without any issues for a couple of months now, so I have a strong suspicion this is Cosmos DB related. I managed to reproduce the issue with Axon taken completely out of the equation, by using the mongo-java-driver to simulate what Axon does in AbstractMongoEventStorageStrategy.findTrackedEvents(…):

    @Test
    public void loadEvents() {
        // Fails against Cosmos DB with error 16501:
        // MongoClient mongoClient = new MongoClient(new MongoClientURI("mongodb://****.azure.com:10255/axonframework?ssl=true&replicaSet=globaldb"));
        // Works fine against a local MongoDB:
        MongoClient mongoClient = new MongoClient(new MongoClientURI("mongodb://localhost:27017/axonframework"));

        final MongoCollection<Document> eventCollection =
                mongoClient.getDatabase("axonframework").getCollection("domainevents");

        // Mirrors findTrackedEvents(...): sort on (timestamp, sequenceNumber), limit to the batch size
        FindIterable<Document> cursor = eventCollection.find();
        cursor = cursor.sort(new BasicDBObject("timestamp", 1).append("sequenceNumber", 1));
        cursor = cursor.limit(100);

        final MongoCursor<Document> iterator = cursor.iterator();
        while (iterator.hasNext()) iterator.next();

        mongoClient.close();
    }

The culprit seems to be:

FindIterable<Document> cursor = eventCollection.find();

I came across the following article: https://scalegrid.io/blog/fast-paging-with-mongodb/. Approach 2 there seems to be how Axon does paging, apart from also limiting results to a batch size. I've passed the necessary info on to Cosmos DB support for further feedback and will keep you posted.
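For anyone unfamiliar with that article, here is the range-based paging idea (approach 2) sketched over a plain in-memory list so it stays self-contained, with no Mongo dependency; `page` is a hypothetical helper, not Axon or driver API. Instead of skipping over previously read documents, each batch filters on the last-seen sort key and limits to the batch size:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RangePagingSketch {
    // Simulates range-based paging: rather than an expensive skip(), each page
    // asks for "keys greater than the last one I saw", limited to the batch size.
    // In Mongo terms this would be find(gt("key", lastSeen)).sort(...).limit(batchSize).
    static List<Integer> page(List<Integer> sortedKeys, int lastSeen, int batchSize) {
        return sortedKeys.stream()
                .filter(k -> k > lastSeen) // WHERE key > lastSeen (index-friendly in Mongo)
                .limit(batchSize)          // cap the page at the batch size
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> keys = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(page(keys, 0, 4)); // [1, 2, 3, 4]
        System.out.println(page(keys, 4, 4)); // [5, 6, 7, 8]
    }
}
```

Axon's tracking token plays the role of `lastSeen` here: each `findTrackedEvents` call starts from the token's position rather than rescanning from the beginning.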

The above issue has once again raised concerns around how suitable MongoDB is as a production event store. Perhaps some guidance on this topic from the community might help inform a decision to stick with Mongo or to move towards an event storage engine more widely used in the community (e.g. the JDBC event store backed by MySQL). And yes, I'm aware of AxonDB, which sounds great, but at this stage of the project we'd like to get a bit of mileage out of non-commercial solutions before considering a more long-term commercial solution for our event store.

d

Hi Dylan,

Looking forward to hearing what comes back from Cosmos DB's support.
It will be interesting to see what they come up with around this topic.

Regarding a production event store suggestion: when implementing the framework at clients, we typically suggest the JPA or JDBC implementation if they want a free option, and, as of late, AxonDB if they want no worries at all in the event-storage space.

Sadly, I've heard of quite a few scenarios where users had issues using MongoDB in a production environment.

Additionally, the JPA and JDBC implementations typically have great performance as well.

Hence my suggestion to use either of those two instead of the Mongo solution.

That’s my 2 cents to the situation.

Maybe somebody else from the community wants to share their experience around the topic of course.

Hope you figure out a solution to your problem soon enough Dylan!

Again, feel free to post follow-up questions.

Cheers,
Steven

Some feedback for anyone interested: I've been engaging with Microsoft engineers. They managed to reproduce the issue and gave me the following feedback:

…the sort is key and this is a bug on our side. We will investigate a fix and I will let you know – until then, you may have better results if you only sort on a single column, then do the second-column sort on the client.

So, in essence, this is not an Axon bug but a Cosmos DB bug related to how its multi-key sorting works.
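Until that fix lands, Microsoft's suggested workaround (sort on a single column server-side, do the secondary sort on the client) could look roughly like the sketch below. The field names mirror Axon's Mongo event document layout, but the batch here is just in-memory test data standing in for documents returned by a single-column `find().sort()`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ClientSideSecondarySort {
    // Re-applies the full (timestamp, sequenceNumber) ordering on the client,
    // after the server has only sorted on timestamp.
    static void sortBatch(List<Map<String, Object>> batch) {
        batch.sort(Comparator
                .comparing((Map<String, Object> d) -> (String) d.get("timestamp"))
                .thenComparing(d -> (Long) d.get("sequenceNumber")));
    }

    public static void main(String[] args) {
        // Pretend these came back from a server-side sort on timestamp only, so
        // events sharing a timestamp may arrive out of sequence-number order.
        List<Map<String, Object>> batch = new ArrayList<>();
        batch.add(Map.of("timestamp", "2018-06-04T06:33:09Z", "sequenceNumber", 1L));
        batch.add(Map.of("timestamp", "2018-06-04T06:33:09Z", "sequenceNumber", 0L));
        batch.add(Map.of("timestamp", "2018-06-04T06:33:10Z", "sequenceNumber", 0L));

        sortBatch(batch);
        // Prints the 06:33:09 events (#0 then #1) before the 06:33:10 one.
        batch.forEach(d -> System.out.println(d.get("timestamp") + " #" + d.get("sequenceNumber")));
    }
}
```

Note this only restores ordering within a fetched batch; events with equal timestamps that straddle a batch boundary would still need care, so treat this as a stopgap rather than a fix.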