Hi,
We’re seeing a strange problem when running an application using Axon 1.3.2 to process a high volume of messages in our clustered environment. The error is as follows:
org.axonframework.repository.ConcurrencyException, Concurrent modification detected for Aggregate identifier [188bd324-e445-416c-a301-a9e01e2d2c9f], sequence: [5563]
I believe this means we are trying to write two events to our event store with the same aggregate identifier and sequence number. However, we aren’t setting the sequence number ourselves; we let the Axon framework handle that for us.
Our event sourcing repository is defined as:
<axon:event-sourcing-repository id="companyRepository"
                                aggregate-type="net.xxx.xxx.Company"
                                event-bus="eventBus"
                                event-store="companyEventStore">
    <axon:snapshotter-trigger id="companySnapshotterTrigger"
                              event-count-threshold="20"
                              snapshotter-ref="userSnapshotter" />
</axon:event-sourcing-repository>
and the event store as:
<axon:jpa-event-store id="companyEventStore" event-serializer="eventStoreSerializer" data-source="dataSource"/>
Below is an extract of our logs which shows the problem happening:
2015-07-23 08:21:04,469 INFO [quartzSchedulerStaticPollers_Worker-7] user.services.UserManagementService: Dispatching AddImportedUserCommand for user assill1175208771@everyma1l.us
2015-07-23 08:21:04,470 INFO [quartzSchedulerStaticPollers_Worker-7] command.handler.UserAdminCommandsHandler: Handling AddImportedUserCommand for user {}6d6c99ac-9dec-4df0-8f24-5ba5bfd2a17f
2015-07-23 08:21:04,471 INFO [quartzSchedulerStaticPollers_Worker-7] domain.user.User: Dispatching UserCreatedEvent for assill1175208771@everyma1l.us
2015-07-23 08:21:04,471 INFO [quartzSchedulerStaticPollers_Worker-7] domain.user.RmmUser: Called User domain super constructor for user loginId assill1175208771@everyma1l.us
2015-07-23 08:21:04,512 INFO [quartzSchedulerStaticPollers_Worker-7] user.services.UserManagementService: Dispatching AddImportedUserCommand for user hmorrison1626001654@ma1l2u.com
2015-07-23 08:21:04,514 INFO [quartzSchedulerStaticPollers_Worker-7] domain.user.User: Dispatching UserCreatedEvent for hmorrison1626001654@ma1l2u.com
2015-07-23 08:21:04,514 INFO [quartzSchedulerStaticPollers_Worker-7] domain.user.RmmUser: Called User domain super constructor for user loginId hmorrison1626001654@ma1l2u.com
2015-07-23 08:21:04,528 ERROR [quartzSchedulerStaticPollers_Worker-7] hibernate.util.JDBCExceptionReporter: Duplicate entry '188bd324-e445-416c-a301-a9e01e2d2c9f-5563-RmmCompany' for key 'aggregateIdentifier'
.....
org.axonframework.repository.ConcurrencyException, Concurrent modification detected for Aggregate identifier [188bd324-e445-416c-a301-a9e01e2d2c9f], sequence: [5563]
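In the log above, the entries for the first user (assill1175208771@everyma1l.us, highlighted in green in the original) show successful processing, where the user is created in our system. The entries for the second user (hmorrison1626001654@ma1l2u.com, highlighted in yellow) show the error case, where the required user is not created.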
When I query our DomainEventEntry table for the aggregateIdentifier 188bd324-e445-416c-a301-a9e01e2d2c9f and sequence number 5563, I can see an event which relates to the user that was processed directly before the error case (i.e. the user highlighted in green).
So, it looks to me like the sequence number generation isn’t working correctly and is generating the same sequence number twice for two events which are dispatched against the same repository in quick succession. However, looking at the code which generates the sequence number, I can’t see how this can be the case, since my understanding is that all of the events are placed on the event bus synchronously by one thread. Therefore it shouldn’t be possible for the second event to be applied before the processing for applying the first one has completed. This is the code from the Axon Framework which generates the sequence number:
private long newSequenceNumber() {
    Long currentSequenceNumber = getLastSequenceNumber();
    if (currentSequenceNumber == null) {
        return 0;
    }
    return currentSequenceNumber + 1;
}
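To make my reading of that method concrete, here is a minimal standalone sketch in plain Java. It is not Axon code: SequenceCounter and SequenceDemo are hypothetical names, and I am assuming the last sequence number advances each time a number is handed out. It only shows that a single counter never repeats a number, whereas two separate copies that start from the same last sequence number will both produce the same next one.

```java
// Hypothetical stand-in for the quoted logic; NOT an Axon Framework class.
class SequenceCounter {
    // null means no events have been recorded yet
    private Long lastSequenceNumber;

    SequenceCounter(Long lastSequenceNumber) {
        this.lastSequenceNumber = lastSequenceNumber;
    }

    // Same rule as the quoted method: 0 if nothing has been seen, else last + 1.
    // Assumption: handing out a number also advances the stored last value.
    long newSequenceNumber() {
        long next = (lastSequenceNumber == null) ? 0 : lastSequenceNumber + 1;
        lastSequenceNumber = next;
        return next;
    }
}

class SequenceDemo {
    public static void main(String[] args) {
        // One copy, used sequentially: numbers are distinct.
        SequenceCounter single = new SequenceCounter(5562L);
        long first = single.newSequenceNumber();
        long second = single.newSequenceNumber();
        System.out.println("single copy: " + first + ", " + second);

        // Two independent copies starting from the same state: numbers collide.
        SequenceCounter copyA = new SequenceCounter(5562L);
        SequenceCounter copyB = new SequenceCounter(5562L);
        long fromA = copyA.newSequenceNumber();
        long fromB = copyB.newSequenceNumber();
        System.out.println("two copies:  " + fromA + ", " + fromB);
    }
}
```

I am not claiming this is what happens inside our application; it is just the only way I can see that logic yielding 5563 twice for the same aggregate.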
What could be the reason for this behaviour? As pointed out above, we only see this when running a high volume of traffic through our clustered environment. Although the environment is clustered, all of the activity is happening on one node, which rules out multiple nodes accessing the repositories. Additionally, it doesn’t happen for all commands we dispatch. For example, we dispatch 30000 commands to create 30000 users; between 1 and 10 users will fail with this problem (the number differs every time we run our tests) and the rest will work as expected.
Thanks in advance for your help.
Chris