Axon 4.0 / AxonServer EventHandler stop working

Hello

We are observing that our eventhandlers stop working randomly after we migrated to AxonServer and Axon 4.0. It can happen already with the first event or after x number of events.
The tracking tokens were migrated as well.

There are no exceptions thrown from our eventhandlers and also no exceptions thrown from AxonServer. (Currently running with log level DEBUG hoping to find something but no luck)
The tracking processors are all caught up.

The current configuration for all tracking processor is:

  • SequentialPerAggregate
  • threadCount: 2
  • segments: 4

Command dispatching works and the resulting events are stored successfully but no eventhandler is catching them. What helps is restarting the event-processor.

It’s unclear to us where the error is located wrong configuration from our side, AxonFramework or in AxonServer?

We observed one “weird” thing. In the overview of AxonServer we saw 3 instances of an application running although only 2 were. Inside the application details the information displayed was correct.

Kind regards,
Jose

Quick update: What we observed is that when we restart the AxonServer all of the event handlers stop working

Hi Jose. This sounds like a pretty standard scenario. The client should automatically reconnect.
What does your setup/configuration look like?

Cheers,

Allard

Hi Allard

We indeed see in the logs that the clients re-connect. The problem affects just the event handlers and we never stop them manually. AxonServer as well does not report any stopped event-handlers.
We were able to reproduce it after deleting a few aggregates (simply issuing a DeleteXXX command and removing that entity from the database). It often happened after the first or second time. Further DeleteCommands went through but they the event-handlers never catched them. Then we restarted the particular event-processor in AxonServer and it caught up again.

The good (or bad) news is that for some reason we are not able to reproduce it today although the code has not changed since yesterday.

Our configuration is the following:

@Configuration
open class AxonBaseConfiguration {

    @Value("\${axon.mongo.event.db:events}")
    lateinit var mongoEventDbName: String

    @Value("\${axon.threadCount:2}")
    var threadCount: Int = 2

    @Value("\${axon.segmentCount:4}")
    var segmentCount: Int = 4

    @Autowired
    lateinit var mongoClient: MongoClient

    @Qualifier("eventSerializer")
    @Bean
    fun eventSerializer(): Serializer {
        return JacksonSerializer.builder().objectMapper(ObjectMapper()
                .registerModule(KotlinModule())
                .disable(DeserializationFeature.FAIL_ON_IGNORED_PROPERTIES)
                .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
        ).build()
    }

    @Bean
    fun eventScheduler(eventBus: EventBus) =
            SimpleEventScheduler.builder()
                    .scheduledExecutorService(Executors.newScheduledThreadPool(4))
                    .eventBus(eventBus).build()

    @Autowired
    fun configure(configurer: Configurer, bugsnag: Bugsnag) {
        configurer
                .eventProcessing { epc ->
                    epc.registerDefaultSequencingPolicy { SequentialPerAggregatePolicy() }
                            .registerDefaultListenerInvocationErrorHandler { c -> BugsnagErrorHandler(bugsnag) }
                }
                .registerComponent(TrackingEventProcessorConfiguration::class.java) { c ->
                    TrackingEventProcessorConfiguration
                            .forParallelProcessing(threadCount)
                            .andInitialSegmentsCount(segmentCount)
                }
    }

    @Bean
    fun tokenStore(serializer: Serializer): TokenStore {
        return MongoTokenStore.builder()
                .mongoTemplate(axonMongoTemplate())
                .serializer(serializer)
                .build()
    }

    @Bean
    fun axonMongoTemplate(): DefaultMongoTemplate {
        return DefaultMongoTemplate.builder()
                .mongoDatabase(mongoClient, mongoEventDbName)
                .build()
    }

    @Bean
    fun eventStore(axonServerConfiguration: AxonServerConfiguration, axonServerConnectionManager: AxonServerConnectionManager): EventStore {
        axonServerConfiguration.nrOfNewPermits = 1000
        axonServerConfiguration.initialNrOfPermits = 2000

        return AxonServerEventStore.builder()
                .configuration(axonServerConfiguration)
                .eventSerializer(eventSerializer())
                .platformConnectionManager(axonServerConnectionManager)
                .upcasterChain(EventUpcasterChain(
                        ExerciseStartedUpcaster(),
                        SpotReservedEventUpcaster(),
                        SpotReleasedEventUpcaster(),
                        SpotUnableToReserveEventUpcaster()

                ))
                .build()
    }

    @Bean
    fun sagaRepository(serializer: Serializer): SagaStore<Any> {
        return MongoSagaStore.builder()
                .mongoTemplate(axonMongoTemplate())
                .serializer(serializer)
                .build()
    }

    @Bean
    fun exerciseStartedWithVariationUpcaster() = ExerciseStartedUpcaster()

    companion object {
        private val log = LoggerFactory.getLogger(AxonBaseConfiguration::class.java)
    }
}

I just wanted to give an update on this issue:

I just created a very simple example application: https://github.com/locorider/axon-4.0-test

What I did:

  • Started the application with SequencingPolicy = SequentialPerAggregate, threadCount = 4, segments = 4 (I soon realized it doesn’t make sense but I took the configuration we have in production)
  • Had some command / events dispatched -> Everything goes fine
  • Stopped Axon-Server (application still running)
  • Restarted Axon-Server
    Following appeared in the log:

`

2018-11-02 15:06:30.219 WARN 63373 — [pool-5-thread-1] o.a.a.c.command.AxonServerCommandBus : There was a problem dispatching a command GenericCommandMessage{payload={CreateTodo(todoId=109c5d72-1033-4516-bcd6-64a63449fda8, name=3784cf87-e6c1-45c7-8108-8a23327ea32e)}, metadata={}, messageIdentifier=‘4a1adf7c-aded-4996-8b34-b8b9d70d9f03’, commandName=‘com.kinastic.axon.model.CreateTodo’}.

org.axonframework.axonserver.connector.AxonServerException: No connection to AxonServer available
at org.axonframework.axonserver.connector.AxonServerConnectionManager.checkConnectionState(AxonServerConnectionManager.java:129) ~[axon-server-connector-4.0.jar:4.0]
at org.axonframework.axonserver.connector.AxonServerConnectionManager.getChannel(AxonServerConnectionManager.java:75) ~[axon-server-connector-4.0.jar:4.0]
at org.axonframework.axonserver.connector.command.AxonServerCommandBus.dispatch(AxonServerCommandBus.java:111) ~[axon-server-connector-4.0.jar:4.0]
at org.axonframework.commandhandling.gateway.AbstractCommandGateway.send(AbstractCommandGateway.java:75) [axon-messaging-4.0.jar:4.0]
at org.axonframework.commandhandling.gateway.DefaultCommandGateway.send(DefaultCommandGateway.java:78) [axon-messaging-4.0.jar:4.0]
at org.axonframework.commandhandling.gateway.DefaultCommandGateway.send(DefaultCommandGateway.java:134) [axon-messaging-4.0.jar:4.0]
at com.kinastic.axon.Application.emit(Application.kt:29) [classes/:na]
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_111]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_111]
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65) [spring-context-5.0.6.RELEASE.jar:5.0.6.RELEASE]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.0.6.RELEASE.jar:5.0.6.RELEASE]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
at java.util.concurrent.FutureTask.runAndReset$$$capture(FutureTask.java:308) [na:1.8.0_111]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java) [na:1.8.0_111]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_111]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]

2018-11-02 15:06:30.220 WARN 63373 — [pool-5-thread-1] o.a.c.gateway.DefaultCommandGateway : Command ‘com.kinastic.axon.model.CreateTodo’ resulted in org.axonframework.axonserver.connector.command.AxonServerCommandDispatchException(No connection to AxonServer available)
2018-11-02 15:06:30.626 INFO 63373 — [rverConnector-0] o.a.a.c.AxonServerConnectionManager : Connecting using unencrypted connection…
2018-11-02 15:06:30.947 INFO 63373 — [rverConnector-0] o.a.a.c.AxonServerConnectionManager : Re-subscribing commands and queries
2018-11-02 15:06:30.948 INFO 63373 — [rverConnector-0] o.a.a.c.command.AxonServerCommandBus : Creating new subscriber

`

  • Command and events go again successfully through (visible in the AxonServer UI) BUT the query-model was not updated at all therefore the event-handler didn’t get anything

Screenshot 2018-11-02 at 15.21.33.png

AxonServer also reports a running processor (I refreshed the page to be on the safe side)

  • Restarted the application query-model gets updated how it should. I repeated the steps till this point and restarted the Processor in AxonServer which results in the same. Successful update of the query-model.

Logs from AxonServer after a restart (and waited a few minutes)
Screenshot 2018-11-02 at 15.25.22.png

I also tried with deleting the controldb to see if that changes anything but the result is the same. It gets “stuck”.

What I furthermore tried is a bunch of different configurations:

  • SequentialPolicy
  • singleThreadedProcessing
  • no configuration at all (default axon configuration)

Nothing helped. The event-handler stops working everytime.

Sidenote: For every different configuration or step I deleted all trackingtokens, query models, events and controldb

Please tell me if you need further information or something I can help you with.

Kind regards
Jose

Hi Jose,

it looks like the subscription to the event stream isn’t recovering properly. We’ll look into it.
Cheers,

Allard

Screenshot 2018-11-02 at 15.21.33.png

Screenshot 2018-11-02 at 15.25.22.png

Hi Allard

Thanks a lot!

In the meantime we have the following solution:

`

@Autowired
fun restart(axonServerConnectionManager: AxonServerConnectionManager, eventProcessingModule: EventProcessingModule) {
    axonServerConnectionManager.addDisconnectListener {
        eventProcessingModule.eventProcessors().forEach {
            try {
                log.info("Disconnect detected... Shutting down ${it.key}");
                it.value.shutDown();
            } catch (e: Exception) {

            }
        }
    }

    axonServerConnectionManager.addReconnectListener {
        eventProcessingModule.eventProcessors().forEach {
            try {
                log.info("Reconnected! Restarting ${it.key}")
                shutdownSilently(it.value)
                it.value.start()
            } catch (e: Exception) {
                log.error("Unable to restart ${it.key}")
                log.error(e.message, e)
            }
        }
    }
}

`

Have a nice weekend!

All the best
Jose

Hey folks,

I’ve just opened an issue for this: https://github.com/AxonFramework/AxonFramework/issues/883.

Cheers,
Milan