I am struggling to get a handle on tracking event processor configuration and management. I am using the dockerized version of AxonServer (specifically version 4.1 from Docker Hub), and my project depends on version 4.1.1 of the Axon Framework.
For the sake of an example, I will define a very simple bounded context named foo, as follows:
- com.myco.foo.command.FooAggregate – Command Handling AggregateRoot
- com.myco.foo.query.FooProjector – Query Model Projector
- com.myco.foo.query.Foo – Query Model
FooAggregate handles commands and emits events. FooProjector processes the events and updates the state of Foo. Eventually this query side view model will be consistent with the command side aggregate. This is all very easy to understand.
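To make the example concrete, here is a minimal sketch of what I mean by the projector (the event type FooCreatedEvent, its fields, and FooRepository are hypothetical names for illustration, not from my actual code):

```java
package com.myco.foo.query;

import org.axonframework.eventhandling.EventHandler;
import org.springframework.stereotype.Component;

// Hypothetical projector: FooCreatedEvent and FooRepository are
// illustrative names, standing in for the real event and storage.
@Component
public class FooProjector {

    private final FooRepository repository;

    public FooProjector(FooRepository repository) {
        this.repository = repository;
    }

    @EventHandler
    public void on(FooCreatedEvent event) {
        // Project the event's state into the Foo query model.
        repository.save(new Foo(event.getFooId(), event.getName()));
    }
}
```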
When I initially deployed my project I didn’t apply any custom event processing configuration, so I got the defaults provided through Spring Boot auto-configuration. In that case the default ListenerInvocationErrorHandler is org.axonframework.eventhandling.LoggingErrorHandler: if an exception is raised in FooProjector, it is logged and the event that triggered the error is skipped (its state will not be projected into Foo). So, by default, if FooProjector has a bug in its processing logic, or FooAggregate has a bug in its event producing/emitting logic, and that bug is exposed by a given command execution, then that specific instance of Foo will never become consistent with FooAggregate. That is, unless the failure is detected, the offending bug is fixed, and the query model is rebuilt (as described in the documentation…essentially throwing away the old view and resetting the token store).

This keeps everything moving, but it implies you must be monitoring the logs and alerting on failures so that the manual process of fixing the issue can begin. That could be a very legitimate approach, but it breaks down if your monitoring/alerting breaks down. With good testing, however, you can drive the error rate down (close to zero, hopefully), at which point it’s probably manageable. Maybe this is the way to go, but I can’t decide.
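For completeness, the "rebuild the query model" step can be scripted against the processor rather than done by hand in the token store. A sketch, assuming the processing group takes the default package-based name (the group name here is an assumption on my part):

```java
import org.axonframework.config.EventProcessingConfiguration;
import org.axonframework.eventhandling.TrackingEventProcessor;

public class FooProjectionRebuilder {

    // Sketch: shut the processor down, reset its tracking tokens, and
    // restart it, so all events are replayed through FooProjector.
    // The old view data must be deleted separately before the replay.
    public void rebuildFooProjection(EventProcessingConfiguration config) {
        config.eventProcessor("com.myco.foo.query", TrackingEventProcessor.class)
              .ifPresent(processor -> {
                  processor.shutDown();
                  processor.resetTokens(); // requires a token store that supports resets
                  processor.start();
              });
    }
}
```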
Another approach I’ve tried is to configure the PropagatingErrorHandler. For this I’ve added the following configuration:
`
@Autowired
public void configure(EventProcessingConfigurer configurer) {
    configurer.registerDefaultListenerInvocationErrorHandler(
            configuration -> PropagatingErrorHandler.INSTANCE);
}
`
With this configuration, when an event listener raises an exception, the tracking processor goes into an error mode in which it releases its token and retries the failing event, backing off between attempts, until it succeeds; it will keep trying indefinitely. There are a couple of problems with this configuration, however. The first is that it’s very hard to troubleshoot the underlying issue, because the error handler doesn’t log any details from the exception that triggered the failure. All that appears in the logs when an event handler fails with this configuration is the following:
`
2019-04-14 10:46:36.974 INFO [-,] 16714 — [xx.pkg.query]-0] o.a.e.TrackingEventProcessor : Fetched token: IndexTrackingToken{globalIndex=455} for segment: Segment[0/0]
2019-04-14 10:46:36.974 INFO [-,] 16714 — [xx.pkg.query]-0] o.a.a.c.event.axon.AxonServerEventStore : open stream: 456
2019-04-14 10:46:43.240 WARN [-,] 16714 — [xx.pkg.query]-0] o.a.e.TrackingEventProcessor : Releasing claim on token and preparing for retry in 4s
2019-04-14 10:46:43.241 INFO [-,] 16714 — [xx.pkg.query]-0] o.a.a.c.u.FlowControllingStreamObserver : Observer stopped
`
From this logging, the token 456 identifies the failing event: I can use it to locate the event in the event store, since every event in the store has a unique token, which appears to be a global sequence number. However, without the details of the exception that was raised, I don’t know anything about the actual failure. I can fix that with the following configuration:
`