Hello,
We are using Axon Framework 4.10.1.
Our application contains 3 saga processors that are lagging behind by almost 110 million events. The latency metric shows that they are almost 6 days behind the current timestamp.
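For scale, these two figures imply an average append rate of roughly 212 events per second over the lagging window. A back-of-the-envelope sketch, assuming the 110 million events arrived more or less evenly across the 6 days (the class and method names are only illustrative, and the "2x" target is a made-up example):

```java
import java.time.Duration;

// Back-of-the-envelope estimate; backlog and lag are the figures quoted above.
class LagEstimate {

    // Average events per second appended during the lag window.
    static double averageRate(long backlogEvents, Duration lag) {
        return (double) backlogEvents / lag.toSeconds();
    }

    // Time needed to drain the backlog while new events keep arriving.
    // Both rates are in events per second.
    static Duration catchUpTime(long backlogEvents, double processingRate, double incomingRate) {
        double netRate = processingRate - incomingRate;
        if (netRate <= 0) {
            throw new IllegalArgumentException("processor never catches up at this rate");
        }
        return Duration.ofSeconds((long) Math.ceil(backlogEvents / netRate));
    }

    public static void main(String[] args) {
        long backlog = 110_000_000L;
        Duration lag = Duration.ofDays(6);

        double incoming = averageRate(backlog, lag);
        System.out.printf("average incoming rate: %.1f events/s%n", incoming);

        // Hypothetical target: the sagas process twice as fast as events arrive.
        Duration catchUp = catchUpTime(backlog, 2 * incoming, incoming);
        System.out.println("catch-up time at 2x the incoming rate: " + catchUp.toDays() + " days");
    }
}
```

In other words, the saga processors need to sustain well above the incoming rate for several days before the lag disappears.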
I have tried increasing the batch size, adjusting the configuration properties of the pooled streaming event processors, and splitting and merging the segments, but nothing seems to reduce the lag.
The application is running on a Kubernetes cluster with 20 replicas.
The configurations for the Sagas are as follows.
```java
Function<String, ScheduledExecutorService> coordinatorExecutorBuilder =
        name -> Executors.newScheduledThreadPool(
                1,
                Thread.ofVirtual().name("[PSP] Coordinator - " + name, 0).factory()
        );
Function<String, ScheduledExecutorService> workerExecutorBuilderExtended =
        name -> Executors.newScheduledThreadPool(
                100,
                Thread.ofVirtual().name("[PSP] Worker extended - " + name, 0).factory()
        );
```
Saga #1
```java
EventProcessingConfigurer.PooledStreamingProcessorConfiguration pspConfigExtended =
        (config, builder) -> builder
                .coordinatorExecutor(coordinatorExecutorBuilder)
                .workerExecutor(workerExecutorBuilderExtended)
                .initialSegmentCount(2)
                .batchSize(100)
                .tokenClaimInterval(10000)
                .claimExtensionThreshold(15000)
                .enableCoordinatorClaimExtension();
```
Saga #2
```java
EventProcessingConfigurer.PooledStreamingProcessorConfiguration pspConfigForDemandSaga =
        (config, builder) -> builder
                .coordinatorExecutor(coordinatorExecutorBuilder)
                .workerExecutor(workerExecutorBuilderExtended)
                .initialSegmentCount(2)
                .batchSize(1000)
                .tokenClaimInterval(10000)
                .claimExtensionThreshold(15000)
                //.maxClaimedSegments(16) // enable if needed
                .enableCoordinatorClaimExtension();
```
Saga #3
```java
EventProcessingConfigurer.PooledStreamingProcessorConfiguration pspConfigForRecordSaga =
        (config, builder) -> builder
                .coordinatorExecutor(coordinatorExecutorBuilder)
                .workerExecutor(workerExecutorBuilderExtended)
                .initialSegmentCount(2)
                .batchSize(500)
                .tokenClaimInterval(10000)
                .claimExtensionThreshold(15000)
                .enableCoordinatorClaimExtension();
```
We are using Postgres as our event store. The system metrics and query metrics on the DB show that the DB isn't the problem.
I checked whether the event handling components in the sagas need any optimisation, but most of the event handling logic is trivial; only two event handlers take ~600 ms to handle their events.
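To put the ~600 ms figure in perspective: events within a segment are handled one after another, so slow handlers cap the throughput of each segment. A rough sketch of that upper bound follows. The segment count of 2 is the `initialSegmentCount` from the configs (the real count may differ after splitting), while `slowFraction` and `fastMillis` are hypothetical values I have not measured:

```java
// Rough upper bound on saga throughput; inputs are figures from this post or labelled guesses.
class SegmentThroughput {

    // Events per second when each segment handles its events one at a time.
    static double maxEventsPerSecond(int segments, double avgHandlingMillis) {
        return segments * 1000.0 / avgHandlingMillis;
    }

    // Average handling time when only a fraction of the events hits the slow handlers.
    static double averageHandlingMillis(double slowFraction, double slowMillis, double fastMillis) {
        return slowFraction * slowMillis + (1 - slowFraction) * fastMillis;
    }

    public static void main(String[] args) {
        int segments = 2;            // initialSegmentCount from the configs above
        double slowMillis = 600;     // the two slow handlers mentioned above
        double fastMillis = 1;       // hypothetical: cost of a "trivial" handler
        double slowFraction = 0.01;  // hypothetical: share of events hitting a slow handler

        double avg = averageHandlingMillis(slowFraction, slowMillis, fastMillis);
        System.out.printf("average handling time: %.2f ms%n", avg);
        System.out.printf("upper bound: %.1f events/s%n", maxEventsPerSecond(segments, avg));
        System.out.printf("worst case (every event slow): %.2f events/s%n",
                maxEventsPerSecond(segments, slowMillis));
    }
}
```

The bound scales linearly with the number of segments that are actually being processed, so the share of events reaching the two slow handlers matters a lot.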
One optimisation I could make is to use `CommandGateway.send` instead of `sendAndWait`.
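The difference can be sketched without Axon on the classpath. `SlowGateway` below is a hypothetical stand-in, not Axon's `CommandGateway`; it only mimics the shape of the two calls (`sendAndWait` blocks for the result, `send` returns a `CompletableFuture`), and the 50 ms delay is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class SendVsSendAndWait {

    // Hypothetical stand-in for a command gateway; NOT Axon's CommandGateway.
    static class SlowGateway {
        private final long delayMillis;

        SlowGateway(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        // Returns immediately; the future completes after the simulated handling delay.
        CompletableFuture<String> send(String command) {
            return CompletableFuture.supplyAsync(
                    () -> "handled " + command,
                    CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
        }

        // Blocks the calling thread until the command result is available.
        String sendAndWait(String command) {
            return send(command).join();
        }
    }

    // Milliseconds the caller spends dispatching when every command is awaited.
    static long blockingDispatchMillis(SlowGateway gateway, int commands) {
        long start = System.nanoTime();
        for (int i = 0; i < commands; i++) {
            gateway.sendAndWait("cmd-" + i);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Milliseconds the caller spends dispatching when results are handled in callbacks.
    static long asyncDispatchMillis(SlowGateway gateway, int commands) {
        long start = System.nanoTime();
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < commands; i++) {
            pending.add(gateway.send("cmd-" + i).whenComplete((result, error) -> {
                // Failures no longer surface as exceptions in the caller, so handle them here.
                if (error != null) {
                    System.err.println("command failed: " + error);
                }
            }));
        }
        long dispatchMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Wait only so this demo does not exit early; an event handler would simply return.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return dispatchMillis;
    }

    public static void main(String[] args) {
        SlowGateway gateway = new SlowGateway(50);
        System.out.println("sendAndWait: caller blocked for " + blockingDispatchMillis(gateway, 5) + " ms");
        System.out.println("send:        caller blocked for " + asyncDispatchMillis(gateway, 5) + " ms");
    }
}
```

The trade-off is that with `send` the handler no longer sees command failures as exceptions, so any error handling that currently relies on `sendAndWait` throwing would have to move into the callback.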
Can someone from the Axon team suggest what else I could try?