Axon Tracing Otel: too many unique span ids blows up Tracing related metrics

I am using the axon-tracing library for OpenTelemetry.
We also use Grafana Tempo for tracing and derive Prometheus metrics from the traces with the standard services in Grafana Tempo.

(note: Mimir is an alternative for Prometheus, but we use Prometheus)

The root problem is that every span generated by the axon-tracing lib has a NAME with the aggregate UUID, sequence number, etc. appended to it.
This generates millions and millions of unique span names. That is not needed, because there are span LABELS/attributes that contain the same data (aggregate id etc.).
At the metrics level, every distinct span name is treated as a completely new metric category, or service edge, or …

So, instead of measuring, for example, the frequency of invocations of the Saga 'AssignConnectScheduleSaga', Prometheus needs to measure the frequency of Saga 'AssignConnectScheduleSaga' ab345334634335dfdfd34536. This is of course pointless, as there are millions of UUIDs (one per invocation of the saga).


This leads to 'blowing up' the metrics-generator queues, and makes the metrics useless.

280 million different span names…
This should be maybe a few hundred or so.
It also starts eating a huge amount of resources, and the internal queue to the metrics-generator starts choking.
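
To illustrate the effect, here is a minimal sketch (plain Java, not Axon or Tempo code). The name format with the identifier appended after a space, the `id-N` identifiers, and the class and method names are all assumptions made for the example. It only counts how many distinct span names a metrics backend would have to track.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: counts the distinct span names produced by repeated saga invocations.
final class SpanNameCardinality {

    // Hypothetical name format; the real axon-tracing format may differ.
    static String spanName(String saga, int invocation, boolean idInName) {
        String base = "SagaManager<" + saga + ">.invokeSaga";
        return idInName ? base + " id-" + invocation : base;
    }

    // Each distinct name becomes its own metric series downstream.
    static int distinctNames(int invocations, boolean idInName) {
        Set<String> names = new HashSet<>();
        for (int i = 0; i < invocations; i++) {
            names.add(spanName("AssignConnectScheduleSaga", i, idInName));
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("id in span name:   " + distinctNames(1000, true));
        System.out.println("id as attribute:   " + distinctNames(1000, false));
    }
}
```

With the identifier in the name, the number of distinct names grows with the number of invocations. With the identifier kept only as an attribute, it stays at one per saga type.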

Hello @Christian_Bonami, thanks for the feedback! I will see if I can change the implementation to use tags instead of names. That sounds very feasible.
Which Axon Framework version are you on?


<axon.version>4.6.2</axon.version>

Alright, thank you! Unfortunately, this will take some time as a lot of people are on holiday (myself soon to be included).
In the meantime, I would like to give you the power to solve your predicament until we have a better implementation. You can use the following configuration in your application, for now, if you would like to:

@Configuration
public class SpanFactoryConfiguration {

    @Bean
    public SpanFactory spanFactory() {
        OpenTelemetrySpanFactory original = OpenTelemetrySpanFactory.builder().build();
        return new SpanFactory() {
            @Override
            public Span createRootTrace(Supplier<String> operationNameSupplier) {
                return original.createRootTrace(operationNameSupplier);
            }

            @Override
            public Span createHandlerSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                          boolean isChildTrace, Message<?>... linkedParents) {
                if(parentMessage instanceof SubscriptionQueryUpdateMessage<?>) {
                    return NoOpSpanFactory.INSTANCE.createHandlerSpan(operationNameSupplier, parentMessage, isChildTrace, linkedParents);
                }
                return original.createHandlerSpan(operationNameSupplier, parentMessage, isChildTrace, linkedParents);
            }

            @Override
            public Span createDispatchSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                           Message<?>... linkedSiblings) {
                return original.createDispatchSpan(operationNameSupplier, parentMessage, linkedSiblings);
            }

            @Override
            public Span createInternalSpan(Supplier<String> operationNameSupplier) {
                String name = operationNameSupplier.get();
                if(name.startsWith("EventSourcingRepository.load")) {
                    name = "EventSourcingRepository.load";
                }
                String finalName = name;
                return original.createInternalSpan(() -> finalName);
            }

            @Override
            public Span createInternalSpan(Supplier<String> operationNameSupplier, Message<?> message) {
                return original.createInternalSpan(operationNameSupplier, message);
            }

            @Override
            public void registerSpanAttributeProvider(SpanAttributesProvider provider) {
                original.registerSpanAttributeProvider(provider);
            }

            @Override
            public <M extends Message<?>> M propagateContext(M message) {
                return original.propagateContext(message);
            }
        };
    }
}

Again, thanks for your feedback.

EDIT: There was a small bug in the code. I fixed it.


I want to add one more thing: the OpenTelemetry implementation in that version is a bit buggy due to thread leaks. Would you mind upgrading to 4.6.7? The fixes are included there.

Ok, will do so. Thank you for the heads up.

Btw, I made this:

@Slf4j
@Configuration
public class SpanFactoryConfiguration {
    // Escaped dot so that only a literal ".invokeSaga" matches
    static final Pattern sagaNameSelectorPattern = Pattern.compile("SagaManager<(.*?)>\\.invokeSaga");

    @Bean
    public SpanFactory spanFactory() {
        OpenTelemetrySpanFactory original = OpenTelemetrySpanFactory.builder().build();
        return new SpanFactory() {
            @Override
            public Span createRootTrace(Supplier<String> operationNameSupplier) {
                return original.createRootTrace(operationNameSupplier);
            }

            @Override
            public Span createHandlerSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                          boolean isChildTrace, Message<?>... linkedParents) {
                if (parentMessage instanceof SubscriptionQueryUpdateMessage<?>) {
                    return NoOpSpanFactory.INSTANCE.createHandlerSpan(operationNameSupplier, parentMessage, isChildTrace, linkedParents);
                }
                return original.createHandlerSpan(operationNameSupplier, parentMessage, isChildTrace, linkedParents);
            }

            @Override
            public Span createDispatchSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                           Message<?>... linkedSiblings) {
                return original.createDispatchSpan(operationNameSupplier, parentMessage, linkedSiblings);
            }


            @Override
            public Span createInternalSpan(Supplier<String> operationNameSupplier) {
                String name = operationNameSupplier.get();
                log.trace("Creating span for operation: {}", name);
                if (name.startsWith("EventSourcingRepository.load")) {
                    name = "EventSourcingRepository.load";
                } else if (name.startsWith("AxonFramework-Events.event")){
                    name = "AxonFramework-Events.event";
                } else {
                    Matcher matcher = sagaNameSelectorPattern.matcher(name);
                    if (matcher.find()) {
                        name = "SagaManager<" + matcher.group(1) + ">.invokeSaga";
                    }
                }
                String finalName = name;
                return original.createInternalSpan(() -> finalName);
            }

            @Override
            public Span createInternalSpan(Supplier<String> operationNameSupplier, Message<?> message) {
                return original.createInternalSpan(operationNameSupplier, message);
            }

            @Override
            public void registerSpanAttributeProvider(SpanAttributesProvider provider) {
                original.registerSpanAttributeProvider(provider);
            }

            @Override
            public <M extends Message<?>> M propagateContext(M message) {
                return original.propagateContext(message);
            }
        };
    }
}

I hope the createInternalSpan method covers all cases where unique span names (i.e. appended with aggregate id etc) are created.
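
One way to gain some confidence is to pull the renaming rules into a pure function and run sample names through it. The sketch below does that with the same two prefixes and the same saga pattern as the configuration above. The class name `SpanNameNormalizer` and the sample inputs (in particular the identifier appended after a space) are assumptions for illustration, not the exact names Axon produces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the renaming rules from createInternalSpan as a standalone, testable function.
final class SpanNameNormalizer {

    // Same expression as in the configuration, with the dot escaped.
    private static final Pattern SAGA =
            Pattern.compile("SagaManager<(.*?)>\\.invokeSaga");

    static String normalize(String name) {
        if (name.startsWith("EventSourcingRepository.load")) {
            return "EventSourcingRepository.load";
        }
        if (name.startsWith("AxonFramework-Events.event")) {
            return "AxonFramework-Events.event";
        }
        Matcher matcher = SAGA.matcher(name);
        if (matcher.find()) {
            // Keep the saga type, drop whatever follows ".invokeSaga".
            return "SagaManager<" + matcher.group(1) + ">.invokeSaga";
        }
        // Unknown names pass through unchanged, so new unique names would still leak.
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical inputs; the real suffix format may differ.
        String[] samples = {
            "EventSourcingRepository.load ab345334634335dfdfd34536",
            "SagaManager<AssignConnectScheduleSaga>.invokeSaga ab345334634335dfdfd34536",
            "SomeOtherComponent.doWork"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + normalize(sample));
        }
    }
}
```

Logging the names that fall through to the final `return name` in a test environment would show whether any other span names still carry unique identifiers.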

There is no axon-bom for 4.6.7 :frowning:
The last one is 4.6.6.

And… the 4.6.6 bom points to axon 4.6.7. Looks like an error to me.

The BOM version is decoupled from the Axon Framework version, since they have different release cycles. An update of an extension, for example, might bump the BOM version but not the framework version.
Glad to see you could enhance the SpanFactory in a way that works for you. In one of the next updates I will remove the unique identifiers from the span names.

The unique span IDs have another effect: huge ES indices for Jaeger. Thanks for sharing your solution so I can use it as well :slight_smile:

I have decided to create a GitHub issue and start developing improvements. You can watch Improve Spanfactory configurability · Issue #2780 · AxonFramework/AxonFramework (github.com) for more updates.