I am using the axon-tracing library for OpenTelemetry.
We also use Grafana Tempo for tracing, and we derive Prometheus metrics from the traces with the standard services in Grafana Tempo.
(Note: Mimir is an alternative to Prometheus, but we use Prometheus.)
The root problem is that every span generated by the axon-tracing lib has a NAME to which the aggregate UUID, sequence number, etc. are appended.
This generates millions and millions of unique span names. That is not needed, because there are span LABELS/attributes that contain the same data (aggregate id etc.).
At the metrics level, every span name is treated as a completely new metric category, or service edge, or …
So, instead of measuring, for example, the frequency of invocations of Saga ‘AssignConnectScheduleSaga’, Prometheus has to measure the frequency of Saga ‘AssignConnectScheduleSaga’ ab345334634335dfdfd34536. This is of course pointless, as there are millions of UUIDs (one per invocation of the saga).
280 million different span names…
There should be maybe a few hundred or so.
This also starts eating a huge amount of resources, and the internal queue to the metrics-generator starts choking.
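To make the cardinality problem concrete, here is a minimal, self-contained sketch in plain Java (no Axon or Tempo involved). The class `SpanNameCardinalityDemo`, its methods, and the "name, space, id" format are assumptions made for illustration only; they are not the library's actual naming code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.UUID;

final class SpanNameCardinalityDemo {

    // Hypothetical naming scheme: operation name, a space, then the aggregate identifier.
    static String nameWithId(String operation, String aggregateId) {
        return operation + " " + aggregateId;
    }

    // Number of distinct span names, i.e. how many separate series
    // a metrics generator keyed on the span name would have to track.
    static int distinctCount(List<String> spanNames) {
        return new HashSet<>(spanNames).size();
    }

    public static void main(String[] args) {
        int invocations = 1_000;
        List<String> namesWithId = new ArrayList<>();
        List<String> namesOnly = new ArrayList<>();
        for (int i = 0; i < invocations; i++) {
            String aggregateId = UUID.randomUUID().toString();
            // Identifier embedded in the name: every invocation yields a new name.
            namesWithId.add(nameWithId("AssignConnectScheduleSaga", aggregateId));
            // Identifier kept out of the name (it would travel as a span attribute instead).
            namesOnly.add("AssignConnectScheduleSaga");
        }
        System.out.println("distinct names with id:    " + distinctCount(namesWithId));
        System.out.println("distinct names without id: " + distinctCount(namesOnly));
    }
}
```

With the identifier in the name, the number of distinct names grows with the number of invocations; with the identifier kept as an attribute, it stays at one per saga type.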
Hello @Christian_Bonami, thanks for the feedback! I will see if I can change the implementation to use tags instead of names. That sounds very feasible.
Which Axon Framework version are you on?
Alright, thank you! Unfortunately, this will take some time as a lot of people are on holiday (myself soon to be included).
In the meantime, I would like to give you the power to solve your predicament until we have a better implementation. You can use the following configuration in your application, for now, if you would like to:
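    // Imports assume the package layout of Axon Framework 4.6 and its OpenTelemetry tracing module.
    import java.util.function.Supplier;

    import org.axonframework.messaging.Message;
    import org.axonframework.queryhandling.SubscriptionQueryUpdateMessage;
    import org.axonframework.tracing.NoOpSpanFactory;
    import org.axonframework.tracing.Span;
    import org.axonframework.tracing.SpanAttributesProvider;
    import org.axonframework.tracing.SpanFactory;
    import org.axonframework.tracing.opentelemetry.OpenTelemetrySpanFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class SpanFactoryConfiguration {

        @Bean
        public SpanFactory spanFactory() {
            // The regular OpenTelemetry factory; everything not overridden below is delegated to it.
            OpenTelemetrySpanFactory original = OpenTelemetrySpanFactory.builder().build();
            return new SpanFactory() {
                @Override
                public Span createRootTrace(Supplier<String> operationNameSupplier) {
                    return original.createRootTrace(operationNameSupplier);
                }

                @Override
                public Span createHandlerSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                              boolean isChildTrace, Message<?>... linkedParents) {
                    // Do not trace subscription query updates: hand out a no-op span for them.
                    if (parentMessage instanceof SubscriptionQueryUpdateMessage<?>) {
                        return NoOpSpanFactory.INSTANCE.createHandlerSpan(operationNameSupplier, parentMessage,
                                                                          isChildTrace, linkedParents);
                    }
                    return original.createHandlerSpan(operationNameSupplier, parentMessage, isChildTrace, linkedParents);
                }

                @Override
                public Span createDispatchSpan(Supplier<String> operationNameSupplier, Message<?> parentMessage,
                                               Message<?>... linkedSiblings) {
                    return original.createDispatchSpan(operationNameSupplier, parentMessage, linkedSiblings);
                }

                @Override
                public Span createInternalSpan(Supplier<String> operationNameSupplier) {
                    String name = operationNameSupplier.get();
                    // Collapse every "EventSourcingRepository.load..." name into one fixed span name.
                    if (name.startsWith("EventSourcingRepository.load")) {
                        name = "EventSourcingRepository.load";
                    }
                    // A lambda can only capture an effectively final variable.
                    String finalName = name;
                    return original.createInternalSpan(() -> finalName);
                }

                @Override
                public Span createInternalSpan(Supplier<String> operationNameSupplier, Message<?> message) {
                    return original.createInternalSpan(operationNameSupplier, message);
                }

                @Override
                public void registerSpanAttributeProvider(SpanAttributesProvider provider) {
                    original.registerSpanAttributeProvider(provider);
                }

                @Override
                public <M extends Message<?>> M propagateContext(M message) {
                    return original.propagateContext(message);
                }
            };
        }
    }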
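If more span names than `EventSourcingRepository.load` carry an identifier, the prefix check above could be generalized. The following is only a sketch in plain Java: the `SpanNames` helper and its regular expression are hypothetical, and it assumes the identifier is the last whitespace-separated token of the name, formatted either as a dashed UUID or as a run of at least 16 hex characters. Names that end in a sequence number or anything else are left untouched by this pattern.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

final class SpanNames {

    // Assumption: the identifier is the trailing token, either a dashed UUID
    // or a run of 16 or more hex characters.
    private static final Pattern TRAILING_ID = Pattern.compile(
            "\\s+(?:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
                    + "|[0-9a-fA-F]{16,})$");

    // Removes a trailing identifier (and the whitespace before it); other names are returned unchanged.
    static String stripTrailingId(String name) {
        return TRAILING_ID.matcher(name).replaceFirst("");
    }

    // Wraps a name supplier so the identifier is stripped lazily, when the name is requested.
    static Supplier<String> normalized(Supplier<String> original) {
        return () -> stripTrailingId(original.get());
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingId("AssignConnectScheduleSaga ab345334634335dfdfd34536"));
        System.out.println(stripTrailingId("EventSourcingRepository.load 123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(normalized(() -> "SomeOperation").get());
    }
}
```

In the configuration above, each delegating method could then pass `SpanNames.normalized(operationNameSupplier)` to `original` instead of the raw supplier. Whether that pattern matches the names your application actually produces is something to verify against your own traces first.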
Again, thanks for your feedback.
EDIT: There was a small bug in the code. I fixed it.
I want to add one more thing: the OpenTelemetry implementation of that version is a bit buggy due to thread leaks. Would you mind upgrading to 4.6.7? The fixes are included in there.
The BOM version is decoupled from the Axon Framework version, since the two have different release cycles. An update of an extension, for example, might bump the BOM version but not the framework version.
Glad to see you could enhance the SpanFactory in a way that works for you. In one of the next updates I will remove the unique identifiers from the span names.