SpanMetricsProcessor for agent instrumentation #1018

anuragagarwal561994 · 2022-08-14T14:57:23Z

Is your feature request related to a problem? Please describe.
Opentelmetry java instrumentation is nicely able to handle the application traces using annotations, a nice feature to have would be to record metrics of the spans using the annotations itself. The metrics would be of the way like span metrics processor does to record the R.E.D metrics.

Describe the solution you'd like
We can use the current span annotations to make a histogram.
Histogram itself will give the counter metrics.
We can define two fields in the annotation, one to disable metrics for a particular method and another to add a gauge for the corresponding method.
The whole feature can be disabled by default and can be enabled on demand by using a configuration option.
Attributes of a span can be used as labels and exceptions can be used as the corresponding error metrics (can be thought more on the same).
These metrics can be exposed using the prometheus metrics exporter.

Describe alternatives you've considered
We have currently been using the span metrics processor at the otel collector end for the same purpose where the traces are being pushed and the metrics are consolidated using the processor.

But these metrics are being impacted by the trace sampling set on client. If we make tracing 100% then it can impact performance. Also most of the times the consolidation can easily be handled at the client end easily and further consolidation can take place at prometheus, like most of the applications run today.

Even if we are using a dummy Span (in case of sampled trace), I believe we can record metrics (I may be wrong here), thus allowing us to not push traces 100% of the times yet getting accurate metrics data.

This would further enhance user experience and allow them to record application metrics without much hassel.

Additional Context
A lot of open source tools are adopting span metrics processor for showing R.E.D metrics along side their traces few examples include

Grafana
Signoz
Jaegar

CoderPoet · 2022-08-15T13:47:14Z

sounds great

anuragagarwal561994 · 2022-08-15T16:24:33Z

I can contribute for the same, would need some guidance and proposal approval.

trask · 2022-08-15T20:55:27Z

hi all, check out related discussion at open-telemetry/opentelemetry-java#4260

anuragagarwal561994 · 2022-08-16T00:39:56Z

Hello again @trask, so I checked the discussion you referred, let me summarise the above discussion for others to follow, do let me know if I missed anything:

Metrics Annotations opentelemetry-java-instrumentation#7030 this issue is same requirement but slightly different approach
A PR was raised Add Metrics Annotations opentelemetry-java#4260 to add the extra annotations in the opentelemetry-java repo
However the annotations at that time were being moved to another repo https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation-annotations
The PR was closed because it did no longer belong in the opentelemetry-java repo Add Metrics Annotations opentelemetry-java#4260 (comment)

So if we can finalise what proposal works best for this feature, I suspect it to be a mix of both the approaches suggested by me and @fstab, this will give users more granular control. Both @fstab and I can contribute to this change.

trask · 2022-08-16T01:18:51Z

@anuragagarwal561994 yes I agree with your summary 👍

anuragagarwal561994 · 2022-08-16T17:59:08Z

@trask what do you think should be the right approach here. First let's list down the requirements:

Behaviour Requirements:

Must Haves

We need to record metrics from the opentelemetry-java-instrumentation
User should be allowed to choose which methods to monitor
User to should allowed to choose to disable the functionality
These metrics should be exposed along with the open telemetry metrics exporter
Metrics should have span attributes as labels
Metrics control should be independent of tracing configuration / sampling at the agent's end
We should have metrics such as counter (call counts), gauge (concurrency), histogram (latency mainly)
Pod / Instance classification
Service classification
Metric name definition

Good to Have

User should be allowed to configure further granularities (this may only be applicable for histogram)
Histograms can further be configured to record on return values (but not sure how this can be simplified like request / response size)
User should be allowed to add a common attribute to these requests
Errors / Exceptions should be recorded as separate metrics

Let me know if I missed something or something else can be included here.

Approaches:

There are currently 2 approaches for recording metrics.

Single Annotation Approach
We can use @WithSpan annotation and use the same to record histogram. Counter can be exposed separately or can be used from _count metric of histogram. Gauge can define the concurrency of the method.

Functionality to disable this feature can be defined in otel.properties.

Feature will be disabled by default.

Enabling it will start recording their metrics everywhere.

Further control can be given to the user using another annotation like @WithMetrics kind of annotation where user can enable / disable the metrics and further define finer grain controlled. If this annotation is missing and the feature is enabled we can assume that the user wants to record everything with the defaults.

Multiple Annotation Approach
We can define explicitly different annotations like @Gauge, @Meter, @Counter, @WithMetrics. The last annotation is basically combination of the first three annotations with their defaults.

This is more or less as suggested by @fstab.

Again the feature control via otel.properties will remain same.

This approach will only record metrics where the user intends to.

We can further even try to find out a middle ground between the two approaches, making the life of users simpler.

Pod / Instance & Service classification can be handled by prometheus exporter I guess and should not be the responsibility of this feature.

Metric name definition can be done in the following ways:

We can either use span names by default and append _counter, _gauge or whatever the standards say
With annotations we can allow the user to explicitly define the metric names, although this would prefer the second approach as we have different annotations and it will be easier for the user to declare
We can use the method name, which I believe is anyways the name of the span so it might coincide with the first point

@trask let me know how this all sounds, what can we further think of and which is the preferred way the community prefers.

anuragagarwal561994 · 2022-08-18T05:26:32Z

@trask did you get a chance to go through the above?

michaelzhd · 2023-03-08T23:51:54Z

I am wondering why this isn't proposed earlier. Check what Datadog did with their Trace Metrics and how awesome it is. By the way, I agree that it would be nice if users are able to choose which methods to be monitored, but by default it should be all (enabled) or none(disabled). When enabled, users should be allowed to disable certain metrics (for example, health check metrics). Datadog Trace Metrics should provide a great example.

trask · 2023-08-24T04:37:42Z

hey @anuragagarwal561994, I think this SpanMetricsProcessors would make a good contribution to the contrib repo, i'm going to transfer it over there. It can then be used as an agent extension (or outside of the Java agent) as needed.

trask transferred this issue from open-telemetry/opentelemetry-java-instrumentation Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SpanMetricsProcessor for agent instrumentation #1018

SpanMetricsProcessor for agent instrumentation #1018

anuragagarwal561994 commented Aug 14, 2022 •

edited

Loading

CoderPoet commented Aug 15, 2022

anuragagarwal561994 commented Aug 15, 2022

trask commented Aug 15, 2022

anuragagarwal561994 commented Aug 16, 2022 •

edited

Loading

trask commented Aug 16, 2022

anuragagarwal561994 commented Aug 16, 2022

anuragagarwal561994 commented Aug 18, 2022

michaelzhd commented Mar 8, 2023 •

edited

Loading

trask commented Aug 24, 2023

SpanMetricsProcessor for agent instrumentation #1018

SpanMetricsProcessor for agent instrumentation #1018

Comments

anuragagarwal561994 commented Aug 14, 2022 • edited Loading

CoderPoet commented Aug 15, 2022

anuragagarwal561994 commented Aug 15, 2022

trask commented Aug 15, 2022

anuragagarwal561994 commented Aug 16, 2022 • edited Loading

trask commented Aug 16, 2022

anuragagarwal561994 commented Aug 16, 2022

anuragagarwal561994 commented Aug 18, 2022

michaelzhd commented Mar 8, 2023 • edited Loading

trask commented Aug 24, 2023

anuragagarwal561994 commented Aug 14, 2022 •

edited

Loading

anuragagarwal561994 commented Aug 16, 2022 •

edited

Loading

michaelzhd commented Mar 8, 2023 •

edited

Loading