Skip to content

SpanMetricsProcessor for agent instrumentation #1018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
anuragagarwal561994 opened this issue Aug 14, 2022 · 9 comments
Open

SpanMetricsProcessor for agent instrumentation #1018

anuragagarwal561994 opened this issue Aug 14, 2022 · 9 comments

Comments

@anuragagarwal561994
Copy link

anuragagarwal561994 commented Aug 14, 2022

Is your feature request related to a problem? Please describe.
Opentelmetry java instrumentation is nicely able to handle the application traces using annotations, a nice feature to have would be to record metrics of the spans using the annotations itself. The metrics would be of the way like span metrics processor does to record the R.E.D metrics.

Describe the solution you'd like
We can use the current span annotations to make a histogram.
Histogram itself will give the counter metrics.
We can define two fields in the annotation, one to disable metrics for a particular method and another to add a gauge for the corresponding method.
The whole feature can be disabled by default and can be enabled on demand by using a configuration option.
Attributes of a span can be used as labels and exceptions can be used as the corresponding error metrics (can be thought more on the same).
These metrics can be exposed using the prometheus metrics exporter.

Describe alternatives you've considered
We have currently been using the span metrics processor at the otel collector end for the same purpose where the traces are being pushed and the metrics are consolidated using the processor.

But these metrics are being impacted by the trace sampling set on client. If we make tracing 100% then it can impact performance. Also most of the times the consolidation can easily be handled at the client end easily and further consolidation can take place at prometheus, like most of the applications run today.

Even if we are using a dummy Span (in case of sampled trace), I believe we can record metrics (I may be wrong here), thus allowing us to not push traces 100% of the times yet getting accurate metrics data.

This would further enhance user experience and allow them to record application metrics without much hassel.

Additional Context
A lot of open source tools are adopting span metrics processor for showing R.E.D metrics along side their traces few examples include

Grafana
Signoz
Jaegar

@CoderPoet
Copy link

sounds great

@anuragagarwal561994
Copy link
Author

I can contribute for the same, would need some guidance and proposal approval.

@trask
Copy link
Member

trask commented Aug 15, 2022

hi all, check out related discussion at open-telemetry/opentelemetry-java#4260

@anuragagarwal561994
Copy link
Author

anuragagarwal561994 commented Aug 16, 2022

Hello again @trask, so I checked the discussion you referred, let me summarise the above discussion for others to follow, do let me know if I missed anything:

  1. Metrics Annotations opentelemetry-java-instrumentation#7030 this issue is same requirement but slightly different approach
  2. A PR was raised Add Metrics Annotations opentelemetry-java#4260 to add the extra annotations in the opentelemetry-java repo
  3. However the annotations at that time were being moved to another repo https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation-annotations
  4. The PR was closed because it did no longer belong in the opentelemetry-java repo Add Metrics Annotations opentelemetry-java#4260 (comment)

So if we can finalise what proposal works best for this feature, I suspect it to be a mix of both the approaches suggested by me and @fstab, this will give users more granular control. Both @fstab and I can contribute to this change.

@trask
Copy link
Member

trask commented Aug 16, 2022

@anuragagarwal561994 yes I agree with your summary 👍

@anuragagarwal561994
Copy link
Author

@trask what do you think should be the right approach here. First let's list down the requirements:

Behaviour Requirements:

Must Haves

  • We need to record metrics from the opentelemetry-java-instrumentation
  • User should be allowed to choose which methods to monitor
  • User to should allowed to choose to disable the functionality
  • These metrics should be exposed along with the open telemetry metrics exporter
  • Metrics should have span attributes as labels
  • Metrics control should be independent of tracing configuration / sampling at the agent's end
  • We should have metrics such as counter (call counts), gauge (concurrency), histogram (latency mainly)
  • Pod / Instance classification
  • Service classification
  • Metric name definition

Good to Have

  • User should be allowed to configure further granularities (this may only be applicable for histogram)
  • Histograms can further be configured to record on return values (but not sure how this can be simplified like request / response size)
  • User should be allowed to add a common attribute to these requests
  • Errors / Exceptions should be recorded as separate metrics

Let me know if I missed something or something else can be included here.

Approaches:

There are currently 2 approaches for recording metrics.

Single Annotation Approach
We can use @WithSpan annotation and use the same to record histogram. Counter can be exposed separately or can be used from _count metric of histogram. Gauge can define the concurrency of the method.

Functionality to disable this feature can be defined in otel.properties.

Feature will be disabled by default.

Enabling it will start recording their metrics everywhere.

Further control can be given to the user using another annotation like @WithMetrics kind of annotation where user can enable / disable the metrics and further define finer grain controlled. If this annotation is missing and the feature is enabled we can assume that the user wants to record everything with the defaults.

Multiple Annotation Approach
We can define explicitly different annotations like @Gauge, @Meter, @Counter, @WithMetrics. The last annotation is basically combination of the first three annotations with their defaults.

This is more or less as suggested by @fstab.

Again the feature control via otel.properties will remain same.

This approach will only record metrics where the user intends to.

We can further even try to find out a middle ground between the two approaches, making the life of users simpler.

Pod / Instance & Service classification can be handled by prometheus exporter I guess and should not be the responsibility of this feature.

Metric name definition can be done in the following ways:

  • We can either use span names by default and append _counter, _gauge or whatever the standards say
  • With annotations we can allow the user to explicitly define the metric names, although this would prefer the second approach as we have different annotations and it will be easier for the user to declare
  • We can use the method name, which I believe is anyways the name of the span so it might coincide with the first point

@trask let me know how this all sounds, what can we further think of and which is the preferred way the community prefers.

@anuragagarwal561994
Copy link
Author

@trask did you get a chance to go through the above?

@michaelzhd
Copy link

michaelzhd commented Mar 8, 2023

I am wondering why this isn't proposed earlier. Check what Datadog did with their Trace Metrics and how awesome it is. By the way, I agree that it would be nice if users are able to choose which methods to be monitored, but by default it should be all (enabled) or none(disabled). When enabled, users should be allowed to disable certain metrics (for example, health check metrics). Datadog Trace Metrics should provide a great example.

@trask
Copy link
Member

trask commented Aug 24, 2023

hey @anuragagarwal561994, I think this SpanMetricsProcessors would make a good contribution to the contrib repo, i'm going to transfer it over there. It can then be used as an agent extension (or outside of the Java agent) as needed.

@trask trask transferred this issue from open-telemetry/opentelemetry-java-instrumentation Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants