SpanMetricsProcessor for agent instrumentation #1018
Comments
sounds great |
I can contribute this; I would need some guidance and approval of the proposal. |
hi all, check out related discussion at open-telemetry/opentelemetry-java#4260 |
Hello again @trask. I went through the discussion you referred to. Let me summarise it for others to follow; do let me know if I missed anything:
We should finalise which proposal works best for this feature. I suspect it will be a mix of the approaches suggested by me and @fstab, which would give users more granular control. Both @fstab and I can contribute to this change. |
@anuragagarwal561994 yes I agree with your summary 👍 |
@trask what do you think is the right approach here? First, let's list the requirements. Behaviour Requirements: Must Haves
Good to Have
Let me know if I missed something or if anything else should be included here.
Approaches: There are currently 2 approaches for recording metrics.
Single Annotation Approach
Functionality to disable this feature can be defined in
Feature will be disabled by default. Enabling it will start recording metrics everywhere. Further control can be given to the user using another annotation like
Multiple Annotation Approach
This is more or less as suggested by @fstab. Again the feature control via
This approach will only record metrics where the user intends to.
We could also look for a middle ground between the two approaches to make life simpler for users.
Pod / instance and service classification can probably be handled by the Prometheus exporter and should not be the responsibility of this feature.
Metric name definition can be done in the following ways:
@trask let me know how this all sounds, what else we should consider, and which approach the community prefers. |
@trask did you get a chance to go through the above? |
I am wondering why this wasn't proposed earlier. Check what Datadog did with their Trace Metrics and how awesome it is. By the way, I agree that it would be nice if users could choose which methods are monitored, but by default it should be all (enabled) or none (disabled). When enabled, users should be allowed to disable certain metrics (for example, health check metrics). Datadog Trace Metrics is a good example. |
hey @anuragagarwal561994, I think this SpanMetricsProcessor would make a good contribution to the contrib repo, so I'm going to transfer it over there. It can then be used as an agent extension (or outside of the Java agent) as needed. |
Is your feature request related to a problem? Please describe.
OpenTelemetry Java instrumentation handles application traces nicely using annotations. A nice feature to have would be recording metrics for those spans using the same annotations. The metrics would be similar to the R.E.D metrics recorded by the span metrics processor.
Describe the solution you'd like
We can use the current span annotations to make a histogram.
The histogram itself will also provide the counter metrics.
We can define two fields in the annotation, one to disable metrics for a particular method and another to add a gauge for the corresponding method.
The whole feature can be disabled by default and can be enabled on demand by using a configuration option.
Attributes of a span can be used as labels, and exceptions can be used as the corresponding error metrics (this needs more thought).
These metrics can be exposed using the Prometheus metrics exporter.
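To make the idea above more concrete, here is a minimal, dependency-free sketch of the aggregation: one duration histogram per span name and label set, with the request count derived from the histogram and errors counted separately. The class `SpanMetricsRecorder`, its methods, and the bucket bounds are hypothetical names chosen for illustration; they are not part of the OpenTelemetry API, and a real implementation would hook into the SDK and its metric instruments instead.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical aggregator for illustration; not an OpenTelemetry class. */
final class SpanMetricsRecorder {
    // Assumed upper bucket bounds in milliseconds; one overflow bucket follows.
    static final long[] BOUNDS_MS = {5, 10, 25, 50, 100, 250, 500, 1000};

    /** Metrics for one span name plus label set. */
    static final class Series {
        final AtomicLongArray buckets = new AtomicLongArray(BOUNDS_MS.length + 1);
        final LongAdder sumMillis = new LongAdder();
        final LongAdder errors = new LongAdder();

        /** The request count falls out of the histogram: it is the sum of all buckets. */
        long count() {
            long total = 0;
            for (int i = 0; i < buckets.length(); i++) {
                total += buckets.get(i);
            }
            return total;
        }
    }

    private final Map<String, Series> seriesByKey = new ConcurrentHashMap<>();

    /** Call once per finished span, for example from a span-end callback. */
    void onEnd(String spanName, Map<String, String> attributes, long durationMillis, boolean failed) {
        Series s = seriesByKey.computeIfAbsent(key(spanName, attributes), k -> new Series());
        s.buckets.incrementAndGet(bucketIndex(durationMillis));
        s.sumMillis.add(durationMillis);
        if (failed) {
            s.errors.increment(); // an exception on the span becomes an error count
        }
    }

    /** Returns the series for a span name and label set, or null if nothing was recorded. */
    Series series(String spanName, Map<String, String> attributes) {
        return seriesByKey.get(key(spanName, attributes));
    }

    // Span attributes act as labels; sorting makes the key independent of map order.
    private static String key(String spanName, Map<String, String> attributes) {
        return spanName + new TreeMap<String, String>(attributes);
    }

    private static int bucketIndex(long durationMillis) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (durationMillis <= BOUNDS_MS[i]) {
                return i;
            }
        }
        return BOUNDS_MS.length; // overflow bucket
    }
}
```

Rate, errors, and duration can all be read from a `Series`: `count()` for rate, `errors` for errors, and `buckets` with `sumMillis` for duration. Whether labels should be restricted to a configured subset of span attributes (to limit cardinality) is left open here.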
Describe alternatives you've considered
We currently use the span metrics processor in the OpenTelemetry Collector for this purpose: traces are pushed to the collector and the processor consolidates the metrics.
But these metrics are affected by the trace sampling configured on the client. If we sample 100% of traces, it can impact performance. Also, most of the time the consolidation can easily be handled on the client, and further consolidation can take place in Prometheus, as most applications do today.
Even if we are using a dummy Span (when sampling drops the trace), I believe we can record metrics (I may be wrong here), which would let us avoid pushing 100% of traces while still getting accurate metrics data.
This would further improve the user experience and let users record application metrics without much hassle.
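The sampling concern can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: sampling is modelled as "export every Nth request", and `SamplingVsMetrics` and `simulate` are hypothetical names, not OpenTelemetry APIs. It compares a count recorded on the client for every request with the number of spans that would reach a collector-side processor.

```java
/** Hypothetical simulation; sampling is modelled as "export every Nth request". */
final class SamplingVsMetrics {

    /** Returns {requests counted by client-side metrics, spans exported after sampling}. */
    static long[] simulate(int totalRequests, int sampleEveryNth) {
        long metricCount = 0;
        long exportedSpans = 0;
        for (int i = 0; i < totalRequests; i++) {
            metricCount++; // recorded for every request, independent of sampling
            if (i % sampleEveryNth == 0) {
                exportedSpans++; // only these spans reach the collector
            }
        }
        return new long[] {metricCount, exportedSpans};
    }

    public static void main(String[] args) {
        long[] result = simulate(1000, 10);
        // Prints: client-side count = 1000, collector-side count = 100
        System.out.println("client-side count = " + result[0]
                + ", collector-side count = " + result[1]);
    }
}
```

With 10% sampling, metrics derived from exported spans see only a tenth of the traffic unless they are scaled back up, while client-side recording sees all of it.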
Additional Context
Many open source tools are adopting the span metrics processor to show R.E.D metrics alongside their traces. A few examples:
Grafana
SigNoz
Jaeger