Unified jmx metrics definition and their evolution #13238
After discussing this a bit in slack and live with @robsunday today, I think we should aim to do the following:
We plan to start with the following target systems:
|
Following yesterday's Java SIG meeting, here is an update on the approach we will take to move this forward:
For the JVM metrics:
For other metrics
|
We discussed time unit conversion with @robsunday today and found that aligning JMX metrics with the semconv recommendations on metric type is probably an extra challenge, worth discussing during the next SIG meeting.
There are a few likely implications:
I think it pushes us to capture JMX metrics "as they are exposed" rather than always attempting to fit the semantic conventions that could apply. This does not apply to metrics like EDIT: it seems that it is possible to have a "counter for doubles" as seen in the current example of |
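To make the unit-conversion point concrete, here is a minimal sketch, assuming the JMX attribute exposes an integer duration (milliseconds in this example) and the target unit is seconds. The class and method names are hypothetical and are not part of jmx-scraper or the instrumentation module. It only illustrates why converting to seconds turns an integer value into a floating-point one, which is where the "counter for doubles" question comes from.

```java
import java.util.concurrent.TimeUnit;

public class JmxTimeUnitConversion {

    /**
     * Converts a raw integer duration, as a JMX attribute might expose it,
     * to seconds. The result is a double because sub-second precision
     * would be lost with integer arithmetic.
     */
    static double toSeconds(long rawValue, TimeUnit sourceUnit) {
        // Nanoseconds per source unit, then scaled down to seconds.
        return rawValue * (double) sourceUnit.toNanos(1) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical raw value read from an MBean attribute, in milliseconds.
        long elapsedMillis = 1_250L;
        double elapsedSeconds = toSeconds(elapsedMillis, TimeUnit.MILLISECONDS);
        System.out.println(elapsedSeconds);
    }
}
```

Because the conversion multiplies by a positive constant, a monotonically increasing raw value stays monotonic after conversion, so the instrument kind does not have to change, only the value type.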
Just checking that you've seen |
Yes, I wasn't aware of it when I wrote it but found it in the meantime. I'll update my comment to better reflect that. |
I think we can close this issue as the plan is now clear. Changing metric definitions will be handled using the following strategy:
|
This is more of a brain-dump/discussion-starter to gather feedback than a real issue. I'm sorry if it's getting a bit too long.
With the addition of the jmx-scraper in contrib, we can reuse the YAML-based JMX metrics capture from instrumentation. However, the metric definitions themselves are still distinct and spread across multiple places:
With the steps described in open-telemetry/opentelemetry-java-contrib#1362, which cover the supported systems, the goal is to provide YAML metric definitions almost equivalent to the ones currently provided with JMX Gatherer, giving users a smooth migration path.
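For readers less familiar with what such a definition ultimately maps, here is a minimal sketch of reading one MBean attribute with the standard JMX API. It reads from the local platform MBean server rather than a remote connection, and the class and method names are hypothetical. It is not how jmx-scraper or JMX Gatherer is implemented.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxAttributeRead {

    /** Reads a numeric attribute from the platform MBean server. */
    static Number readNumericAttribute(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            Object value = server.getAttribute(new ObjectName(objectName), attribute);
            if (!(value instanceof Number)) {
                throw new IllegalStateException(attribute + " is not numeric: " + value);
            }
            return (Number) value;
        } catch (JMException e) {
            // Malformed name, unknown MBean, or unknown attribute.
            throw new IllegalStateException("Cannot read " + objectName + "/" + attribute, e);
        }
    }

    public static void main(String[] args) {
        // Standard JVM MBean and attribute; a metric definition would give
        // this value a metric name, type, and unit.
        Number threads = readNumericAttribute("java.lang:type=Threading", "ThreadCount");
        System.out.println("ThreadCount = " + threads);
    }
}
```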
However, even when the migration from groovy defined metrics is complete, we still have two sets of distinct metrics in instrumentation/jmx-metrics and in JMX Scraper and we should aim to merge them to provide the following expected benefits:
One of the downsides of having "one set of JMX metrics to rule them all" is that current users of JMX-scraper might not have any control over the version or stability of those metrics, for example:
instrumentation/jmx-metrics
artifactjmxreceiver
) would result in different metrics being captured, which could lead to unexpected behavior.
For the "equivalent to groovy" legacy metric definitions that allow migration from JMX Gatherer, we can still embed them directly into JMX Scraper and then provide a config option to use them.
However, whenever the JMX definitions get enhanced/modified, the same compatibility issues can arise, and I wonder if and how we could iterate over the metrics definitions without breaking user expectations.
I think we could explore the following ideas to help provide some stability:
I think that if JMX metrics were defined as part of semantic conventions, we would probably have a "use last version" approach, so I think the "keep it simple + legacy" option is probably the best compromise here. Using a local copy of previous or custom definitions is always possible, for example to use the latest version of jmx-scraper with older definitions.
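As a rough illustration of the "keep it simple + legacy" idea, the choice between definition sets could be sketched as below. The enum values, the `useLegacy` flag, and the rule that a custom file wins over the embedded sets are all assumptions made for this sketch. None of them are existing jmx-scraper options.

```java
import java.util.Optional;

public class DefinitionSelection {

    /** Hypothetical sources of metric definitions. */
    enum Source { LATEST_EMBEDDED, LEGACY_EMBEDDED, CUSTOM_FILE }

    /**
     * Picks which definitions to load. A user-provided local YAML file is
     * assumed to take precedence; otherwise the legacy flag decides between
     * the two embedded sets.
     */
    static Source select(boolean useLegacy, Optional<String> customYamlPath) {
        if (customYamlPath.isPresent()) {
            return Source.CUSTOM_FILE;
        }
        return useLegacy ? Source.LEGACY_EMBEDDED : Source.LATEST_EMBEDDED;
    }

    public static void main(String[] args) {
        System.out.println(select(false, Optional.empty()));
        System.out.println(select(true, Optional.empty()));
        // Hypothetical path to a local copy of older definitions.
        System.out.println(select(false, Optional.of("old-definitions.yaml")));
    }
}
```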
In addition to all of that, from the perspective of the consumers, I think we need to have a way to know which version of the metrics has been used. If the metrics are embedded in instrumentation/jmx-metrics, then we should probably reuse that artifact's version as the "metrics version" in the sent data, and when using custom YAML files we should probably allow setting an explicit value per YAML file for later identification. I am not very familiar with the OTLP protocol for metrics, so maybe this is not doable.
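The version rule proposed here could be sketched as follows. All names are hypothetical, and the "unknown" fallback for a custom file without a declared version is an assumption of this sketch. How the label would be attached to exported data is left open. OTLP's instrumentation scope does carry a name and a version, which might be one candidate, but this thread does not confirm that approach.

```java
import java.util.Optional;

public class MetricsVersionLabel {

    /**
     * Returns the label identifying which metric definitions were used:
     * the artifact version for embedded definitions, or the value declared
     * by a custom YAML file when there is one.
     */
    static String versionFor(boolean embedded, String artifactVersion,
                             Optional<String> yamlDeclaredVersion) {
        if (embedded) {
            return artifactVersion;
        }
        // Assumed fallback when a custom file declares nothing.
        return yamlDeclaredVersion.orElse("unknown");
    }

    public static void main(String[] args) {
        // The version strings below are placeholders, not real releases.
        System.out.println(versionFor(true, "1.2.3", Optional.empty()));
        System.out.println(versionFor(false, "1.2.3", Optional.of("my-team-v4")));
        System.out.println(versionFor(false, "1.2.3", Optional.empty()));
    }
}
```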