Unified jmx metrics definition and their evolution #13238

Closed
@SylvainJuge

Description

This is more of a brain dump / discussion starter to gather feedback than a real issue; sorry if it is getting a bit long.

With the addition of the jmx-scraper in contrib, we can reuse the YAML-based JMX metrics capture from instrumentation. However, the metric definitions themselves are still distinct and spread across multiple places:

  • JMX Gatherer relies on Groovy metric definitions and is expected to be replaced by JMX Scraper.
  • The JMX Insight feature in instrumentation provides a YAML-configured JMX implementation, with embedded YAML definitions for the supported systems here.
  • JMX Scraper is the replacement for JMX Gatherer; it relies on the JMX Insight implementation but currently has a distinct set of supported systems here.

With the steps described in open-telemetry/opentelemetry-java-contrib#1362, which cover the supported systems, the goal is to provide YAML metric definitions that are nearly equivalent to those currently provided by JMX Gatherer, so that users get a smooth migration path.

However, even once the migration from Groovy-defined metrics is complete, we will still have two distinct sets of metrics, one in instrumentation/jmx-metrics and one in JMX Scraper. We should aim to merge them, with the following expected benefits:

  • minimize maintenance effort and keep the definitions aligned
  • capture identical metrics from within the JVM with instrumentation, or from outside the JVM with JMX Scraper
  • allow building common consumers of those metrics, for example dashboards that do not depend on how the metrics were captured.

One downside of having "one set of JMX metrics to rule them all" is that current users of JMX Scraper might lose control over the version or stability of those metrics. For example:

  • let's assume that the reference YAML definitions are now part of the instrumentation/jmx-metrics artifact
  • with every instrumentation release, JMX Scraper would pick up a new set of metric definitions through the dependency upgrade
  • as a result, changing the version of JMX Scraper (for example when it is used through the OTel Collector jmxreceiver) would change which metrics are captured, which could lead to unexpected behavior

The "equivalent to Groovy" legacy metric definitions that enable migration from JMX Gatherer can still be embedded directly in JMX Scraper, with a config option to use them.

However, whenever the JMX definitions are enhanced or modified, the same compatibility issues can arise, and I wonder whether and how we could iterate on the metric definitions without breaking user expectations.

I think we could explore the following ideas to help provide some stability:

  • keep it simple: make JMX Scraper always use the latest definitions provided by instrumentation/jmx-metrics; end users keep control by choosing which JMX Scraper version to use.
  • keep it simple + legacy: same as above, plus an embedded copy of the "legacy" definitions.
  • if users need stability, they can use the custom metrics YAML definitions feature: manually download the definitions from GitHub and use local copies.
  • package the YAML metric definitions in a separate repository/artifact that both instrumentation and JMX Scraper could reuse; switching from one version to another would then be a simple config option.
  • on every release of new JMX metric rules, embed a copy directly in every consumer; the downside is that the number of embedded copies keeps growing.
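The "keep it simple + legacy" option, combined with local copies, mostly comes down to choosing which definition set to load. The Java sketch below shows that choice, assuming JMX Scraper embeds a "latest" and a "legacy" set as classpath resources. `DefinitionsSelector`, the option values `latest`/`legacy`, and the resource directories are hypothetical, not existing JMX Scraper classes or config options.

```java
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical sketch of the "keep it simple + legacy" option with local copies.
 * None of these names exist in JMX Scraper; they only illustrate the selection logic.
 */
final class DefinitionsSelector {

  /** Where a set of YAML metric definitions comes from. */
  interface Source {}

  /** Definitions bundled inside the scraper (assumed to be classpath resources). */
  record Embedded(String resourceDir) implements Source {}

  /** Definitions copied locally by the user, for example downloaded from GitHub. */
  record LocalFiles(List<Path> files) implements Source {}

  // Hypothetical classpath locations of the two embedded definition sets.
  static final String LATEST_DIR = "jmx/rules/latest";
  static final String LEGACY_DIR = "jmx/rules/legacy";

  private DefinitionsSelector() {}

  /**
   * Resolves which definitions to load.
   *
   * @param option hypothetical config value: "latest" (default when null or blank) or "legacy"
   * @param customFiles local YAML files; when non-empty they replace the embedded sets
   */
  static Source select(String option, List<Path> customFiles) {
    if (customFiles != null && !customFiles.isEmpty()) {
      return new LocalFiles(List.copyOf(customFiles));
    }
    String value = option == null ? "" : option.trim();
    return switch (value) {
      case "", "latest" -> new Embedded(LATEST_DIR);
      case "legacy" -> new Embedded(LEGACY_DIR);
      default -> throw new IllegalArgumentException("unknown definitions set: " + value);
    };
  }
}
```

In this sketch, local files replace the embedded sets entirely rather than being merged with them, which fits the "latest JMX Scraper with older definitions" use case; an additive behavior would be an equally plausible design.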

If JMX metrics were defined as part of the semantic conventions, we would probably follow a "use the latest version" approach, so I think the "keep it simple + legacy" option is probably the best compromise here. Using a local copy of previous or custom definitions remains possible, for example to run the latest JMX Scraper with older definitions.

In addition, from the consumers' perspective, I think we need a way to know which version of the metric definitions was used. If the definitions are embedded in instrumentation/jmx-metrics, we should probably reuse that artifact version as the "metrics version" in the sent data; when using custom YAML files, we should probably allow setting an explicit value per YAML file for later identification. I am not very familiar with the OTLP protocol for metrics, so this may not be doable.
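On the OTLP side, metrics are already grouped by the instrumentation scope that produced them, and a scope carries a name and an optional version (in the Java API, `MeterBuilder.setInstrumentationVersion`). That is one plausible place for a "metrics version", though whether it is the right one is part of the open question. The Java sketch below only models how the value could be resolved per definition set; `DefinitionsVersion`, `Origin`, `Scope`, the `jmx-metric-definitions` scope name, and the `unknown` fallback are hypothetical, and the sketch does not call the OpenTelemetry API.

```java
/**
 * Hypothetical sketch: resolve the "metrics version" to report for one set of definitions.
 * None of these names exist in OpenTelemetry; they only illustrate the idea.
 */
final class DefinitionsVersion {

  /** Where the definitions were loaded from. */
  enum Origin { EMBEDDED, CUSTOM_FILE }

  /** A name/version pair, shaped like an OTLP instrumentation scope. */
  record Scope(String name, String version) {}

  // Hypothetical scope name used to tag metrics produced from YAML definitions.
  static final String SCOPE_NAME = "jmx-metric-definitions";
  // Fallback when a custom file carries no explicit label.
  static final String UNKNOWN = "unknown";

  private DefinitionsVersion() {}

  /**
   * @param origin where the definitions come from
   * @param artifactVersion version of the artifact embedding the definitions
   * @param explicitLabel optional per-file label set by the user (may be null)
   */
  static Scope resolve(Origin origin, String artifactVersion, String explicitLabel) {
    String version = switch (origin) {
      // Embedded rules change together with the artifact, so its version identifies them.
      case EMBEDDED -> artifactVersion;
      // Custom files evolve independently, so only an explicit label is meaningful.
      case CUSTOM_FILE ->
          explicitLabel == null || explicitLabel.isBlank() ? UNKNOWN : explicitLabel.trim();
    };
    return new Scope(SCOPE_NAME, version);
  }
}
```

Reporting the label as a scope version keeps it attached to a whole definition set; a metric attribute would be the alternative, at the cost of repeating the value on every data point.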
