Skip to content

Commit 07780a8

Browse files
authored
Add metric name validation (#103388)
This commit adds a minimum metric name validation which checks: metric name starts with es. prefix metric name is using . as a separator of elements metric name is using characters from a white list validate min number of elements = 3 elements ( prefix, group and the suffix name) validate max number of elements and max characters per element validate the suffix element in a metric name to be from the enumerated allow list It also modifies existing metric names to adhere to those rules
1 parent 943b2ea commit 07780a8

File tree

18 files changed

+700
-141
lines changed

18 files changed

+700
-141
lines changed

modules/apm/NAMING.md

+15-8
Original file line numberDiff line numberDiff line change
@@ -17,13 +17,13 @@ The **hierarchy** should be built by putting "more common" elements at the begin
1717

1818
Example:
1919
* prefer `es.indices.docs.deleted.total `to `es.indices.total.deleted.docs`
20-
* This way you can later add` es.indices.docs.count, es.indices.docs.ingested.total`, etc.)
20+
* This way you can later add` es.indices.docs.total, es.indices.docs.ingested.total`, etc.)
2121

2222
Prefix metrics:
2323
* Always use `es` as our root application name: this will give us a separate namespace and avoid any possibility of clashes with other metrics, and quick identification of Elasticsearch metrics on a dashboard.
2424
* Follow the root prefix with a simple module name, team or area of code. E.g. `snapshot, repositories, indices, threadpool`. Notice the mix of singular and plural - here this is intentional, to reflect closely the existing names in the codebase (e.g. `reindex` and `indices`)
25-
* In building a metric name, look for existing prefixes (e.g. module name and/or area of code, e.g. `blob_cache`) and for existing sub-elements as well (e.g. `error`) to build a good, consistent name. E.g. prefer the consistent use of `error.count` rather than introducing `failures`, `failed.count` or `errors`.` `
26-
* Avoid having sub-metrics under a name that is also a metric (e.g. do not create names like `es.repositories.elements`,` es.repositories.elements.utilization`; use` es.repositories.element.count` and` es.repositories.element.utilization `instead). Such metrics are hard to handle well in Elasticsearch, or in some internal structures (e.g. nested maps).
25+
* In building a metric name, look for existing prefixes (e.g. module name and/or area of code, e.g. `blob_cache`) and for existing sub-elements as well (e.g. `error`) to build a good, consistent name. E.g. prefer the consistent use of `error.total` rather than introducing `failures`, `failed.total` or `errors`.` `
26+
* Avoid having sub-metrics under a name that is also a metric (e.g. do not create names like `es.repositories.elements`,` es.repositories.elements.utilization`; use` es.repositories.element.total` and` es.repositories.element.utilization `instead). Such metrics are hard to handle well in Elasticsearch, or in some internal structures (e.g. nested maps).
2727

2828
Keep the hierarchy compact: do not add elements if you don’t need to. There is a description field when registering a metric, prefer using that as an explanation. \
2929
For example, if emitting existing metrics from node stats, do not use the whole “object path”, but choose the most significant terms.
@@ -35,7 +35,7 @@ The metric name can be generated but there should be no dynamic or variable cont
3535
* Rule of thumb: you should be able to do aggregations (e.g. sum, avg) across a dimension of a given metric (without the need to aggregate over different metric names); on the other hand, any aggregation across any dimension of a given metric should be meaningful.
3636
* There might be exceptions of course. For example:
3737
* When similar metrics have significantly different implementations/related metrics. \
38-
If we have only common metrics like `es.repositories.element.count, es.repositories.element.utilization, es.repositories.writes.total` for every blob storage implementation, then `s3,azure` should be an attribute. \
38+
If we have only common metrics like `es.repositories.element.total, es.repositories.element.utilization, es.repositories.writes.total` for every blob storage implementation, then `s3,azure` should be an attribute. \
3939
If we have specific metrics, e.g. for s3 storage classes, prefer using prefixed metric names for the specific metrics: <code>es.repositories.<strong>s3</strong>.deep_archive_access.total</code> (but keep `es.repositories.elements`)
4040
* When you have a finite and fixed set of names it might be OK to have them in the name (e.g. "`young`" and "`old`" for GC generations).
4141

@@ -47,12 +47,19 @@ Examples :
4747
* <code>es.indices.storage.write.<strong>io</strong></code>, instead of <code>es.indices.storage.write.<strong>bytes_per_sec</strong></code>
4848
* These can all be composed with the suffixes below, e.g. <code>es.process.jvm.collection.<strong>time.total</strong></code>, <code>es.indices.storage.write.<strong>total</strong></code> to represent the monotonic sum of time spent in GC and the total number of bytes written to indices respectively.
4949

50-
**Pluralization** and **suffixes**:
51-
* If the metric is unit-less, use plural: `es.threadpool.activethreads`, `es.indices.docs`
52-
* Use `total` as a suffix for monotonic sums (e.g. <code>es.indices.docs.deleted.<strong>total</strong></code>)
53-
* Use `count` to represent the count of "things" in the metric name/namespace (e.g. if we have `es.process.jvm.classes.loaded`, we will express the number of classes currently loaded by the JVM as <code>es.process.jvm.classes.loaded.<strong>count</strong></code>, and the total number of classes loaded since the JVM started as <code>es.process.jvm.classes.loaded.<strong>total</strong></code>
50+
**Suffixes**:
51+
* Use `total` as a suffix for monotonic metrics (always increasing counter) (e.g. <code>es.indices.docs.deleted.<strong>total</strong></code>)
52+
* Note: even though async counter is reporting a total cumulative value, it is till monotonic.
53+
* Use `current` to represent the non-monotonic metrics (like gauges, upDownCounters)
54+
* e.g. `current` vs `total` We can have <code>es.process.jvm.classes.loaded.<strong>current</strong></code> to express the number of classes currently loaded by the JVM, and the total number of classes loaded since the JVM started as <code>es.process.jvm.classes.loaded.<strong>total</strong></code>
5455
* Use `ratio` to represent the ratio of two measures with identical unit (or unit-less) or measures that represent a fraction in the range [0, 1]. Examples:
5556
* Exception: consider using utilization when the ratio is between a usage and its limit, e.g. the ratio between <code>es.process.jvm.heap.<strong>usage</strong></code> and <code>es.process.jvm.heap.<strong>limit</strong></code> should be <code>es.process.jvm.heap.<strong>utilization</strong></code>
57+
* Use `status` to represent enum like gauges. example <code>es.health.overall.red.status</code> have values 1/0 to represent true/false
58+
* Use `usage` to represent the amount used ouf of the known resource size
59+
* Use `size` to represent the overall size of the resource measured
60+
* Use `utilisation` to represent a fraction of usage out of the overall size of a resource measured
61+
* Use `histogram` to represent instruments of type histogram
62+
* Use `time` to represent passage of time
5663
* If it has a unit of measure, then it should not be plural (and also not include the unit of measure, see above). Examples: <code>es.process.jvm.collection.time, es.process.mem.virtual.usage<strong>, </strong>es.indices.storage.utilization</code>
5764

5865
### Attributes

modules/apm/src/main/java/org/elasticsearch/telemetry/apm/AbstractInstrument.java

+3-7
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
import io.opentelemetry.api.metrics.Meter;
1212

1313
import org.elasticsearch.core.Nullable;
14+
import org.elasticsearch.telemetry.apm.internal.MetricNameValidator;
1415
import org.elasticsearch.telemetry.metric.Instrument;
1516

1617
import java.security.AccessController;
@@ -23,6 +24,7 @@
2324
* An instrument that contains the name, description and unit. The delegate may be replaced when
2425
* the provider is updated.
2526
* Subclasses should implement the builder, which is used on initialization and provider updates.
27+
*
2628
* @param <T> delegated instrument
2729
*/
2830
public abstract class AbstractInstrument<T> implements Instrument {
@@ -50,19 +52,13 @@ void setProvider(@Nullable Meter meter) {
5052
}
5153

5254
protected abstract static class Builder<T> {
53-
private static final int MAX_NAME_LENGTH = 255;
5455

5556
protected final String name;
5657
protected final String description;
5758
protected final String unit;
5859

5960
public Builder(String name, String description, String unit) {
60-
if (name.length() > MAX_NAME_LENGTH) {
61-
throw new IllegalArgumentException(
62-
"Instrument name [" + name + "] with length [" + name.length() + "] exceeds maximum length [" + MAX_NAME_LENGTH + "]"
63-
);
64-
}
65-
this.name = Objects.requireNonNull(name);
61+
this.name = MetricNameValidator.validate(name);
6662
this.description = Objects.requireNonNull(description);
6763
this.unit = Objects.requireNonNull(unit);
6864
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
/*
2+
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
3+
* or more contributor license agreements. Licensed under the Elastic License
4+
* 2.0 and the Server Side Public License, v 1; you may not use this file except
5+
* in compliance with, at your election, the Elastic License 2.0 or the Server
6+
* Side Public License, v 1.
7+
*/
8+
9+
package org.elasticsearch.telemetry.apm.internal;
10+
11+
import java.util.Objects;
12+
import java.util.Set;
13+
import java.util.regex.Matcher;
14+
import java.util.regex.Pattern;
15+
import java.util.stream.Collectors;
16+
17+
public class MetricNameValidator {
18+
private static final Pattern ALLOWED_CHARACTERS = Pattern.compile("[a-z][a-z0-9_]*");
19+
static final Set<String> ALLOWED_SUFFIXES = Set.of(
20+
"total",
21+
"current",
22+
"ratio",
23+
"status" /*a workaround for enums */,
24+
"usage",
25+
"size",
26+
"utilization",
27+
"histogram",
28+
"time"
29+
);
30+
static final int MAX_METRIC_NAME_LENGTH = 255;
31+
32+
static final int MAX_ELEMENT_LENGTH = 30;
33+
static final int MAX_NUMBER_OF_ELEMENTS = 10;
34+
35+
private MetricNameValidator() {}
36+
37+
/**
38+
* Validates a metric name as per guidelines in Naming.md
39+
*
40+
* @param metricName metric name to be validated
41+
* @throws IllegalArgumentException an exception indicating an incorrect metric name
42+
*/
43+
public static String validate(String metricName) {
44+
Objects.requireNonNull(metricName);
45+
validateMaxMetricNameLength(metricName);
46+
47+
String[] elements = metricName.split("\\.");
48+
hasESPrefix(elements, metricName);
49+
hasAtLeast3Elements(elements, metricName);
50+
hasNotBreachNumberOfElementsLimit(elements, metricName);
51+
lastElementIsFromAllowList(elements, metricName);
52+
perElementValidations(elements, metricName);
53+
return metricName;
54+
}
55+
56+
private static void validateMaxMetricNameLength(String metricName) {
57+
if (metricName.length() > MAX_METRIC_NAME_LENGTH) {
58+
throw new IllegalArgumentException(
59+
"Metric name length "
60+
+ metricName.length()
61+
+ "is longer than max metric name length:"
62+
+ MAX_METRIC_NAME_LENGTH
63+
+ " Name was: "
64+
+ metricName
65+
);
66+
}
67+
}
68+
69+
private static void lastElementIsFromAllowList(String[] elements, String name) {
70+
String lastElement = elements[elements.length - 1];
71+
if (ALLOWED_SUFFIXES.contains(lastElement) == false) {
72+
throw new IllegalArgumentException(
73+
"Metric name should end with one of ["
74+
+ ALLOWED_SUFFIXES.stream().collect(Collectors.joining(","))
75+
+ "] "
76+
+ "Last element was: "
77+
+ lastElement
78+
+ ". "
79+
+ "Name was: "
80+
+ name
81+
);
82+
}
83+
}
84+
85+
private static void hasNotBreachNumberOfElementsLimit(String[] elements, String name) {
86+
if (elements.length > MAX_NUMBER_OF_ELEMENTS) {
87+
throw new IllegalArgumentException(
88+
"Metric name should have at most 10 elements. It had: " + elements.length + ". The name was: " + name
89+
);
90+
}
91+
}
92+
93+
private static void hasAtLeast3Elements(String[] elements, String name) {
94+
if (elements.length < 3) {
95+
throw new IllegalArgumentException(
96+
"Metric name consist of at least 3 elements. An es. prefix, group and a name. The name was: " + name
97+
);
98+
}
99+
}
100+
101+
private static void hasESPrefix(String[] elements, String name) {
102+
if (elements[0].equals("es") == false) {
103+
throw new IllegalArgumentException(
104+
"Metric name should start with \"es.\" prefix and use \".\" as a separator. Name was: " + name
105+
);
106+
}
107+
}
108+
109+
private static void perElementValidations(String[] elements, String name) {
110+
for (String element : elements) {
111+
hasOnlyAllowedCharacters(element, name);
112+
hasNotBreachLengthLimit(element, name);
113+
}
114+
}
115+
116+
private static void hasNotBreachLengthLimit(String element, String name) {
117+
if (element.length() > MAX_ELEMENT_LENGTH) {
118+
throw new IllegalArgumentException(
119+
"Metric name's element should not be longer than "
120+
+ MAX_ELEMENT_LENGTH
121+
+ " characters. Was: "
122+
+ element.length()
123+
+ ". Name was: "
124+
+ name
125+
);
126+
}
127+
}
128+
129+
private static void hasOnlyAllowedCharacters(String element, String name) {
130+
Matcher matcher = ALLOWED_CHARACTERS.matcher(element);
131+
if (matcher.matches() == false) {
132+
throw new IllegalArgumentException(
133+
"Metric name should only use [a-z][a-z0-9_]* characters. "
134+
+ "Element does not match: \""
135+
+ element
136+
+ "\". "
137+
+ "Name was: "
138+
+ name
139+
);
140+
}
141+
}
142+
}

modules/apm/src/test/java/org/elasticsearch/telemetry/apm/APMMeterRegistryTests.java

+12-27
Original file line numberDiff line numberDiff line change
@@ -35,10 +35,8 @@
3535
import java.util.List;
3636
import java.util.function.Supplier;
3737

38-
import static org.hamcrest.Matchers.containsString;
3938
import static org.hamcrest.Matchers.equalTo;
4039
import static org.hamcrest.Matchers.hasSize;
41-
import static org.hamcrest.Matchers.instanceOf;
4240
import static org.hamcrest.Matchers.sameInstance;
4341

4442
public class APMMeterRegistryTests extends ESTestCase {
@@ -84,8 +82,8 @@ public void testMeterIsOverridden() {
8482
public void testLookupByName() {
8583
var apmMeter = new APMMeterService(TELEMETRY_ENABLED, () -> testOtel, () -> noopOtel).getMeterRegistry();
8684

87-
DoubleCounter registeredCounter = apmMeter.registerDoubleCounter("name", "desc", "unit");
88-
DoubleCounter lookedUpCounter = apmMeter.getDoubleCounter("name");
85+
DoubleCounter registeredCounter = apmMeter.registerDoubleCounter("es.test.name.total", "desc", "unit");
86+
DoubleCounter lookedUpCounter = apmMeter.getDoubleCounter("es.test.name.total");
8987

9088
assertThat(lookedUpCounter, sameInstance(registeredCounter));
9189
}
@@ -103,19 +101,6 @@ public void testNoopIsSetOnStop() {
103101
assertThat(meter, sameInstance(noopOtel));
104102
}
105103

106-
public void testMaxNameLength() {
107-
APMMeterService apmMeter = new APMMeterService(TELEMETRY_ENABLED, () -> testOtel, () -> noopOtel);
108-
apmMeter.start();
109-
int max_length = 255;
110-
var counter = apmMeter.getMeterRegistry().registerLongCounter("a".repeat(max_length), "desc", "count");
111-
assertThat(counter, instanceOf(LongCounter.class));
112-
IllegalArgumentException iae = expectThrows(
113-
IllegalArgumentException.class,
114-
() -> apmMeter.getMeterRegistry().registerLongCounter("a".repeat(max_length + 1), "desc", "count")
115-
);
116-
assertThat(iae.getMessage(), containsString("exceeds maximum length [255]"));
117-
}
118-
119104
public void testAllInstrumentsSwitchProviders() {
120105
TestAPMMeterService apmMeter = new TestAPMMeterService(
121106
Settings.builder().put(APMAgentSettings.TELEMETRY_METRICS_ENABLED_SETTING.getKey(), false).build(),
@@ -125,18 +110,18 @@ public void testAllInstrumentsSwitchProviders() {
125110
APMMeterRegistry registry = apmMeter.getMeterRegistry();
126111

127112
Supplier<DoubleWithAttributes> doubleObserver = () -> new DoubleWithAttributes(1.5, Collections.emptyMap());
128-
DoubleCounter dc = registry.registerDoubleCounter("dc", "", "");
129-
DoubleUpDownCounter dudc = registry.registerDoubleUpDownCounter("dudc", "", "");
130-
DoubleHistogram dh = registry.registerDoubleHistogram("dh", "", "");
131-
DoubleAsyncCounter dac = registry.registerDoubleAsyncCounter("dac", "", "", doubleObserver);
132-
DoubleGauge dg = registry.registerDoubleGauge("dg", "", "", doubleObserver);
113+
DoubleCounter dc = registry.registerDoubleCounter("es.test.dc.total", "", "");
114+
DoubleUpDownCounter dudc = registry.registerDoubleUpDownCounter("es.test.dudc.current", "", "");
115+
DoubleHistogram dh = registry.registerDoubleHistogram("es.test.dh.histogram", "", "");
116+
DoubleAsyncCounter dac = registry.registerDoubleAsyncCounter("es.test.dac.total", "", "", doubleObserver);
117+
DoubleGauge dg = registry.registerDoubleGauge("es.test.dg.current", "", "", doubleObserver);
133118

134119
Supplier<LongWithAttributes> longObserver = () -> new LongWithAttributes(100, Collections.emptyMap());
135-
LongCounter lc = registry.registerLongCounter("lc", "", "");
136-
LongUpDownCounter ludc = registry.registerLongUpDownCounter("ludc", "", "");
137-
LongHistogram lh = registry.registerLongHistogram("lh", "", "");
138-
LongAsyncCounter lac = registry.registerLongAsyncCounter("lac", "", "", longObserver);
139-
LongGauge lg = registry.registerLongGauge("lg", "", "", longObserver);
120+
LongCounter lc = registry.registerLongCounter("es.test.lc.total", "", "");
121+
LongUpDownCounter ludc = registry.registerLongUpDownCounter("es.test.ludc.total", "", "");
122+
LongHistogram lh = registry.registerLongHistogram("es.test.lh.histogram", "", "");
123+
LongAsyncCounter lac = registry.registerLongAsyncCounter("es.test.lac.total", "", "", longObserver);
124+
LongGauge lg = registry.registerLongGauge("es.test.lg.current", "", "", longObserver);
140125

141126
apmMeter.setEnabled(true);
142127

modules/apm/src/test/java/org/elasticsearch/telemetry/apm/MeterRegistryConcurrencyTests.java

+1-1
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@
2828
import static org.hamcrest.Matchers.sameInstance;
2929

3030
public class MeterRegistryConcurrencyTests extends ESTestCase {
31-
private final String name = "name";
31+
private final String name = "es.test.name.total";
3232
private final String description = "desc";
3333
private final String unit = "kg";
3434
private final Meter noopMeter = OpenTelemetry.noop().getMeter("noop");

0 commit comments

Comments
 (0)