Commit 96ea453

ywwg and bwplotka authored
Add guide for UTF8 usage (#2546)
Signed-off-by: Owen Williams <[email protected]> Co-authored-by: Bartlomiej Plotka <[email protected]>
1 parent 2c868b1 commit 96ea453

1 file changed: +122 -0 lines changed

content/docs/guides/utf8.md

@@ -0,0 +1,122 @@
---
title: UTF-8 in Prometheus
---

# Introduction

Versions of Prometheus before 3.0 required metric and label names to adhere to
a strict character set. With Prometheus 3.0, all UTF-8 strings are valid names,
but other parts of the ecosystem need some manual changes before they can
introduce names with arbitrary UTF-8 characters.

There may also be circumstances where users want to enforce the legacy character
set, perhaps for compatibility with an older system or one that does not yet
support UTF-8.

This document guides you through the UTF-8 transition details.

# Go Instrumentation

Currently, metrics created by the official Prometheus
[client_golang library](https://github.com/prometheus/client_golang) reject
UTF-8 names by default. To allow UTF-8, you must change the default validation
scheme. The requirement to set this value will be removed in a future version of
the common library.

```golang
import "github.com/prometheus/common/model"

func init() {
	// Allow any valid UTF-8 string as a metric or label name.
	model.NameValidationScheme = model.UTF8Validation
}
```

To enforce the legacy character set instead, set the validation scheme to
`LegacyValidation`.

Set the validation scheme before instantiating any metrics. The value can also
be changed on the fly if desired, as long as the change happens before the
metrics that rely on it are created.

## Instrumenting in other languages

Other client libraries may have similar requirements to set the validation
scheme. Check the documentation for the library you are using.

# Configuring Name Validation during Scraping

By default, Prometheus 3.0 accepts all UTF-8 strings as valid metric and label
names. It is possible to override this behavior for scraped targets and reject
names that do not conform to the legacy character set.

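To make the difference between the two schemes concrete, here is a minimal Go
sketch of the two checks. The regular expressions are the legacy patterns from
the Prometheus data model; the helper names `isLegacyMetricName`,
`isLegacyLabelName`, and `isUTF8Name` are hypothetical and are not part of
Prometheus or its libraries.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// Legacy (pre-3.0) patterns for metric and label names.
var (
	legacyMetricName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	legacyLabelName  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// isLegacyMetricName reports whether name fits the legacy metric character set.
func isLegacyMetricName(name string) bool {
	return legacyMetricName.MatchString(name)
}

// isLegacyLabelName reports whether name fits the legacy label character set.
func isLegacyLabelName(name string) bool {
	return legacyLabelName.MatchString(name)
}

// isUTF8Name reports whether name is a non-empty, valid UTF-8 string.
func isUTF8Name(name string) bool {
	return name != "" && utf8.ValidString(name)
}

func main() {
	for _, name := range []string{"http_requests_total", "http.requests.total"} {
		fmt.Printf("%q legacy=%t utf8=%t\n", name, isLegacyMetricName(name), isUTF8Name(name))
	}
}
```

With `legacy` validation, a name such as `http.requests.total` would be
rejected, while the default UTF-8 validation accepts it.
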
This option can be set in the Prometheus YAML file on a global basis:

```yaml
global:
  metric_name_validation_scheme: legacy
```

or on a per-scrape config basis:

```yaml
scrape_configs:
  - job_name: prometheus
    metric_name_validation_scheme: legacy
```

Scrape config settings override the global setting.

## Scrape Content Negotiation for UTF-8 escaping

At scrape time, the scraping system **must** pass `escaping=allow-utf-8` in the
Accept header in order to be served UTF-8 names. If the system being scraped
does not see this parameter, it automatically converts UTF-8 names to
legacy-compatible names using underscore replacement.

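As an illustration of the negotiation step, the following Go sketch prepares a
scrape request whose Accept header carries the `escaping` parameter. The
helpers `acceptHeader` and `newScrapeRequest`, the target URL, and the list of
media types are assumptions made for this example; check the scraper you use
for the exact header it sends.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// acceptHeader builds an Accept header value in which every media type carries
// the given escaping parameter. The media types are illustrative assumptions.
func acceptHeader(escaping string) string {
	mediaTypes := []string{
		"application/openmetrics-text;version=1.0.0",
		"text/plain;version=0.0.4",
	}
	parts := make([]string, 0, len(mediaTypes))
	for _, mt := range mediaTypes {
		parts = append(parts, mt+";escaping="+escaping)
	}
	return strings.Join(parts, ",")
}

// newScrapeRequest prepares (but does not send) a GET request that asks the
// target to serve UTF-8 names unescaped.
func newScrapeRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", acceptHeader("allow-utf-8"))
	return req, nil
}

func main() {
	// Hypothetical target address, used only to show the resulting header.
	req, err := newScrapeRequest("http://localhost:9090/metrics")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Accept"))
}
```
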
Scraping systems can also request a specific escaping method if desired by
setting the `escaping` parameter in the Accept header to a different value:

* `underscores`: The default. Converts legacy-invalid characters to underscores.
* `dots`: Similar to `underscores`, except that dots are converted to
  `_dot_` and pre-existing underscores are converted to `__`. This allows for
  round-tripping of simple metric names that also contain dots.
* `values`: Prepends the name with `U__` and replaces all invalid
  characters with their Unicode value, surrounded by underscores. Single
  underscores are replaced with double underscores. This mode allows for full
  round-tripping of UTF-8 names with a legacy system.

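The three modes above can be sketched in Go as follows. This is an
illustration of the rules as described in this guide, not the library's
implementation: `escapeName` and `isLegacyRune` are hypothetical helpers, the
Unicode value is assumed to be written as lowercase hexadecimal, and in `dots`
mode other invalid characters are assumed to become a single underscore. The
sketch also transforms every name unconditionally, whereas a real
implementation may leave names that are already legacy-compatible untouched.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isLegacyRune reports whether r is allowed at byte offset i of a legacy
// metric name (digits are not allowed in the first position).
func isLegacyRune(r rune, i int) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
		r == '_' || r == ':' || (r >= '0' && r <= '9' && i > 0)
}

// escapeName applies one of the modes "underscores", "dots", or "values".
func escapeName(name, mode string) string {
	var b strings.Builder
	if mode == "values" {
		b.WriteString("U__")
	}
	for i, r := range name {
		switch {
		case r == '_' && mode != "underscores":
			// "dots" and "values" double pre-existing underscores.
			b.WriteString("__")
		case isLegacyRune(r, i):
			b.WriteRune(r)
		case mode == "dots" && r == '.':
			b.WriteString("_dot_")
		case mode == "values":
			// Assumption: the code point is written in lowercase hex.
			b.WriteString("_" + strconv.FormatInt(int64(r), 16) + "_")
		default:
			// "underscores", and (by assumption) other characters in "dots".
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	for _, mode := range []string{"underscores", "dots", "values"} {
		fmt.Printf("%-11s %s\n", mode, escapeName("my.metric_name", mode))
	}
}
```

Under these assumptions, `my.metric_name` becomes `my_metric_name`,
`my_dot_metric__name`, and `U__my_2e_metric__name` respectively. Only the
first form is lossy: the original dot can no longer be told apart from an
underscore.
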
## Remote Write 2.0

Remote Write 2.0 automatically accepts all UTF-8 names in Prometheus 3.0. There
is no way to enforce the legacy character set validation with Remote Write 2.0.

# OTLP Metrics

The OTLP receiver in Prometheus 3.0 still normalizes all names to the Prometheus
format by default. You can change this in the `otlp` section of the Prometheus
configuration as follows:

    otlp:
      # Ingest OTLP data keeping UTF-8 characters in metric/label names.
      translation_strategy: NoUTF8EscapingWithSuffixes

See the [OpenTelemetry guide](./opentelemetry) for more details.

# Querying

Querying for metrics with UTF-8 names requires a slightly different syntax in
PromQL.

The classic query syntax still works for legacy-compatible names:

`my_metric{}`

But UTF-8 names must be quoted **and** moved into the braces:

`{"my.metric"}`

Label names must also be quoted if they contain legacy-incompatible characters:

`{"metric.name", "my.label.name"="bar"}`

The metric name can appear anywhere inside the braces, but by convention it is
the first term.
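
For tools that generate queries, the quoting rules above can be sketched as a
small Go helper. `buildSelector` is a hypothetical function written for this
guide, not part of any Prometheus library. It uses `strconv.Quote` on the
assumption that Go string escaping is acceptable for PromQL string literals,
and it sorts label names only to make the output deterministic.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

var (
	legacyMetric = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	legacyLabel  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// buildSelector renders a selector with equality matchers, quoting the metric
// name and label names only when they are not legacy-compatible.
func buildSelector(metric string, labels map[string]string) string {
	var terms []string
	prefix := metric
	if !legacyMetric.MatchString(metric) {
		// UTF-8 names are quoted and moved inside the braces, as the first term.
		prefix = ""
		terms = append(terms, strconv.Quote(metric))
	}
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		key := name
		if !legacyLabel.MatchString(name) {
			key = strconv.Quote(name)
		}
		terms = append(terms, key+"="+strconv.Quote(labels[name]))
	}
	return prefix + "{" + strings.Join(terms, ", ") + "}"
}

func main() {
	fmt.Println(buildSelector("my_metric", nil))
	fmt.Println(buildSelector("my.metric", nil))
	fmt.Println(buildSelector("metric.name", map[string]string{"my.label.name": "bar"}))
}
```

The three calls in `main` reproduce the three query forms shown in this
section.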
