You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit adds a minimum metric name validation which checks:
metric name starts with es. prefix
metric name is using . as a separator of elements
metric name is using characters from a white list
validate min number of elements = 3 elements ( prefix, group and the suffix name)
validate max number of elements and max characters per element
validate the suffix element in a metric name to be from the enumerated allow list
It also modifies existing metric names to adhere to those rules
* This way you can later add` es.indices.docs.count, es.indices.docs.ingested.total`, etc.)
20
+
* This way you can later add` es.indices.docs.total, es.indices.docs.ingested.total`, etc.)
21
21
22
22
Prefix metrics:
23
23
* Always use `es` as our root application name: this will give us a separate namespace and avoid any possibility of clashes with other metrics, and quick identification of Elasticsearch metrics on a dashboard.
24
24
* Follow the root prefix with a simple module name, team or area of code. E.g. `snapshot, repositories, indices, threadpool`. Notice the mix of singular and plural - here this is intentional, to reflect closely the existing names in the codebase (e.g. `reindex` and `indices`)
25
-
* In building a metric name, look for existing prefixes (e.g. module name and/or area of code, e.g. `blob_cache`) and for existing sub-elements as well (e.g. `error`) to build a good, consistent name. E.g. prefer the consistent use of `error.count` rather than introducing `failures`, `failed.count` or `errors`.``
26
-
* Avoid having sub-metrics under a name that is also a metric (e.g. do not create names like `es.repositories.elements`,` es.repositories.elements.utilization`; use` es.repositories.element.count` and` es.repositories.element.utilization `instead). Such metrics are hard to handle well in Elasticsearch, or in some internal structures (e.g. nested maps).
25
+
* In building a metric name, look for existing prefixes (e.g. module name and/or area of code, e.g. `blob_cache`) and for existing sub-elements as well (e.g. `error`) to build a good, consistent name. E.g. prefer the consistent use of `error.total` rather than introducing `failures`, `failed.total` or `errors`.``
26
+
* Avoid having sub-metrics under a name that is also a metric (e.g. do not create names like `es.repositories.elements`,` es.repositories.elements.utilization`; use` es.repositories.element.total` and` es.repositories.element.utilization `instead). Such metrics are hard to handle well in Elasticsearch, or in some internal structures (e.g. nested maps).
27
27
28
28
Keep the hierarchy compact: do not add elements if you don’t need to. There is a description field when registering a metric, prefer using that as an explanation. \
29
29
For example, if emitting existing metrics from node stats, do not use the whole “object path”, but choose the most significant terms.
@@ -35,7 +35,7 @@ The metric name can be generated but there should be no dynamic or variable cont
35
35
* Rule of thumb: you should be able to do aggregations (e.g. sum, avg) across a dimension of a given metric (without the need to aggregate over different metric names); on the other hand, any aggregation across any dimension of a given metric should be meaningful.
36
36
* There might be exceptions of course. For example:
37
37
* When similar metrics have significantly different implementations/related metrics. \
38
-
If we have only common metrics like `es.repositories.element.count, es.repositories.element.utilization, es.repositories.writes.total` for every blob storage implementation, then `s3,azure` should be an attribute. \
38
+
If we have only common metrics like `es.repositories.element.total, es.repositories.element.utilization, es.repositories.writes.total` for every blob storage implementation, then `s3,azure` should be an attribute. \
39
39
If we have specific metrics, e.g. for s3 storage classes, prefer using prefixed metric names for the specific metrics: <code>es.repositories.<strong>s3</strong>.deep_archive_access.total</code> (but keep `es.repositories.elements`)
40
40
* When you have a finite and fixed set of names it might be OK to have them in the name (e.g. "`young`" and "`old`" for GC generations).
41
41
@@ -47,12 +47,19 @@ Examples :
47
47
* <code>es.indices.storage.write.<strong>io</strong></code>, instead of <code>es.indices.storage.write.<strong>bytes_per_sec</strong></code>
48
48
* These can all be composed with the suffixes below, e.g. <code>es.process.jvm.collection.<strong>time.total</strong></code>, <code>es.indices.storage.write.<strong>total</strong></code> to represent the monotonic sum of time spent in GC and the total number of bytes written to indices respectively.
49
49
50
-
**Pluralization** and **suffixes**:
51
-
* If the metric is unit-less, use plural: `es.threadpool.activethreads`, `es.indices.docs`
52
-
* Use `total` as a suffix for monotonic sums (e.g. <code>es.indices.docs.deleted.<strong>total</strong></code>)
53
-
* Use `count` to represent the count of "things" in the metric name/namespace (e.g. if we have `es.process.jvm.classes.loaded`, we will express the number of classes currently loaded by the JVM as <code>es.process.jvm.classes.loaded.<strong>count</strong></code>, and the total number of classes loaded since the JVM started as <code>es.process.jvm.classes.loaded.<strong>total</strong></code>
50
+
**Suffixes**:
51
+
* Use `total` as a suffix for monotonic metrics (always increasing counter) (e.g. <code>es.indices.docs.deleted.<strong>total</strong></code>)
52
+
* Note: even though async counter is reporting a total cumulative value, it is till monotonic.
53
+
* Use `current` to represent the non-monotonic metrics (like gauges, upDownCounters)
54
+
* e.g. `current` vs `total` We can have <code>es.process.jvm.classes.loaded.<strong>current</strong></code> to express the number of classes currently loaded by the JVM, and the total number of classes loaded since the JVM started as <code>es.process.jvm.classes.loaded.<strong>total</strong></code>
54
55
* Use `ratio` to represent the ratio of two measures with identical unit (or unit-less) or measures that represent a fraction in the range [0, 1]. Examples:
55
56
* Exception: consider using utilization when the ratio is between a usage and its limit, e.g. the ratio between <code>es.process.jvm.heap.<strong>usage</strong></code> and <code>es.process.jvm.heap.<strong>limit</strong></code> should be <code>es.process.jvm.heap.<strong>utilization</strong></code>
57
+
* Use `status` to represent enum like gauges. example <code>es.health.overall.red.status</code> have values 1/0 to represent true/false
58
+
* Use `usage` to represent the amount used ouf of the known resource size
59
+
* Use `size` to represent the overall size of the resource measured
60
+
* Use `utilisation` to represent a fraction of usage out of the overall size of a resource measured
61
+
* Use `histogram` to represent instruments of type histogram
62
+
* Use `time` to represent passage of time
56
63
* If it has a unit of measure, then it should not be plural (and also not include the unit of measure, see above). Examples: <code>es.process.jvm.collection.time, es.process.mem.virtual.usage<strong>, </strong>es.indices.storage.utilization</code>
0 commit comments