Description
As part of implementing fluent/fluent-bit#10651, I discovered that histogram metrics with whole-number bucket values above a certain size suffer from precision loss, because of the digit limit applied when formatting the double values the buckets are defined with.
For example, with this setup:
struct cmt_histogram_buckets *input_record_buckets =
    cmt_histogram_buckets_create_size((double[]){ 100, 1024, 2048, 4096,
                                                  100 * 1024, 1024 * 1024,
                                                  4 * 1024 * 1024,
                                                  10 * 1024 * 1024 }, 8);
This is what comes out in the Prometheus scrape:
# HELP fluentbit_input_record_sizes Histogram of the size of input records
# TYPE fluentbit_input_record_sizes histogram
fluentbit_input_record_sizes_bucket{le="0.0",name="tail.0"} 0
fluentbit_input_record_sizes_bucket{le="100.0",name="tail.0"} 0
fluentbit_input_record_sizes_bucket{le="1024.0",name="tail.0"} 1
fluentbit_input_record_sizes_bucket{le="2048.0",name="tail.0"} 2
fluentbit_input_record_sizes_bucket{le="4096.0",name="tail.0"} 3
fluentbit_input_record_sizes_bucket{le="102400.0",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="1.04858e+06",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="4.1943e+06",name="tail.0"} 5
fluentbit_input_record_sizes_bucket{le="+Inf",name="tail.0"} 0
fluentbit_input_record_sizes_sum{name="tail.0"} 48412
fluentbit_input_record_sizes_count{name="tail.0"} 5
As best I can tell, this stems from the following line, combined with the default precision of the %g printf specifier (6 significant digits when no precision is given):
cmetrics/src/cmt_encode_prometheus.c, line 311 (commit ab80dd0):
len = snprintf(str, 64, "%g", val);
Extra info
As far as I can see, the Prometheus text format docs/spec place no specific requirements on the type or formatting of le label values (https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries). The only restrictions are the general ones placed on all label values:
label_value can be any sequence of UTF-8 characters, but the backslash (\), double-quote ("), and line feed (\n) characters have to be escaped as \\, \", and \n, respectively.
In general, from what I've seen so far, metric tools don't really give you any numerical means of reasoning about these bucket values; most metric querying seems to rely on string-based matching/filtering anyway. I could be wrong on this, though.
For another C library reference, the DigitalOcean Prometheus C library (prometheus-client-c) also uses sprintf with %g, so presumably it suffers from the same issue:
https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/prom/src/prom_metric_sample_histogram.c#L502-L509
In the OpenTelemetry project, bucket boundaries are similarly defined as double values:
https://github.com/open-telemetry/opentelemetry-proto/blob/8672494217bfc858e2a82a4e8c623d4a5530473a/opentelemetry/proto/metrics/v1/metrics.proto#L554-L568
There is/was an IntegerHistogram type, but it was for integer observation values; ironically, it seems to have had double bucket boundaries anyway, and it has since been deprecated (see open-telemetry/opentelemetry-proto#257, open-telemetry/opentelemetry-proto#270).
Info on the %g specifier: