Support for integer histograms / increased bucket digit precision #239

@nuclearpidgeon

While implementing fluent/fluent-bit#10651, I discovered that histogram metrics whose buckets have large whole-number boundaries lose precision in the output: the double values the buckets are defined with are formatted with a limited number of significant digits.
For example, with this setup:

    struct cmt_histogram_buckets *input_record_buckets = \
        cmt_histogram_buckets_create_size((double[]){ 100, 1024, 2048, 4096,
                                                      100 * 1024, 1024 * 1024, 4 * 1024 * 1024,
                                                      10 * 1024 * 1024}, 8);

This is what comes out in the Prometheus scrape:

    # HELP fluentbit_input_record_sizes Histogram of the size of input records
    # TYPE fluentbit_input_record_sizes histogram
    fluentbit_input_record_sizes_bucket{le="0.0",name="tail.0"} 0
    fluentbit_input_record_sizes_bucket{le="100.0",name="tail.0"} 0
    fluentbit_input_record_sizes_bucket{le="1024.0",name="tail.0"} 1
    fluentbit_input_record_sizes_bucket{le="2048.0",name="tail.0"} 2
    fluentbit_input_record_sizes_bucket{le="4096.0",name="tail.0"} 3
    fluentbit_input_record_sizes_bucket{le="102400.0",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="1.04858e+06",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="4.1943e+06",name="tail.0"} 5
    fluentbit_input_record_sizes_bucket{le="+Inf",name="tail.0"} 0
    fluentbit_input_record_sizes_sum{name="tail.0"} 48412
    fluentbit_input_record_sizes_count{name="tail.0"} 5

As best I can tell, this stems from the following line (and presumably from the default precision of the %g printf specifier):

    len = snprintf(str, 64, "%g", val);


Extra info

In the Prometheus text format docs/spec, as far as I can see, there is no stipulation on the type or formatting of the le labels: https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries. The only restrictions are the general ones placed on label values:

label_value can be any sequence of UTF-8 characters, but the backslash (\), double-quote ("), and line feed (\n) characters have to be escaped as \\, \", and \n, respectively

In general, from what I have personally seen so far, metric tools don't really give you any numerical or mathematical means of reasoning about these bucket values, since most metric querying seems to rely on string-based searching and filtering anyway. I could be wrong on this, though.

As another C library reference, the DigitalOcean Prometheus C library also uses sprintf with %g, so presumably it suffers from the same issue:
https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/prom/src/prom_metric_sample_histogram.c#L502-L509

In the OpenTelemetry project, the buckets are similarly defined as double values:
https://github.com/open-telemetry/opentelemetry-proto/blob/8672494217bfc858e2a82a4e8c623d4a5530473a/opentelemetry/proto/metrics/v1/metrics.proto#L554-L568
There is/was an IntegerHistogram type, but it was for integer observation values. Ironically, it seems to have had double bucket boundaries anyway, and it was deprecated (see open-telemetry/opentelemetry-proto#257, open-telemetry/opentelemetry-proto#270).

Info on %g specifier:
