
@cosmo0920
Contributor

@cosmo0920 cosmo0920 commented Oct 30, 2025

Fixes #11082


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
---
service:
  flush: 1
  daemon: Off
  log_level: debug
  # Enable/Disable the built-in HTTP Server for metrics
  http_server: Off
  http_listen: 127.0.0.1
  http_port: 2020

pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: metrics_fluentbit
      scrape_interval: 60s

  outputs:
    - name: prometheus_remote_write
      match: 'metrics_*'
      host: localhost
      port: 9090
      uri: /api/v1/write
      retry_limit: 2
      log_response_payload: True
      tls: Off
      add_label: job fluentbit2

This was tested against Prometheus with the following configuration:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: []

Launch Prometheus with:

$ prometheus --config.file=/path/to/prometheus.yaml --web.enable-remote-write-receiver  
  • Debug log output from testing the change
Fluent Bit v4.2.0
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___   __  
|  ___| |                | |   | ___ (_) |           /   | /  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |  | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |__| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/10/30 18:27:48.943855000] [ info] Configuration:
[2025/10/30 18:27:48.943862000] [ info]  flush time     | 1.000000 seconds
[2025/10/30 18:27:48.943866000] [ info]  grace          | 5 seconds
[2025/10/30 18:27:48.943868000] [ info]  daemon         | 0
[2025/10/30 18:27:48.943870000] [ info] ___________
[2025/10/30 18:27:48.943872000] [ info]  inputs:
[2025/10/30 18:27:48.943874000] [ info]      fluentbit_metrics
[2025/10/30 18:27:48.943876000] [ info] ___________
[2025/10/30 18:27:48.943878000] [ info]  filters:
[2025/10/30 18:27:48.943880000] [ info] ___________
[2025/10/30 18:27:48.943882000] [ info]  outputs:
[2025/10/30 18:27:48.943884000] [ info]      prometheus_remote_write.0
[2025/10/30 18:27:48.943886000] [ info] ___________
[2025/10/30 18:27:48.943888000] [ info]  collectors:
[2025/10/30 18:27:48.945038000] [ info] [fluent bit] version=4.2.0, commit=03dada8a7b, pid=86872
[2025/10/30 18:27:48.945046000] [debug] [engine] coroutine stack size: 36864 bytes (36.0K)
[2025/10/30 18:27:48.945508000] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/10/30 18:27:48.945679000] [ info] [simd    ] NEON
[2025/10/30 18:27:48.945683000] [ info] [cmetrics] version=1.0.5
[2025/10/30 18:27:48.946101000] [ info] [ctraces ] version=0.6.6
[2025/10/30 18:27:48.946227000] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] initializing
[2025/10/30 18:27:48.946233000] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] storage_strategy='memory' (memory only)
[2025/10/30 18:27:48.946240000] [debug] [fluentbit_metrics:fluentbit_metrics.0] created event channels: read=25 write=26
[2025/10/30 18:27:48.946411000] [debug] [prometheus_remote_write:prometheus_remote_write.0] created event channels: read=27 write=28
[2025/10/30 18:27:48.946687000] [ info] [output:prometheus_remote_write:prometheus_remote_write.0] worker #0 started
[2025/10/30 18:27:48.946695000] [ info] [output:prometheus_remote_write:prometheus_remote_write.0] worker #1 started
[2025/10/30 18:27:48.946936000] [ info] [sp] stream processor started
[2025/10/30 18:27:48.947026000] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2025/10/30 18:28:50.4113000] [debug] [task] created task=0x8c70106c0 id=0 OK
[2025/10/30 18:28:50.4326000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] task_id=0 assigned to thread #0
[2025/10/30 18:28:50.4480000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetrics msgpack size: 7550
[2025/10/30 18:28:50.7549000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] cmetric_id=0 decoded 0-7550 payload_size=4065
[2025/10/30 18:28:50.7608000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] final payload size: 4065
[2025/10/30 18:28:50.21820000] [debug] [upstream] KA connection #59 to localhost:9090 is connected
[2025/10/30 18:28:50.22244000] [debug] [http_client] not using http_proxy for header
[2025/10/30 18:28:50.24022000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] localhost:9090, HTTP status=204
[2025/10/30 18:28:50.24054000] [debug] [upstream] KA connection #59 to localhost:9090 is now available
[2025/10/30 18:28:50.24067000] [debug] [output:prometheus_remote_write:prometheus_remote_write.0] http_post result FLB_OK
[2025/10/30 18:28:50.24090000] [debug] [out flush] cb_destroy coro_id=0
[2025/10/30 18:28:50.24144000] [debug] [task] destroy task=0x8c70106c0 (task_id=0)
<snip>
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to the packaging of containers or native binaries, please confirm that it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • Introduced heartbeat-based logging metrics that periodically emit per-type log counters for improved observability.
    • Added metrics lifecycle APIs to create and destroy logging metrics, and initialized heartbeat timing so that metrics are emitted at the configured interval.

@coderabbitai

coderabbitai bot commented Oct 30, 2025

Walkthrough

Added heartbeat-driven metrics emission with cross-platform monotonic timing, new metrics lifecycle APIs, and heartbeat scheduling fields on the public log struct; heartbeat checks are invoked from the log print path to emit per-type counters at configured intervals.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Header field additions: include/fluent-bit/flb_log.h | Added two public fields to struct flb_log: uint64_t next_hb_ns and uint64_t hb_interval_ns for heartbeat scheduling. |
| Heartbeat & monotonic time: src/flb_log.c | Added cross-platform monotonic_now_ns() (Windows and POSIX variants) and heartbeat scheduling logic initialized in flb_log_create(). |
| Metrics lifecycle & emission: src/flb_log.c | Added flb_log_metrics_create() and flb_log_metrics_destroy() APIs and flb_log_metrics_heartbeat() to emit per-type log counters; integrated heartbeat invocation into flb_log_print() before counter increments. |

Sequence Diagram

sequenceDiagram
    participant Print as flb_log_print()
    participant Time as monotonic_now_ns()
    participant HB as Heartbeat Check
    participant Metrics as flb_log_metrics_heartbeat()

    Print->>Time: get current monotonic ns
    Print->>HB: check (log->next_hb_ns, current_ns)
    alt current_ns >= log->next_hb_ns
        HB->>Metrics: emit per-type counters
        Metrics-->>HB: update metrics state
        HB->>Print: advance log->next_hb_ns
    else not yet
        HB-->>Print: no-op
    end
    Print->>Print: increment per-type counter
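The flow in the sequence diagram can be sketched in plain C. This is a minimal illustration under stated assumptions, not the PR's code: struct hb_log, hb_log_init(), hb_log_heartbeat(), hb_log_print(), and the LOG_TYPE_* values are hypothetical, and the current monotonic time is passed in as now_ns instead of calling monotonic_now_ns(), so the logic can be exercised deterministically.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical log types; Fluent Bit's real FLB_LOG_* values may differ. */
enum { LOG_TYPE_ERROR, LOG_TYPE_WARN, LOG_TYPE_INFO, LOG_TYPE_DEBUG, LOG_TYPE_COUNT };

/* Hypothetical stand-in for the heartbeat-related parts of struct flb_log. */
struct hb_log {
    uint64_t next_hb_ns;               /* monotonic deadline of the next heartbeat */
    uint64_t hb_interval_ns;           /* heartbeat period */
    uint64_t counters[LOG_TYPE_COUNT]; /* per-type log counters */
    unsigned heartbeats;               /* number of heartbeats emitted so far */
};

static void hb_log_init(struct hb_log *hl)
{
    memset(hl, 0, sizeof(*hl));
    hl->hb_interval_ns = 60ULL * 1000000000ULL; /* 60 s, as hardcoded in the PR */
    hl->next_hb_ns = 0;                         /* first log call triggers a heartbeat */
}

/* Heartbeat check with an early return when the deadline has not passed. */
static void hb_log_heartbeat(struct hb_log *hl, uint64_t now_ns)
{
    if (now_ns < hl->next_hb_ns) {
        return;
    }
    /* The PR presumably refreshes each per-type cmetrics counter here (the
     * review mentions cmt_counter_add()); this sketch only counts emissions. */
    hl->heartbeats++;
    hl->next_hb_ns = now_ns + hl->hb_interval_ns;
}

/* Print path: heartbeat check first, then the per-type counter increment. */
static void hb_log_print(struct hb_log *hl, int type, uint64_t now_ns)
{
    hb_log_heartbeat(hl, now_ns);
    if (type >= 0 && type < LOG_TYPE_COUNT) {
        hl->counters[type]++;
    }
}
```

With next_hb_ns starting at 0, the first call emits a heartbeat immediately; later calls emit again only once now_ns reaches the stored deadline.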

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review focus:
    • monotonic_now_ns() platform-specific implementations for correctness and edge cases.
    • Initialization of hb_interval_ns / next_hb_ns in flb_log_create() and their thread-safety.
    • Integration point in flb_log_print() (hot path) for performance impact and correctness.
    • Public APIs flb_log_metrics_create() / flb_log_metrics_destroy() for allocation, cleanup, and null-safety.

Poem

🐇 A rabbit taps the monotonic drum,
Beats in nanoseconds, steady they come.
Counters hum, heartbeats sing,
Metrics dance on every ping.
Hooray — the logs keep time and run!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Linked Issues Check | ❓ Inconclusive | Issue #11082 requires that fluentbit_ metrics continue being included in prometheus_remote_write payloads indefinitely, without ceasing after approximately one hour. The PR introduces heartbeat mechanisms (flb_log_metrics_create(), flb_log_metrics_destroy(), flb_log_metrics_heartbeat()) that emit per-type log counters at configured intervals, and integrates heartbeat calls into flb_log_print() prior to counter increment. However, the provided summaries lack sufficient detail about how these changes specifically prevent the metric exclusion behavior or how they integrate with the fluentbit_metrics input and prometheus_remote_write output pipeline to definitively confirm the issue is fully resolved. | To conclusively validate compliance, the analysis would benefit from additional context showing: how the heartbeat mechanism specifically prevents metrics from being excluded after one hour, how the new metrics APIs integrate with the fluentbit_metrics input, and confirmation that only fluentbit_ metrics are affected as intended (not other metric sources). |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "log: Update log related metrics with a certian interval" directly relates to the main changes in this PR, which introduce heartbeat-based metric updating infrastructure. The changes add interval-based metric updating through the new hb_interval_ns and next_hb_ns fields, plus the flb_log_metrics_heartbeat() function that emits per-type log counters at configured intervals. While the title could be more specific and contains a typo ("certian" vs "certain"), it accurately captures the core purpose of the changeset. |
| Out of Scope Changes Check | ✅ Passed | All code changes in this PR are directly related to implementing interval-based metric updating infrastructure to address the fluentbit_metrics issue. The changes include adding heartbeat-related fields to the flb_log structure, implementing cross-platform monotonic timing, creating new metrics lifecycle APIs, and integrating heartbeat invocations into the logging pipeline. No changes appear to address unrelated concerns or introduce functionality outside the stated objectives of ensuring continuous metric emission. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/flb_log.c (1)

667-674: Incomplete error cleanup leads to resource leaks.

If flb_log_metrics_create() fails, the error path at lines 670-673 destroys the event loop and frees the log structure, but doesn't clean up the ch_mng pipe created at line 644. This leaks the pipe file descriptors.

Apply this diff:

     log->metrics = flb_log_metrics_create();
     if (log->metrics == NULL) {
         fprintf(stderr, "[log] could not create log metrics\n");
+        flb_pipe_destroy(log->ch_mng);
         mk_event_loop_destroy(log->evl);
         flb_free(log);
         config->log = NULL;
         return NULL;
     }

Additionally, the error path at lines 694-700 is incomplete: it should call flb_log_metrics_destroy(log->metrics) and flb_pipe_destroy(log->ch_mng) before freeing the log structure.

Apply this diff:

     ret = flb_log_worker_init(worker);
     if (ret == -1) {
         flb_errno();
+        flb_log_metrics_destroy(log->metrics);
+        flb_pipe_destroy(log->ch_mng);
         mk_event_loop_destroy(log->evl);
         flb_free(log);
         config->log = NULL;
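The reverse-order cleanup these diffs restore can also be expressed with the goto-unwind idiom common in C code. The sketch below is hypothetical and POSIX-only: struct demo_log, demo_log_create(), and demo_log_destroy() are illustrative names, malloc() stands in for mk_event_loop_create() and flb_log_metrics_create(), pipe(2) stands in for the ch_mng pipe, and fail_at merely simulates a failure in the metrics or worker step.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for struct flb_log. */
struct demo_log {
    void *evl;       /* stands in for the event loop */
    int ch_mng[2];   /* stands in for the ch_mng pipe */
    void *metrics;   /* stands in for log->metrics */
};

/* fail_at: 0 = succeed, 1 = metrics creation fails, 2 = worker init fails. */
static struct demo_log *demo_log_create(int fail_at)
{
    struct demo_log *dl = calloc(1, sizeof(*dl));

    if (dl == NULL) {
        return NULL;
    }
    dl->evl = malloc(16);                          /* "event loop" */
    if (dl->evl == NULL) {
        goto err_free;
    }
    if (pipe(dl->ch_mng) == -1) {                  /* "ch_mng pipe" */
        goto err_evl;
    }
    dl->metrics = (fail_at == 1) ? NULL : malloc(16);
    if (dl->metrics == NULL) {
        goto err_pipe;
    }
    if (fail_at == 2) {                            /* simulated worker init failure */
        goto err_metrics;
    }
    return dl;

    /* Unwind in reverse order of acquisition. */
err_metrics:
    free(dl->metrics);
err_pipe:
    close(dl->ch_mng[0]);
    close(dl->ch_mng[1]);
err_evl:
    free(dl->evl);
err_free:
    free(dl);
    return NULL;
}

static void demo_log_destroy(struct demo_log *dl)
{
    if (dl == NULL) {
        return;
    }
    free(dl->metrics);
    close(dl->ch_mng[0]);
    close(dl->ch_mng[1]);
    free(dl->evl);
    free(dl);
}
```

Each label releases exactly what was acquired before the failing step, so adding a resource later only requires one new label in the matching position.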
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6345fd1 and 57507e3.

📒 Files selected for processing (2)
  • include/fluent-bit/flb_log.h (1 hunks)
  • src/flb_log.c (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_log.c (1)
lib/cmetrics/src/cmt_counter.c (1)
  • cmt_counter_add (119-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
🔇 Additional comments (4)
src/flb_log.c (4)

544-553: LGTM!

Proper null checks and cleanup order. The function safely handles null inputs and destroys resources in the correct sequence.


559-609: LGTM!

The metrics creation logic is well-structured with proper error handling. Each failure path correctly calls flb_log_metrics_destroy() to clean up partial state. The use of cmt_counter_set() for initialization is appropriate.

Note: The loop at lines 592-606 has the same range limitation as the heartbeat function (excludes HELP and IDEBUG), which is consistent but worth confirming as intentional.
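The create/destroy pattern described here (a null-safe destroy that also copes with partially built objects, called from every failure path in create) can be sketched as follows. The names and plain heap allocations are hypothetical stand-ins for the cmetrics context and per-type counters; fail_index only simulates an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_N_TYPES 4

/* Hypothetical stand-in for the log metrics object: a context plus one
 * counter per log type (cmetrics objects in the real code). */
struct demo_log_metrics {
    void *ctx;
    long *counters[DEMO_N_TYPES];
};

/* Null-safe destroy that tolerates partially initialized objects. */
static void demo_log_metrics_destroy(struct demo_log_metrics *m)
{
    int i;

    if (m == NULL) {
        return;
    }
    for (i = 0; i < DEMO_N_TYPES; i++) {
        free(m->counters[i]);          /* free(NULL) is a no-op */
    }
    free(m->ctx);
    free(m);
}

/* fail_index simulates an allocation failure at counter i (-1 = never fail). */
static struct demo_log_metrics *demo_log_metrics_create(int fail_index)
{
    int i;
    /* calloc zero-fills, so unset pointers are NULL on mainstream platforms. */
    struct demo_log_metrics *m = calloc(1, sizeof(*m));

    if (m == NULL) {
        return NULL;
    }
    m->ctx = malloc(1);
    if (m->ctx == NULL) {
        demo_log_metrics_destroy(m);
        return NULL;
    }
    for (i = 0; i < DEMO_N_TYPES; i++) {
        m->counters[i] = (i == fail_index) ? NULL : malloc(sizeof(long));
        if (m->counters[i] == NULL) {
            demo_log_metrics_destroy(m);   /* releases whatever exists so far */
            return NULL;
        }
        *m->counters[i] = 0;               /* mirrors setting each counter to 0 */
    }
    return m;
}
```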


892-892: LGTM!

The heartbeat integration is correctly placed in the log print path. Although it adds a small overhead to every log call, the heartbeat function returns early (line 510 in flb_log_metrics_heartbeat) when the heartbeat isn't due, which minimizes the impact. Using wall-clock time (cfl_time_now()) for metric timestamps while scheduling the heartbeat with monotonic time is the correct approach.


963-963: LGTM!

The metrics cleanup is properly integrated into the log destruction sequence, ensuring resources are released before the log structure is freed.

Comment on lines +641 to +642
log->hb_interval_ns = 60ULL * 1000000000ULL;
log->next_hb_ns = 0;

@coderabbitai coderabbitai bot Oct 30, 2025


⚠️ Potential issue | 🟠 Major

Hardcoded heartbeat interval should be configurable.

The heartbeat interval is hardcoded to 60 seconds (line 641). This matches scrape_interval=60s in the PR's test configuration for the fluentbit_metrics input, but if users configure a different scrape interval, the hardcoded heartbeat won't align with the expected metric update frequency.

Consider:

  1. Reading the scrape interval from the fluentbit_metrics input configuration
  2. Making the heartbeat interval configurable via a parameter
  3. Documenting why 60 seconds was chosen if it's intentional to be fixed

Initial heartbeat fires immediately.

Setting next_hb_ns = 0 (line 642) means the first heartbeat will trigger on the first log message, since monotonic_now_ns() returns a positive value. This may be intentional to ensure metrics are emitted early, but consider setting next_hb_ns = monotonic_now_ns() + hb_interval_ns if you want the first heartbeat to fire after one full interval.
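If the interval were made configurable and the first heartbeat deferred as suggested, the scheduling could look like the sketch below. Both helpers are hypothetical; the 60-second fallback mirrors the value hardcoded in the PR, and fire_immediately = 1 reproduces the current next_hb_ns = 0 behavior.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: heartbeat interval from a configured number of seconds,
 * falling back to the PR's 60 s default for non-positive values. */
static uint64_t hb_interval_ns_from_seconds(int seconds)
{
    if (seconds <= 0) {
        return 60ULL * NSEC_PER_SEC;
    }
    return (uint64_t) seconds * NSEC_PER_SEC;
}

/* Hypothetical helper: first heartbeat deadline. 0 fires on the first log
 * message (current PR behavior); otherwise wait one full interval. */
static uint64_t hb_first_deadline(uint64_t now_ns, uint64_t interval_ns,
                                  int fire_immediately)
{
    return fire_immediately ? 0 : now_ns + interval_ns;
}
```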

🤖 Prompt for AI Agents
In src/flb_log.c around lines 641-642, the heartbeat interval is hardcoded to
60s and next_hb_ns is set to 0; change this so the hb interval is configurable
and the initial heartbeat is scheduled correctly: read the scrape interval from
the fluentbit_metrics input (or expose a new parameter) and use that value (with
a sensible default of 60s) to set log->hb_interval_ns, and set log->next_hb_ns =
monotonic_now_ns() + log->hb_interval_ns so the first heartbeat fires after one
interval (also add a short comment and update docs/config example to note the
new parameter or why 60s is default).

Contributor Author

@cosmo0920 cosmo0920 Oct 30, 2025


This is a draft proposal, so it doesn't need to be configurable yet.
We'll go with option 3 for now.


@cosmo0920, understood! Since this is a draft proposal, keeping the hardcoded 60-second interval is perfectly reasonable for now. You can make it configurable later if needed once the approach is validated. Thanks for clarifying! 👍


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/flb_log.c (2)

517-521: Use continue instead of break for robustness.

Line 520 uses break when message_type_str is NULL, which stops processing all remaining log types. While NULL is not expected for the current loop range (FLB_LOG_ERROR through FLB_LOG_TRACE are all defined in the switch), using continue would be more defensive and allow processing of subsequent types if the loop range or type definitions change in the future.

Apply this diff:

         if (!message_type_str) {
-            break;
+            continue;
         }
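To see why continue is the more defensive choice, consider a loop over type codes where one code has no name. The type codes and demo_type_str() below are hypothetical (they are not Fluent Bit's FLB_LOG_* values); with break, the loop would stop at the gap and never reach the types after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes with a deliberate gap at 3. */
static const char *demo_type_str(int type)
{
    switch (type) {
    case 1: return "error";
    case 2: return "warn";
    case 4: return "info";
    case 5: return "debug";
    default: return NULL;
    }
}

/* Counts how many types in [first, last] would have their counters emitted. */
static int demo_emit_all(int first, int last)
{
    int type;
    int emitted = 0;

    for (type = first; type <= last; type++) {
        const char *name = demo_type_str(type);
        if (name == NULL) {
            continue;   /* skip unnamed types; break would also drop 4 and 5 */
        }
        emitted++;
    }
    return emitted;
}
```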

502-528: Consider thread-safety for heartbeat scheduling.

The next_hb_ns field (lines 513, 527) is read and written without synchronization. When flb_log_print() is called concurrently from multiple threads, multiple threads might observe an expired next_hb_ns and all emit heartbeats simultaneously. While cmt_counter_add() is atomic and prevents data corruption, this could result in redundant heartbeat emissions.

For improved accuracy, consider adding a compare-and-swap or mutex to ensure only one thread advances next_hb_ns per interval. However, since duplicate heartbeats are not critical, this optimization can be deferred.
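A compare-and-swap version of the deferred fix could look like this sketch, assuming C11 stdatomic.h is available on every target (older MSVC toolchains may not provide it). struct demo_hb and demo_hb_try_claim() are hypothetical; in the PR the field is a plain uint64_t on struct flb_log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical heartbeat state; _Atomic replaces the plain uint64_t field. */
struct demo_hb {
    _Atomic uint64_t next_hb_ns;
    uint64_t hb_interval_ns;
};

/* Returns 1 if the caller won the right to emit this interval's heartbeat,
 * 0 if the heartbeat is not due or another thread already claimed it. */
static int demo_hb_try_claim(struct demo_hb *hb, uint64_t now_ns)
{
    uint64_t expected = atomic_load_explicit(&hb->next_hb_ns, memory_order_relaxed);

    if (now_ns < expected) {
        return 0;   /* fast path: not due yet */
    }
    /* Only one thread can replace `expected` with the next deadline; any other
     * thread that read the same value sees the CAS fail and skips emission. */
    return atomic_compare_exchange_strong(&hb->next_hb_ns, &expected,
                                          now_ns + hb->hb_interval_ns);
}
```

Threads that lose the CAS simply skip the heartbeat, so at most one emission happens per deadline without taking a lock on the hot path.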

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57507e3 and ca9dc71.

📒 Files selected for processing (1)
  • src/flb_log.c (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_log.c (1)
lib/cmetrics/src/cmt_counter.c (1)
  • cmt_counter_add (119-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (4)
src/flb_log.c (4)

466-491: LGTM: Overflow mitigation is sound.

The Windows monotonic time calculation now splits the division to avoid overflow (lines 487-490), addressing the previous review concern. The approach is correct for all practical purposes.

Note: A theoretical overflow still exists if rem * 1000000000ULL exceeds UINT64_MAX (requires performance counter frequency > 18 billion Hz), but this is extremely unlikely in real-world systems.
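The split-division technique can be shown as a portable helper. On Windows the inputs would come from QueryPerformanceCounter() and QueryPerformanceFrequency(); here they are plain arguments, and ticks_to_ns() is a hypothetical name, not necessarily how the PR writes it.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Converts a tick count at `freq` ticks per second into nanoseconds without
 * computing ticks * 1e9 directly (which overflows once ticks > ~1.8e10). */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq)
{
    uint64_t whole_sec = ticks / freq;   /* complete seconds */
    uint64_t rem = ticks % freq;         /* leftover ticks, always < freq */

    /* rem * 1e9 can only overflow if freq exceeds ~1.8e10 Hz. */
    return whole_sec * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / freq;
}
```

For example, at a 10 MHz counter frequency, 10,000,000,000,000 ticks (about 11.6 days) convert to 1,000,000,000,000,000 ns, whereas the naive ticks * 1000000000 would overflow.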


493-500: LGTM: Standard POSIX monotonic clock implementation.

The non-Windows implementation correctly uses CLOCK_MONOTONIC with clock_gettime, which is the standard approach for obtaining monotonic timestamps.
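For reference, a POSIX-only monotonic clock reader along these lines is sketched below. demo_monotonic_now_ns() is a hypothetical stand-in for the PR's monotonic_now_ns(), and returning 0 on failure is an assumption about error handling.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical POSIX stand-in for monotonic_now_ns(); returns 0 on error. */
static uint64_t demo_monotonic_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0;
    }
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}
```

CLOCK_MONOTONIC is unaffected by discontinuous wall-clock changes such as manual or NTP time steps, which is why it suits scheduling, while metric timestamps still come from wall-clock time.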


644-645: LGTM: Heartbeat configuration is appropriate for draft proposal.

The 60-second interval aligns with the test configuration's scrape interval, and setting next_hb_ns = 0 ensures the first heartbeat fires on the first log message, which is sensible for initialization.

As discussed in previous reviews, making the interval configurable can be deferred until the draft is finalized.


895-895: LGTM: Heartbeat invocation placement is correct.

Calling flb_log_metrics_heartbeat() before incrementing the log counter refreshes the metrics at most once per interval, on the first log message after the interval has elapsed, which addresses the PR objective of keeping fluentbit_* metrics alive for prometheus_remote_write.

Note: This adds a monotonic_now_ns() call overhead to every log message, even when the heartbeat interval hasn't elapsed. For most workloads this overhead is negligible, but if profiling reveals performance issues with high-frequency logging, consider sampling or caching the monotonic time check.
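One way to apply the sampling idea is to read the clock only on every Nth log call. The sketch below is hypothetical: DEMO_SAMPLE_EVERY, struct demo_sampler, and the injected fake clock are illustrative, not Fluent Bit code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SAMPLE_EVERY 64u   /* hypothetical sampling period */

struct demo_sampler {
    unsigned calls;             /* log calls since the last clock read */
    uint64_t next_hb_ns;
    uint64_t hb_interval_ns;
};

/* Fake clock for demonstration: counts reads and returns a settable time. */
static unsigned demo_clock_reads;
static uint64_t demo_clock_value;
static uint64_t demo_fake_clock(void)
{
    demo_clock_reads++;
    return demo_clock_value;
}

/* Returns 1 when a heartbeat should be emitted; reads the clock only on
 * every DEMO_SAMPLE_EVERY-th call. */
static int demo_sampled_check(struct demo_sampler *s, uint64_t (*now_fn)(void))
{
    uint64_t now;

    if (++s->calls < DEMO_SAMPLE_EVERY) {
        return 0;               /* skip the clock read entirely */
    }
    s->calls = 0;
    now = now_fn();
    if (now < s->next_hb_ns) {
        return 0;
    }
    s->next_hb_ns = now + s->hb_interval_ns;
    return 1;
}
```

The trade-off is latency: with infrequent logging, a heartbeat can be delayed until the Nth call after its deadline, which cuts against the goal of refreshing metrics on time.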

@cosmo0920
Contributor Author

This could be a first-aid approach, so we need to reevaluate how to mitigate this type of issue.

@cosmo0920 cosmo0920 closed this Oct 31, 2025

Development

Successfully merging this pull request may close these issues.

fluentbit_ metrics stop being sent to prometheus_remote_write output about 1 hour after start
