
@pwhelan
Contributor

@pwhelan pwhelan commented Jun 27, 2023

Summary

Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS, PSS and shared memory usage of all or some processes on Linux.

This plugin is being used to track memory usage and memory leaks.
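For reference, the two headline metrics differ in how shared pages are counted. Following the kernel's proc documentation, RSS counts every resident page a process maps, while PSS divides each resident page by the number of processes that map it:

$$
\mathrm{RSS}_p = \sum_{i \in R_p} s_i, \qquad \mathrm{PSS}_p = \sum_{i \in R_p} \frac{s_i}{n_i}
$$

where $R_p$ is the set of resident pages mapped by process $p$, $s_i$ is the size of page $i$, and $n_i \ge 1$ is the number of processes mapping it ($n_i = 1$ for a private page). Because each shared page is split across its sharers, summing PSS over all processes approximates the memory they use without double counting, which makes PSS convenient when comparing many processes. smaps_rollup reports both values in kB.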


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • Adds a memory metrics input that collects detailed Linux memory stats (RSS, PSS variants, shared/private, swap, hugepages, etc.) with configurable collection interval, proc path, and process filtering by executable, command line, or PID.

@pwhelan pwhelan temporarily deployed to pr June 27, 2023 15:29 — with GitHub Actions Inactive
@pwhelan pwhelan temporarily deployed to pr June 27, 2023 15:51 — with GitHub Actions Inactive
@pwhelan pwhelan self-assigned this Jun 28, 2023
@pwhelan pwhelan temporarily deployed to pr June 28, 2023 21:00 — with GitHub Actions Inactive
@pwhelan pwhelan temporarily deployed to pr June 28, 2023 21:21 — with GitHub Actions Inactive
@pwhelan pwhelan force-pushed the pwhelan-in_mem_metrics branch from ca69541 to 5145b57 Compare June 29, 2023 15:17
@pwhelan pwhelan temporarily deployed to pr June 29, 2023 15:18 — with GitHub Actions Inactive
@pwhelan pwhelan requested a review from patrick-stephens June 29, 2023 15:29
@pwhelan pwhelan temporarily deployed to pr June 29, 2023 15:43 — with GitHub Actions Inactive
@pwhelan pwhelan force-pushed the pwhelan-in_mem_metrics branch from 5145b57 to e5f647f Compare July 26, 2023 14:14
@pwhelan pwhelan temporarily deployed to pr July 26, 2023 14:15 — with GitHub Actions Inactive
@pwhelan pwhelan temporarily deployed to pr July 26, 2023 14:43 — with GitHub Actions Inactive
pwhelan added a commit to fluent/fluent-bit-docs that referenced this pull request Jul 26, 2023
This adds documentation for the new `in_mem_metrics` plugin: fluent/fluent-bit#7615.

Signed-off-by: Phillip Whelan <[email protected]>
@samos667

samos667 commented Aug 18, 2025

process-exporter has since been (mostly) rewritten and implemented, but it sadly lacks smaps reading for accurate memory usage reporting.

process-exporter (Go) supports it. Is it planned to be supported in Fluent Bit?

Edit:
I'll bring this discussion here

Although why would you use it over node exporter, kubelet metrics, etc. is probably the main question I'd have?

Not every system is built around Kubernetes, Podman, or Docker. And as far as I know, node-exporter cannot export per-process statistics. There is nothing more universal than /proc stats; why would you not use it (or at least support it) over anything else? That is probably the main response I can give.

@eschabell

@pwhelan would you mind fixing the conflicts here?

@pwhelan
Contributor Author

pwhelan commented Oct 22, 2025

@pwhelan would you mind fixing the conflicts here?

Yeah, I'll take a look today and make sure it's all working.

*** edit ***

@eschabell

Done.

… linux.

Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS,
PSS and shared memory usage of all or some processes on linux.

Signed-off-by: Phillip Whelan <[email protected]>
Allow using comma-delimited values for the PID, exec(utable) and cmdline
filters to allow for multiple values. All these filters are non-exclusive,
so a process that matches any one of them is chosen to be polled for metrics.

Signed-off-by: Phillip Whelan <[email protected]>
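A minimal sketch of the filter semantics described in this commit, assuming each filter value is a plain comma-separated string and that matching is exact and case-sensitive. `list_contains()` and `is_chosen()` are hypothetical helpers, not the plugin's actual functions, and the special PID values mentioned later in the review ("self", "0") are not modeled.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when `value` equals one of the comma-separated
 * entries in `list` (e.g. "/usr/bin/a, /usr/bin/b"). Spaces around each
 * entry are ignored; the comparison is exact and case-sensitive.
 */
static bool list_contains(const char *list, const char *value)
{
    const char *p = list;
    size_t vlen = strlen(value);

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        const char *s = p;

        /* trim leading and trailing spaces of this entry */
        while (len > 0 && *s == ' ') { s++; len--; }
        while (len > 0 && s[len - 1] == ' ') { len--; }

        if (len == vlen && strncmp(s, value, len) == 0) {
            return true;
        }
        p = end ? end + 1 : NULL;
    }
    return false;
}

/*
 * Filters are non-exclusive (OR): a process is chosen if it matches ANY
 * configured filter. A NULL list means "filter not configured"; with no
 * filter configured at all, every process is chosen (an assumption).
 */
static bool is_chosen(const char *exec_list, const char *cmd_list,
                      const char *pid_list,
                      const char *exe, const char *cmd, const char *pid)
{
    if (exec_list == NULL && cmd_list == NULL && pid_list == NULL) {
        return true;
    }
    return (exec_list != NULL && list_contains(exec_list, exe)) ||
           (cmd_list != NULL && list_contains(cmd_list, cmd)) ||
           (pid_list != NULL && list_contains(pid_list, pid));
}
```

Matching whole entries by length first avoids a prefix bug where PID "12" would otherwise match a configured "123".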
@pwhelan pwhelan force-pushed the pwhelan-in_mem_metrics branch from e5f647f to 27d1306 Compare October 22, 2025 23:15
@coderabbitai

coderabbitai bot commented Oct 22, 2025

Walkthrough

Adds a new Linux input plugin "mem_metrics" that reads process memory stats from procfs (smaps_rollup), exposes many cmt gauges, and introduces CMake build flags to enable/disable the plugin across platforms.

Changes

  • Build top-level & options (CMakeLists.txt, cmake/plugins_options.cmake): Added public build option FLB_IN_MEM_METRICS and enabled it when FLB_ALL is set.
  • Windows setup (cmake/windows-setup.cmake): Introduced FLB_IN_MEM_METRICS variable set to No in Windows input plugin defaults.
  • Plugin registration (plugins/CMakeLists.txt): Registered the Linux-only input plugin with REGISTER_IN_PLUGIN("in_mem_metrics").
  • Plugin build (plugins/in_mem_metrics/CMakeLists.txt): New plugin CMake target invoking FLB_PLUGIN for in_mem_metrics with mem_metrics.c.
  • Plugin implementation (plugins/in_mem_metrics/mem_metrics.c): New input plugin implementing process selection (exec/cmd/pid), procfs scanning, smaps_rollup parsing, and many cmt gauges; exposes the in_mem_metrics_plugin input plugin symbol.

Sequence Diagram(s)

sequenceDiagram
    participant Core as Fluent Bit Core
    participant Plugin as mem_metrics
    participant Proc as /proc

    Core->>Plugin: cb_mem_metrics_init()
    activate Plugin
    Plugin->>Plugin: create context & CMT gauges
    Plugin->>Plugin: register collector (timer)
    deactivate Plugin

    loop every interval (default 5s)
        Core->>Plugin: cb_collector_time()
        activate Plugin
        alt filter targets configured
            Plugin->>Proc: read /proc/self/smaps_rollup (or specific pid)
        else full scan
            Plugin->>Proc: glob /proc/[0-9]*
            loop per pid
                Plugin->>Proc: read /proc/[pid]/smaps_rollup
                Plugin->>Plugin: parse values -> bytes
                Plugin->>Plugin: update CMT gauges (tag: pid/type)
            end
        end
        Plugin->>Core: append collected metrics
        deactivate Plugin
    end

    Core->>Plugin: cb_mem_metrics_exit()
    activate Plugin
    Plugin->>Plugin: cleanup CMT & context
    deactivate Plugin
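The per-PID "read smaps_rollup" step in the diagram has one procfs-specific wrinkle: files under /proc report a size of 0 from stat(), so their contents have to be read in a loop until EOF. A minimal sketch, assuming a caller-supplied buffer; `read_proc_file()` is a hypothetical name, not the plugin's API.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a whole file such as /proc/<pid>/smaps_rollup into `buf` and
 * NUL-terminate it. procfs files report st_size == 0, so the size cannot be
 * taken from stat(); read() is called until EOF or until the buffer is full
 * (anything beyond cap - 1 bytes is silently truncated).
 * Returns the number of bytes read, or -1 if the file cannot be opened or
 * read, e.g. because the process exited after the /proc scan.
 */
static ssize_t read_proc_file(const char *path, char *buf, size_t cap)
{
    size_t off = 0;
    int fd;

    if (cap == 0) {
        return -1;
    }
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    while (off < cap - 1) {
        ssize_t n = read(fd, buf + off, cap - 1 - off);

        if (n == -1) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal: retry */
            }
            close(fd);
            return -1;
        }
        if (n == 0) {
            break;          /* EOF */
        }
        off += (size_t) n;
    }
    close(fd);
    buf[off] = '\0';
    return (ssize_t) off;
}
```

Since processes routinely exit between the directory scan and the open(), a caller would probably treat -1 as "process gone" and skip it quietly rather than log an error.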

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • leonardo-albertovich
  • edsiper
  • niedbalski
  • patrick-stephens
  • koleini
  • fujimotos
  • celalettin1286

Poem

🐰 I hop through /proc where tiny numbers hide,
I parse each smaps_rollup with whiskers wide,
RSS and PSS I tuck in a cmt nest,
Tagged by pid, then I pause to rest.
Metrics hop out — a rabbit's memory quest! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 18.18%, which is insufficient; the required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title "in_mem_metrics: add metrics plugin that tracks RSS and PSS memory for linux." accurately and clearly describes the main change in the changeset. The title correctly identifies the plugin name (in_mem_metrics), its primary function (tracking RSS and PSS memory metrics), and the target platform (Linux). This aligns well with the substantial additions shown in the raw summary, particularly the new plugin implementation in mem_metrics.c and the associated build configuration changes. The title is specific, concise, and avoids vague or overly broad language, making it clear to anyone reviewing the repository history what the primary purpose of this pull request is.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (3)
plugins/in_mem_metrics/mem_metrics.c (3)

261-263: Avoid truncating smaps_rollup parsing with a hard split limit.

Using max_split=21 risks truncation if kernels add lines. Prefer unlimited or a safe upper bound.

-    lines = flb_utils_split(buf, '\n', 21);
+    lines = flb_utils_split(buf, '\n', 0); /* 0 => no limit (unbounded) */

If 0 isn’t supported by split(), use a generous bound (e.g., 64).
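One way to avoid a split limit entirely is to walk the buffer line by line with plain libc calls instead of flb_utils_split(). A minimal sketch, assuming the usual "Key:   <n> kB" layout of smaps_rollup; `struct smaps_field` and `parse_smaps_rollup()` are hypothetical names. Lines that do not end in "kB" (such as the leading address-range header) are skipped, as are lines longer than 127 bytes.

```c
#include <stdio.h>
#include <string.h>

/* One parsed "Key:   <n> kB" line from smaps_rollup. */
struct smaps_field {
    char key[32];             /* e.g. "Rss", "Pss_Dirty" */
    unsigned long long bytes; /* value converted from kB to bytes */
};

/*
 * Walk every line of `buf` (no fixed split limit, so fields added by newer
 * kernels are still seen) and store up to `max` parsed fields in `out`.
 * Returns the number of fields stored.
 */
static size_t parse_smaps_rollup(const char *buf, struct smaps_field *out,
                                 size_t max)
{
    const char *line = buf;
    size_t count = 0;

    while (line != NULL && *line != '\0' && count < max) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t) (nl - line) : strlen(line);
        char tmp[128];
        char key[32];
        unsigned long long kb;
        int end = -1;

        /* Copy one line so sscanf() cannot read into the next one. */
        if (len < sizeof(tmp)) {
            memcpy(tmp, line, len);
            tmp[len] = '\0';
            /* %n is only stored if the literal "kB" matched too. */
            if (sscanf(tmp, "%31[^: ]: %llu kB%n", key, &kb, &end) == 2 &&
                end >= 0 && tmp[end] == '\0') {
                strcpy(out[count].key, key);
                out[count].bytes = kb * 1024ULL;
                count++;
            }
        }
        line = nl ? nl + 1 : NULL;
    }
    return count;
}
```

The `%n` check matters because sscanf() still returns 2 when the trailing "kB" literal fails to match, which would otherwise let the header line's "00:00" device field through.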


370-383: Stat/access the procfs root, not the glob pattern, on GLOB_NOMATCH.

stat("/proc/[0-9]*") cannot succeed, because stat() does not expand glob patterns. Check ctx->procfs_path for existence/permissions instead to produce clearer diagnostics.

-            ret = stat(real_path, &st);
+            ret = stat(ctx->procfs_path, &st);
             if (ret == -1) {
-                flb_plg_debug(ctx->ins, "cannot read info from: %s", real_path);
+                flb_plg_debug(ctx->ins, "cannot read info from: %s", ctx->procfs_path);
             }
             else {
-                ret = access(real_path, R_OK);
+                ret = access(ctx->procfs_path, R_OK);
                 if (ret == -1 && errno == EACCES) {
-                    flb_plg_error(ctx->ins, "NO read access for path: %s", real_path);
+                    flb_plg_error(ctx->ins, "no read access for path: %s", ctx->procfs_path);
                 }
                 else {
-                    flb_plg_debug(ctx->ins, "NO matches for path: %s", real_path);
+                    flb_plg_debug(ctx->ins, "no matches for pattern under: %s", ctx->procfs_path);
                 }
             }
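A minimal sketch of the suggested diagnostics, assuming the procfs root is configurable (as ctx->procfs_path is in the plugin). `scan_procfs()` and `enum scan_status` are hypothetical; the plugin logs instead of returning a status. The point it illustrates is that the root directory, not the pattern, is what gets passed to stat() and access().

```c
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for the sketch (the plugin logs instead). */
enum scan_status {
    SCAN_OK,         /* at least one <root>/<digits> entry matched */
    SCAN_NO_ROOT,    /* the procfs root itself cannot be stat()ed */
    SCAN_NO_ACCESS,  /* the root exists but is not readable */
    SCAN_NO_MATCHES, /* the root is fine; the pattern matched nothing */
    SCAN_ERROR       /* pattern too long or another glob() failure */
};

/*
 * Glob "<root>/[0-9]*" and store the match count in *npids. On
 * GLOB_NOMATCH, diagnose `root` itself: stat() never expands glob
 * metacharacters, so calling it on the pattern would always fail.
 */
static enum scan_status scan_procfs(const char *root, size_t *npids)
{
    char pattern[4096];
    struct stat st;
    glob_t g;
    int ret;
    int w;

    *npids = 0;
    w = snprintf(pattern, sizeof(pattern), "%s/[0-9]*", root);
    if (w < 0 || (size_t) w >= sizeof(pattern)) {
        return SCAN_ERROR;
    }

    ret = glob(pattern, GLOB_NOSORT, NULL, &g);
    if (ret == 0) {
        *npids = g.gl_pathc;
    }
    /* glibc and musl initialise g on every call without GLOB_APPEND */
    globfree(&g);

    if (ret == 0) {
        return SCAN_OK;
    }
    if (ret != GLOB_NOMATCH) {
        return SCAN_ERROR;
    }
    if (stat(root, &st) == -1) {
        return SCAN_NO_ROOT;
    }
    if (access(root, R_OK) == -1 && errno == EACCES) {
        return SCAN_NO_ACCESS;
    }
    return SCAN_NO_MATCHES;
}
```

Keeping "no matches" separate from "root missing" and "no permission" gives the three distinct log messages the suggested diff is aiming for.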

511-524: Update config descriptions: values support comma-separated lists.

Descriptions still say “single”; code and CLIST_1 support lists (incl. “self”/“0” for PID). Align help text.

-     "Filter for a single executable"
+     "Filter by executable path (comma-separated list supported)"
@@
-     "Filter by the command line"
+     "Filter by full command line (space-joined argv; comma-separated list supported)"
@@
-      "Filter by PID"
+      "Filter by PID (supports comma-separated list and special values: self, 0)"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6345fd1 and 27d1306.

📒 Files selected for processing (6)
  • CMakeLists.txt (1 hunks)
  • cmake/plugins_options.cmake (1 hunks)
  • cmake/windows-setup.cmake (1 hunks)
  • plugins/CMakeLists.txt (1 hunks)
  • plugins/in_mem_metrics/CMakeLists.txt (1 hunks)
  • plugins/in_mem_metrics/mem_metrics.c (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
PR: fluent/fluent-bit#9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • CMakeLists.txt
  • cmake/windows-setup.cmake
🧬 Code graph analysis (1)
plugins/in_mem_metrics/mem_metrics.c (7)
src/flb_utils.c (2)
  • flb_utils_split (464-467)
  • flb_utils_split_free (477-489)
lib/cmetrics/src/cmt_gauge.c (1)
  • cmt_gauge_set (94-109)
plugins/in_blob/win32_glob.c (2)
  • glob (119-156)
  • globfree (107-117)
src/flb_input_metric.c (1)
  • flb_input_metrics_append (102-110)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
include/fluent-bit/flb_input.h (1)
  • flb_input_config_map_set (715-741)
src/flb_input.c (2)
  • flb_input_set_context (1610-1613)
  • flb_input_set_collector_time (1685-1704)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (6)
cmake/plugins_options.cmake (1)

39-39: LGTM: option defined and aligned with Linux-only registration.
No action needed.

cmake/windows-setup.cmake (1)

57-57: LGTM: disabled by default on Windows.

CMakeLists.txt (1)

276-283: LGTM: enabled under FLB_ALL.
No further tweaks needed.

plugins/in_mem_metrics/CMakeLists.txt (1)

1-4: LGTM: plugin target definition is correct.

plugins/CMakeLists.txt (1)

251-267: LGTM: Linux-only registration added in the right block.

plugins/in_mem_metrics/mem_metrics.c (1)

408-419: Review comment is accurate. The unbounded metric series growth issue is real and the proposed solution is sound.

I've verified the core claims:

  1. Persistent reuse of ctx->cmt: Confirmed: created once at initialization (line 458) and never recreated across collection ticks.

  2. Metric series accumulate permanently: Confirmed—each scrape calls cmt_gauge_set() with dynamic pid labels, and every unique pid becomes a permanent series in the cmt object.

  3. cmetrics lacks metric deletion: Confirmed—no cmt_gauge_remove, cmt_*_reset, or prune functions exposed. Only cmt_counter_allow_reset() exists for counters; gauges have no equivalent deletion mechanism.

  4. Proposed solution is safe: The pattern of recreating cmt per scrape is valid:

    • Gauge pointers in ctx are reassigned to new gauge objects from the new cmt
    • Macros can be called multiple times to re-initialize gauges
    • Old cmt is properly destroyed
    • This pattern has precedent in the codebase (processor_labels and processor_metrics_selector both create fresh cmt objects per operation)

The review comment correctly identifies a real cardinality leak and proposes a practical workaround. The suggested diff pattern aligns with how other components handle cmetrics lifecycle management in this codebase.

Comment on lines +278 to +283
if (strncasecmp(cur->value, "Pss:", sep - cur->value) == 0) {
cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
}
if (strncasecmp(cur->value, "Pss_Dirty:", sep - cur->value) == 0) {
cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "dirty" });
}

⚠️ Potential issue | 🟡 Minor

PSS “base” type label is misnamed “clean”.

For the aggregate “Pss:” line, label should reflect “base” (or “total”), not “clean”. Current label collides semantically with “Pss_Dirty”.

-            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });

Also update the comment at Line 51 to match (“types=base,dirty,anon,file,shmem”).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (strncasecmp(cur->value, "Pss:", sep - cur->value) == 0) {
cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
}
if (strncasecmp(cur->value, "Pss_Dirty:", sep - cur->value) == 0) {
cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "dirty" });
}
🤖 Prompt for AI Agents
In plugins/in_mem_metrics/mem_metrics.c around lines 278 to 283, the PSS
aggregate label for the "Pss:" line is incorrectly using "clean" which collides
with "Pss_Dirty:"; change the label from "clean" to "base" (or "total" as
preferred) so the two types read e.g. "base" and "dirty", and also update the
comment at line 51 to reflect the correct label set
("types=base,dirty,anon,file,shmem"). Ensure the cmt_gauge_set call for "Pss:"
uses the new label string and the comment exactly matches the updated types
list.
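If more PSS variants are exported, a lookup table keeps each smaps_rollup key tied to exactly one type label. A minimal sketch, assuming the key has already been split off without its trailing colon; `pss_labels` and `pss_type_for_key()` are hypothetical. It uses exact strcmp() matching rather than the strncasecmp() comparison in the snippet above, so "Pss" can only ever map to "base". Some of these keys only appear on newer kernels.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from an smaps_rollup key to a PSS "type" label. */
struct pss_label {
    const char *key;   /* field name without the trailing colon */
    const char *type;  /* value for the gauge's type label */
};

static const struct pss_label pss_labels[] = {
    { "Pss",       "base"  },
    { "Pss_Dirty", "dirty" },
    { "Pss_Anon",  "anon"  },
    { "Pss_File",  "file"  },
    { "Pss_Shmem", "shmem" },
};

/* Returns the type label for `key`, or NULL if it is not a PSS field. */
static const char *pss_type_for_key(const char *key)
{
    size_t i;

    for (i = 0; i < sizeof(pss_labels) / sizeof(pss_labels[0]); i++) {
        if (strcmp(key, pss_labels[i].key) == 0) {
            return pss_labels[i].type;
        }
    }
    return NULL;
}
```

Each non-NULL result would then be passed as the second label value to cmt_gauge_set(ctx->pss, ...), as in the snippet above, and the table doubles as the source of truth for the "types=" comment.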


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
plugins/in_mem_metrics/mem_metrics.c (3)

92-101: readlink handling LGTM.

Properly checks return and NUL-terminates.


293-307: PSS base label should not be “clean”.

“Pss:” is the aggregate; label it “base” (or “total”) to avoid collision with “dirty” and to match the comment.

-            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });

126-146: Fix rc check and clarify cmdline filter semantics.

Two issues in is_chosen_cmd():

  1. rc check bug (line 135): rc <= -1 only catches a failed read() (which returns -1) and lets an empty read (rc == 0, e.g. a kernel thread with an empty cmdline) through. Use rc <= 0 to reject both errors and empty reads.

  2. cmdline semantics mismatch: Line 112 comment says "actual command, not any of the arguments" (argv[0] only), but line 538 config help says "Filter by the command line" (implying full cmdline). Code matches only argv[0] because /proc/<pid>/cmdline uses NUL-separators and strcmp() stops at the first NUL. Either document as argv[0]-only or normalize NULs to spaces for full-line matching. Choose one and keep docs consistent.

-    if (rc <= -1) {
+    if (rc <= 0) {
         return FLB_FALSE;
     }
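If the full-command-line option is chosen, the NUL separators in /proc/<pid>/cmdline need to be turned into spaces before comparing. A minimal sketch, assuming the buffer was filled by a single read() and has room for one extra byte; `normalize_cmdline()` is a hypothetical helper. It also applies the rc <= 0 check discussed above.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper for full-cmdline matching. /proc/<pid>/cmdline
 * separates argv entries with NUL bytes (and ends with one), so strcmp()
 * on the raw buffer only ever sees argv[0]. This replaces the inner NULs
 * with spaces and NUL-terminates the result.
 * `rc` is the byte count returned by read(); `buf` must have room for
 * rc + 1 bytes. Returns -1 for rc <= 0 (read error or empty cmdline).
 */
static int normalize_cmdline(char *buf, ssize_t rc)
{
    ssize_t i;

    if (rc <= 0) {
        return -1;
    }
    /* drop the trailing NUL terminator(s) */
    while (rc > 0 && buf[rc - 1] == '\0') {
        rc--;
    }
    /* join the remaining argv entries with single spaces */
    for (i = 0; i < rc; i++) {
        if (buf[i] == '\0') {
            buf[i] = ' ';
        }
    }
    buf[rc] = '\0';
    return 0;
}
```

After normalization, "ls -la /tmp" can be compared directly against a configured cmdline entry; matching only argv[0] would instead stop at the first NUL without any rewriting.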
🧹 Nitpick comments (2)
plugins/in_mem_metrics/mem_metrics.c (2)

423-433: Use a single timestamp per scrape.

Ensures all metric points from one scrape share the same timestamp.

 static int cb_collector_time(struct flb_input_instance *ins,
                              struct flb_config *config, void *in_context)
 {
     struct mem_metrics *ctx = (struct mem_metrics *)in_context;
-    if (is_chosen_pid_self(ctx)) {
-        mmtx_parse_proc(ctx, cfl_time_now(), "/proc/self");
-    } else {
-    	mmtx_utils_path_scan_procs(ctx, cfl_time_now());
-    }
+    uint64_t ts = cfl_time_now();
+    if (is_chosen_pid_self(ctx)) {
+        mmtx_parse_proc(ctx, ts, "/proc/self");
+    } else {
+        mmtx_utils_path_scan_procs(ctx, ts);
+    }
     flb_input_metrics_append(ins, NULL, 0, ctx->cmt);
     FLB_INPUT_RETURN(0);
 }

532-544: Config help strings imply single value; code supports lists and special values.

Please update the help text to reflect comma-delimited lists and the special PID values “self” and “0”.

-     "Filter for a single executable"
+     "Comma-delimited allow list of executable paths (matches /proc/<pid>/exe)"
@@
-     "Filter by the command line"
+     "Comma-delimited allow list; match command (argv[0]) or full cmdline (see docs)"
@@
-      "Filter by PID"
+      "Comma-delimited allow list of PIDs; special: self, 0"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27d1306 and 19ecddc.

📒 Files selected for processing (1)
  • plugins/in_mem_metrics/mem_metrics.c (1 hunks)

Comment on lines +560 to +572
struct flb_input_plugin in_mem_metrics_plugin = {
.name = "mem_metrics",
.description = "Full Memory Metrics for Linux",
.cb_init = cb_mem_metrics_init,
.cb_pre_run = NULL,
.cb_collect = NULL,
.cb_flush_buf = NULL,
.cb_pause = NULL,
.cb_resume = NULL,
.cb_exit = cb_mem_metrics_exit,
.config_map = config_map,
.flags = FLB_INPUT_CORO | FLB_INPUT_THREADED
};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Operational: unbounded PID label cardinality can grow memory over time.

Each new pid creates a new time series and persists beyond process exit. Long‑running agents with churn (e.g., k8s nodes) will grow cmetrics state.

Mitigations:

  • Prune metrics for pids not seen this scrape (if cmetrics supports deletion).
  • Or keep a short TTL for unseen pids, then delete.
  • Or snapshot per collect: build a temporary cmt each cycle, append, then destroy (costlier but bounded).

Please check whether cmetrics exposes deletion/clear APIs:


🏁 Script executed:

#!/bin/bash
# Search for deletion APIs in cmetrics
rg -nP 'cmt_.*(delete|remove|clear|unset)' lib/ src/ plugins/ -C2
rg -nP 'cmt_map_metric_get\(|cmt_metric_set\(' lib/cmetrics -C3

Length of output: 16268


I need to dig deeper into the cmetrics public API to verify whether deletion/cleanup mechanisms exist beyond destroying entire contexts.

Let me search the codebase directly for the complete cmt_map header and any related APIs for metric deletion or cleanup.





---

🏁 Script executed:

#!/bin/bash
# Extract full cmt_map.h header to see all public functions
cat lib/cmetrics/include/cmetrics/cmt_map.h

Length of output: 2285


🏁 Script executed:

#!/bin/bash
# Search for all functions that operate on cmt_map or perform cleanup
rg -nP 'cmt.*\(' lib/cmetrics/include/cmetrics/cmt_map.h -A2

Length of output: 719


🏁 Script executed:

#!/bin/bash
# Check if mem_metrics plugin actually uses cmetrics
rg -nP 'cmt_|cmetrics' plugins/in_mem_metrics/ -C2

Length of output: 10195


Let me verify whether the mem_metrics plugin implements any cleanup logic in its collect function:





🏁 Script executed:

#!/bin/bash
# Find the collect and pre_run callback implementations
rg -nP 'cb_mem_metrics_collect|cb_mem_metrics_pre_run' plugins/in_mem_metrics/ -A30

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's any tracking of previous PIDs or cleanup logic
rg -nP 'prev|last|old|stale|clear|reset' plugins/in_mem_metrics/mem_metrics.c -B2 -A2

Length of output: 87



---

🏁 Script executed:

#!/bin/bash
# Search for the main function that scans /proc
rg -nP 'status|proc|collect_mem|/proc' plugins/in_mem_metrics/mem_metrics.c -B3 -A3

Length of output: 3962


Address unbounded PID label cardinality by pruning stale metrics.

The concern is valid: each new pid creates a new time series and persists beyond process exit. The plugin calls cmt_gauge_set() with pid as a label for each collection cycle but never removes metrics for processes that have exited. In environments with process churn (e.g., Kubernetes nodes with frequent pod restarts), this causes unbounded cardinality growth.

cmetrics exposes cmt_map_metric_destroy() in its public API (lib/cmetrics/include/cmetrics/cmt_map.h), so deletion is feasible. The recommended approach:

Track PIDs seen during the current scan and call cmt_map_metric_destroy() for any metric whose PID is no longer present in /proc. This requires either maintaining a previous PID set or iterating cmetrics' internal metric list to identify stale entries.

Alternatively, destroy and recreate the cmt context per collection cycle (cmt_destroy() followed by cmt_create()), which bounds memory at the cost of higher CPU per scrape.
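A minimal sketch of the mark-and-sweep approach with an optional TTL (the second mitigation listed earlier), kept independent of cmetrics so it can be read on its own. `struct pid_tracker`, `MAX_TRACKED` and the `tracker_*()` functions are hypothetical; the `on_expire` callback only marks where the plugin would drop that PID's series, for example via cmt_map_metric_destroy() as suggested above or the cmetrics expiry function mentioned in the next comment, neither of which is shown here.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_TRACKED 1024   /* hypothetical fixed capacity for the sketch */

/* One tracked PID and the scrape generation in which it was last seen. */
struct pid_entry {
    pid_t pid;
    unsigned long last_seen;
    int in_use;
};

struct pid_tracker {
    struct pid_entry entries[MAX_TRACKED];
    unsigned long generation;   /* incremented once per scrape */
};

/* Start a new scrape; call before marking the PIDs found in /proc. */
static void tracker_begin_scrape(struct pid_tracker *t)
{
    t->generation++;
}

/* Record that `pid` was seen in this scrape. Returns -1 if the table is full. */
static int tracker_mark_seen(struct pid_tracker *t, pid_t pid)
{
    struct pid_entry *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_TRACKED; i++) {
        struct pid_entry *e = &t->entries[i];

        if (e->in_use && e->pid == pid) {
            e->last_seen = t->generation;
            return 0;
        }
        if (!e->in_use && free_slot == NULL) {
            free_slot = e;
        }
    }
    if (free_slot == NULL) {
        return -1;
    }
    free_slot->pid = pid;
    free_slot->last_seen = t->generation;
    free_slot->in_use = 1;
    return 0;
}

/*
 * Drop PIDs not seen for more than `ttl` scrapes (ttl == 0 drops anything
 * missing from the current scrape). `on_expire` is where the metric series
 * for that PID would be deleted. Returns the number of expired PIDs.
 */
static size_t tracker_sweep(struct pid_tracker *t, unsigned long ttl,
                            void (*on_expire)(pid_t pid))
{
    size_t i;
    size_t expired = 0;

    for (i = 0; i < MAX_TRACKED; i++) {
        struct pid_entry *e = &t->entries[i];

        if (e->in_use && t->generation - e->last_seen > ttl) {
            if (on_expire != NULL) {
                on_expire(e->pid);
            }
            e->in_use = 0;
            expired++;
        }
    }
    return expired;
}
```

A collect callback would call tracker_begin_scrape(), then tracker_mark_seen() for every PID it reads, then tracker_sweep(). The linear scan over a fixed-size table keeps the sketch short; a hash table keyed by PID would be the natural choice on busy nodes.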

@pwhelan
Contributor Author

pwhelan commented Nov 4, 2025

@eschabell I have been working on a new function for cmetrics that can expire labels that have not been updated to fix the potential memory leak when processes are created and then later stopped.
