in_mem_metrics: add metrics plugin that tracks RSS and PSS memory for linux. #7615

pwhelan · 2023-06-27T15:28:40Z

Summary

Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS, PSS and shared memory usage of all or some processes on Linux.

This plugin is being used to track memory usage and memory leaks.

Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

Example configuration file for the change
Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Run local packaging test showing all targets (including any new ones) build.
Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Documentation required for this feature

Backporting

Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

New Features
- Adds a memory metrics input that collects detailed Linux memory stats (RSS, PSS variants, shared/private, swap, hugepages, etc.) with configurable collection interval, proc path, and process filtering by executable, command line, or PID.

samos667 · 2025-08-18T22:30:21Z

process-exporter has since been (mostly) rewritten and implemented, but sadly it lacks smaps reading for accurate memory usage reporting.

process-exporter (Go) supports it. Is it planned to be supported inside Fluent Bit?

Edit:
I'll bring this discussion here:

> Although, why would you use it over node exporter, kubelet metrics, etc.? That is probably the main question I'd have.

Not every system is built around Kubernetes, Podman or Docker. And as far as I know, node-exporter is not capable of exporting per-process statistics. There is nothing more universal than /proc stats. "Why would you not use it (or at least support it) over anything else?" is probably the main response I can give.

eschabell · 2025-10-22T18:59:51Z

@pwhelan would you mind fixing the conflicts here?

pwhelan · 2025-10-22T19:01:07Z

> @pwhelan would you mind fixing the conflicts here?

Yeah, I'll take a look today and make sure it's all working.

*** edit ***

@eschabell

Done.

Allow comma-delimited values for the PID, exec(utable) and cmdline filters so that multiple values can be given. The filters are non-exclusive, so a process that matches any one of them is chosen to be polled for metrics. Signed-off-by: Phillip Whelan <[email protected]>
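The non-exclusive (OR) semantics described in this commit can be sketched as follows. This is a minimal illustration, not the plugin's code. `in_csv()` and `is_chosen()` are hypothetical helpers, and PIDs are compared as strings. Choosing every process when no filter is set is an assumption, based on the plugin tracking "all or some processes".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `value` is one of the entries in the
 * comma-delimited list `csv` (e.g. "1,42,1337"), 0 otherwise. */
static int in_csv(const char *csv, const char *value)
{
    char *copy;
    char *tok;
    char *save = NULL;
    int found = 0;

    if (csv == NULL || value == NULL) {
        return 0;
    }
    copy = strdup(csv);               /* strtok_r() modifies its input */
    if (copy == NULL) {
        return 0;
    }
    for (tok = strtok_r(copy, ",", &save); tok != NULL;
         tok = strtok_r(NULL, ",", &save)) {
        if (strcmp(tok, value) == 0) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}

/* Hypothetical selector: the filters are non-exclusive, so a process is
 * chosen if ANY configured filter matches it. Choosing every process when
 * no filter is configured is an assumption of this sketch. */
static int is_chosen(const char *pid_filter, const char *exec_filter,
                     const char *cmd_filter,
                     const char *pid, const char *exe, const char *cmd)
{
    if (pid_filter == NULL && exec_filter == NULL && cmd_filter == NULL) {
        return 1;
    }
    return in_csv(pid_filter, pid) ||
           in_csv(exec_filter, exe) ||
           in_csv(cmd_filter, cmd);
}
```

One side effect of splitting on commas is that a filter value containing a comma cannot be expressed. For example, a command line with a comma in one of its arguments cannot be matched.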

coderabbitai · 2025-10-22T23:15:31Z

Walkthrough

Adds a new Linux input plugin "mem_metrics" that reads process memory stats from procfs (smaps_rollup), exposes many cmt gauges, and introduces CMake build flags to enable/disable the plugin across platforms.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Build top-level & options: `CMakeLists.txt`, `cmake/plugins_options.cmake` | Added public build option `FLB_IN_MEM_METRICS` and enable it when `FLB_ALL` is set. |
| Windows setup: `cmake/windows-setup.cmake` | Introduced `FLB_IN_MEM_METRICS` variable set to `No` in Windows input plugin defaults. |
| Plugin registration: `plugins/CMakeLists.txt` | Registered the Linux-only input plugin with `REGISTER_IN_PLUGIN("in_mem_metrics")`. |
| Plugin build: `plugins/in_mem_metrics/CMakeLists.txt` | New plugin CMake target invoking `FLB_PLUGIN` for `in_mem_metrics` with `mem_metrics.c`. |
| Plugin implementation: `plugins/in_mem_metrics/mem_metrics.c` | New input plugin implementing process selection (exec/cmd/pid), procfs scanning, smaps_rollup parsing, and many cmt gauges; exposes `in_mem_metrics_plugin` input plugin symbol. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Core as Fluent Bit Core
    participant Plugin as mem_metrics
    participant Proc as /proc

    Core->>Plugin: cb_mem_metrics_init()
    activate Plugin
    Plugin->>Plugin: create context & CMT gauges
    Plugin->>Plugin: register collector (timer)
    deactivate Plugin

    loop every interval (default 5s)
        Core->>Plugin: cb_collector_time()
        activate Plugin
        alt filter targets configured
            Plugin->>Proc: read /proc/self/smaps_rollup (or specific pid)
        else full scan
            Plugin->>Proc: glob /proc/[0-9]*
            loop per pid
                Plugin->>Proc: read /proc/[pid]/smaps_rollup
                Plugin->>Plugin: parse values -> bytes
                Plugin->>Plugin: update CMT gauges (tag: pid/type)
            end
        end
        Plugin->>Core: append collected metrics
        deactivate Plugin
    end

    Core->>Plugin: cb_mem_metrics_exit()
    activate Plugin
    Plugin->>Plugin: cleanup CMT & context
    deactivate Plugin
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

leonardo-albertovich
edsiper
niedbalski
patrick-stephens
koleini
fujimotos
celalettin1286

Poem

🐰 I hop through /proc where tiny numbers hide,
I parse each smaps_rollup with whiskers wide,
RSS and PSS I tuck in a cmt nest,
Tagged by pid, then I pause to rest.
Metrics hop out — a rabbit's memory quest! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "in_mem_metrics: add metrics plugin that tracks RSS and PSS memory for linux." accurately and clearly describes the main change in the changeset. The title correctly identifies the plugin name (in_mem_metrics), its primary function (tracking RSS and PSS memory metrics), and the target platform (Linux). This aligns well with the substantial additions shown in the raw summary, particularly the new plugin implementation in mem_metrics.c and the associated build configuration changes. The title is specific, concise, and avoids vague or overly broad language, making it clear to anyone reviewing the repository history what the primary purpose of this pull request is. |

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (3)

plugins/in_mem_metrics/mem_metrics.c (3)

261-263: Avoid truncating smaps_rollup parsing with a hard split limit.

Using max_split=21 risks truncation if kernels add lines. Prefer unlimited or a safe upper bound.

```diff
-    lines = flb_utils_split(buf, '\n', 21);
+    lines = flb_utils_split(buf, '\n', 0); /* 0 => no limit (unbounded) */
```

If 0 isn’t supported by split(), use a generous bound (e.g., 64).
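Instead of splitting the buffer into a fixed number of lines, a parser can walk every line and match keys exactly. The sketch below assumes the buffer holds the contents of a `smaps_rollup` file and that values carry the kernel's `kB` unit (1024 bytes). It converts them to bytes, as commit 38a9a16 describes for the plugin. `smaps_value_bytes()` is a hypothetical helper, not part of Fluent Bit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up `key` (without the trailing ':') in a
 * smaps_rollup buffer and store its value, converted to bytes, in `out`.
 * Every line is visited, so new kernel fields cannot cause truncation.
 * Returns 1 if the key was found, 0 otherwise. */
static int smaps_value_bytes(const char *buf, const char *key,
                             unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *line = buf;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *colon = strchr(line, ':');

        /* The colon must be on this line and end an exact key match,
         * so "Pss" does not match the "Pss_Dirty:" line. */
        if (colon != NULL && (nl == NULL || colon < nl) &&
            (size_t)(colon - line) == klen &&
            strncmp(line, key, klen) == 0) {
            char *end;
            unsigned long long kb = strtoull(colon + 1, &end, 10);

            if (end == colon + 1) {
                return 0;             /* no number after the key */
            }
            *out = kb * 1024ULL;      /* the kernel's "kB" is 1024 bytes */
            return 1;
        }
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return 0;
}
```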

370-383: Stat/access the procfs root, not the glob pattern, on GLOB_NOMATCH.

Calling stat() on the glob pattern "/proc/[0-9]*" always fails, because the pattern is not a real path. Check ctx->procfs_path for existence and permissions instead, to produce clearer diagnostics.

```diff
-            ret = stat(real_path, &st);
+            ret = stat(ctx->procfs_path, &st);
             if (ret == -1) {
-                flb_plg_debug(ctx->ins, "cannot read info from: %s", real_path);
+                flb_plg_debug(ctx->ins, "cannot read info from: %s", ctx->procfs_path);
             }
             else {
-                ret = access(real_path, R_OK);
+                ret = access(ctx->procfs_path, R_OK);
                 if (ret == -1 && errno == EACCES) {
-                    flb_plg_error(ctx->ins, "NO read access for path: %s", real_path);
+                    flb_plg_error(ctx->ins, "no read access for path: %s", ctx->procfs_path);
                 }
                 else {
-                    flb_plg_debug(ctx->ins, "NO matches for path: %s", real_path);
+                    flb_plg_debug(ctx->ins, "no matches for pattern under: %s", ctx->procfs_path);
                 }
             }
```
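Here is a minimal sketch of the suggested GLOB_NOMATCH handling, using POSIX `glob()`. The pattern is built from the procfs root, and only the root is passed to `stat()`/`access()` when nothing matches. `scan_procs()` is hypothetical. It only counts `<pid>/smaps_rollup` files at the point where the plugin would parse them. The messages mirror the suggested log lines but are written to stderr rather than through `flb_plg_*`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical scan: count <procfs_path>/<digits> entries that contain a
 * smaps_rollup file. Returns the count, or -1 if the root is unusable. */
static int scan_procs(const char *procfs_path)
{
    char pattern[4096];
    char path[4096];
    glob_t gl = {0};
    struct stat st;
    size_t i;
    int ret;
    int count = 0;

    snprintf(pattern, sizeof(pattern), "%s/[0-9]*", procfs_path);
    ret = glob(pattern, 0, NULL, &gl);
    if (ret != 0) {
        globfree(&gl);
        if (ret != GLOB_NOMATCH) {
            return -1;                /* GLOB_NOSPACE or GLOB_ABORTED */
        }
        /* Diagnose the root directory, not the pattern. */
        if (stat(procfs_path, &st) == -1) {
            fprintf(stderr, "cannot read info from: %s\n", procfs_path);
            return -1;
        }
        if (access(procfs_path, R_OK) == -1 && errno == EACCES) {
            fprintf(stderr, "no read access for path: %s\n", procfs_path);
            return -1;
        }
        fprintf(stderr, "no matches for pattern under: %s\n", procfs_path);
        return 0;
    }
    for (i = 0; i < gl.gl_pathc; i++) {
        snprintf(path, sizeof(path), "%s/smaps_rollup", gl.gl_pathv[i]);
        /* A real collector would open and parse `path` here. */
        if (access(path, F_OK) == 0) {
            count++;
        }
    }
    globfree(&gl);
    return count;
}
```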

511-524: Update config descriptions: values support comma-separated lists.

Descriptions still say “single”; code and CLIST_1 support lists (incl. “self”/“0” for PID). Align help text.

```diff
-     "Filter for a single executable"
+     "Filter by executable path (comma-separated list supported)"
@@
-     "Filter by the command line"
+     "Filter by full command line (space-joined argv; comma-separated list supported)"
@@
-      "Filter by PID"
+      "Filter by PID (supports comma-separated list and special values: self, 0)"
```

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6345fd1 and 27d1306.

📒 Files selected for processing (6)

CMakeLists.txt (1 hunks)
cmake/plugins_options.cmake (1 hunks)
cmake/windows-setup.cmake (1 hunks)
plugins/CMakeLists.txt (1 hunks)
plugins/in_mem_metrics/CMakeLists.txt (1 hunks)
plugins/in_mem_metrics/mem_metrics.c (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-31T12:46:11.940Z

Learnt from: ThomasDevoogdt
PR: fluent/fluent-bit#9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

CMakeLists.txt
cmake/windows-setup.cmake

🧬 Code graph analysis (1)

plugins/in_mem_metrics/mem_metrics.c (7)

src/flb_utils.c (2)

flb_utils_split (464-467)

flb_utils_split_free (477-489)

lib/cmetrics/src/cmt_gauge.c (1)

cmt_gauge_set (94-109)

plugins/in_blob/win32_glob.c (2)

glob (119-156)

globfree (107-117)

src/flb_input_metric.c (1)

flb_input_metrics_append (102-110)

include/fluent-bit/flb_mem.h (2)

flb_calloc (84-96)

flb_free (126-128)

include/fluent-bit/flb_input.h (1)

flb_input_config_map_set (715-741)

src/flb_input.c (2)

flb_input_set_context (1610-1613)

flb_input_set_collector_time (1685-1704)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)

GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-without-cxx (3.31.6)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
GitHub Check: pr-compile-centos-7
GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
GitHub Check: PR - fuzzing test

🔇 Additional comments (6)

cmake/plugins_options.cmake (1)

39-39: LGTM: option defined and aligned with Linux-only registration.
No action needed.

cmake/windows-setup.cmake (1)

57-57: LGTM: disabled by default on Windows.

CMakeLists.txt (1)

276-283: LGTM: enabled under FLB_ALL.
No further tweaks needed.

plugins/in_mem_metrics/CMakeLists.txt (1)

1-4: LGTM: plugin target definition is correct.

plugins/CMakeLists.txt (1)

251-267: LGTM: Linux-only registration added in the right block.

plugins/in_mem_metrics/mem_metrics.c (1)

408-419: Review comment is accurate. The unbounded metric series growth issue is real and the proposed solution is sound.

I've verified the core claims:

- Persistent reuse of ctx->cmt: confirmed. It is created once at initialization (line 458) and never recreated across collection ticks.
- Metric series accumulate permanently: confirmed. Each scrape calls cmt_gauge_set() with dynamic pid labels, and every unique pid becomes a permanent series in the cmt object.
- cmetrics lacks metric deletion: confirmed. No cmt_gauge_remove, cmt_*_reset, or prune functions are exposed. Only cmt_counter_allow_reset() exists for counters; gauges have no equivalent deletion mechanism.
- Proposed solution is safe: the pattern of recreating cmt per scrape is valid:
  - Gauge pointers in ctx are reassigned to new gauge objects from the new cmt.
  - Macros can be called multiple times to re-initialize gauges.
  - The old cmt is properly destroyed.
  - This pattern has precedent in the codebase (processor_labels and processor_metrics_selector both create fresh cmt objects per operation).

The review comment correctly identifies a real cardinality leak and proposes a practical workaround. The suggested diff pattern aligns with how other components handle cmetrics lifecycle management in this codebase.

coderabbitai · 2025-10-22T23:22:35Z

plugins/in_mem_metrics/mem_metrics.c

```diff
+        if (strncasecmp(cur->value, "Pss:", sep - cur->value) == 0) {
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+        }
+        if (strncasecmp(cur->value, "Pss_Dirty:", sep - cur->value) == 0) {
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "dirty" });
+        }
```


⚠️ Potential issue | 🟡 Minor

PSS “base” type label is misnamed “clean”.

For the aggregate “Pss:” line, label should reflect “base” (or “total”), not “clean”. Current label collides semantically with “Pss_Dirty”.

```diff
- cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+ cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
```

Also update the comment at Line 51 to match (“types=base,dirty,anon,file,shmem”).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
         if (strncasecmp(cur->value, "Pss:", sep - cur->value) == 0) {
-            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
         }
         if (strncasecmp(cur->value, "Pss_Dirty:", sep - cur->value) == 0) {
             cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "dirty" });
         }
```

🤖 Prompt for AI Agents

In plugins/in_mem_metrics/mem_metrics.c around lines 278 to 283, the PSS aggregate label for the "Pss:" line is incorrectly using "clean" which collides with "Pss_Dirty:"; change the label from "clean" to "base" (or "total" as preferred) so the two types read e.g. "base" and "dirty", and also update the comment at line 51 to reflect the correct label set ("types=base,dirty,anon,file,shmem"). Ensure the cmt_gauge_set call for "Pss:" uses the new label string and the comment exactly matches the updated types list.
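An exact-match lookup table is one way to map `smaps_rollup` keys to the PSS `type` label without relying on prefix-length comparisons. This sketch assumes the `base,dirty,anon,file,shmem` label set proposed above. It also assumes the caller passes the key length up to the `:`. `pss_types` and `pss_type_for_key()` are hypothetical. The sketch uses `strncmp()` rather than `strncasecmp()`, on the assumption that the kernel's key capitalization does not vary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key -> label table for the PSS gauge, using the
 * "base,dirty,anon,file,shmem" type set proposed in this review. */
static const struct {
    const char *key;    /* smaps_rollup key without the trailing ':' */
    const char *type;   /* value for the gauge's type label */
} pss_types[] = {
    { "Pss",       "base"  },
    { "Pss_Dirty", "dirty" },
    { "Pss_Anon",  "anon"  },
    { "Pss_File",  "file"  },
    { "Pss_Shmem", "shmem" },
};

/* Returns the type label for the first `len` bytes of `key`, or NULL.
 * Requiring the table key to have exactly `len` characters means a "Pss"
 * key cannot match "Pss_Dirty" and vice versa. */
static const char *pss_type_for_key(const char *key, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(pss_types) / sizeof(pss_types[0]); i++) {
        if (strlen(pss_types[i].key) == len &&
            strncmp(pss_types[i].key, key, len) == 0) {
            return pss_types[i].type;
        }
    }
    return NULL;
}
```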

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (3)

plugins/in_mem_metrics/mem_metrics.c (3)
92-101: readlink handling LGTM.

Properly checks return and NUL-terminates.
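For reference, the `readlink()` pattern this comment approves (check the return value, then NUL-terminate at that length) looks roughly like this. `read_exe_path()` is a hypothetical helper. Treating a result that fills the whole buffer as possibly truncated is an extra, conservative choice and is not taken from the plugin.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve a /proc/<pid>/exe-style symlink into `out`.
 * Returns 0 on success, -1 on error or possible truncation. */
static int read_exe_path(const char *link_path, char *out, size_t size)
{
    ssize_t n;

    if (size < 2) {
        return -1;
    }
    n = readlink(link_path, out, size - 1);
    if (n == -1 || (size_t)n == size - 1) {
        return -1;               /* error, or result may be truncated */
    }
    out[n] = '\0';               /* readlink() does not NUL-terminate */
    return 0;
}
```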

293-307: PSS base label should not be “clean”.

“Pss:” is the aggregate; label it “base” (or “total”) to avoid collision with “dirty” and to match the comment.
```diff
-            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+            cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
```

126-146: Fix rc check and clarify cmdline filter semantics.

Two issues in is_chosen_cmd():

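rc check bug (line 135): rc <= -1 only catches the error case, because read() returns -1 on failure. It misses rc == 0, which is what an empty read returns (kernel threads, for example, have an empty cmdline). Use rc <= 0 to catch both errors and empty reads.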

cmdline semantics mismatch: Line 112 comment says "actual command, not any of the arguments" (argv[0] only), but line 538 config help says "Filter by the command line" (implying full cmdline). Code matches only argv[0] because /proc/<pid>/cmdline uses NUL-separators and strcmp() stops at the first NUL. Either document as argv[0]-only or normalize NULs to spaces for full-line matching. Choose one and keep docs consistent.
```diff
-    if (rc <= -1) {
+    if (rc <= 0) {
         return FLB_FALSE;
     }
```
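A sketch combining both suggestions: treat `rc <= 0` as failure and, if full-command-line matching is chosen, replace the NUL separators with spaces. `read_cmdline()` is hypothetical. It assumes a single `read()` returns the whole file, so very long command lines may be truncated.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a /proc/<pid>/cmdline-style file into `buf`
 * and join the NUL-separated argv entries with spaces, so the full
 * command line can be compared with strcmp(). Returns the resulting
 * length, or -1 on a read error or an empty file. */
static ssize_t read_cmdline(const char *path, char *buf, size_t size)
{
    ssize_t rc;
    ssize_t i;
    int fd;

    if (size < 2) {
        return -1;
    }
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    rc = read(fd, buf, size - 1);     /* keep one byte for the terminator */
    close(fd);
    if (rc <= 0) {                    /* -1: error, 0: empty (e.g. kernel thread) */
        return -1;
    }
    if (buf[rc - 1] == '\0') {        /* drop the final argv terminator */
        rc--;
    }
    for (i = 0; i < rc; i++) {
        if (buf[i] == '\0') {
            buf[i] = ' ';             /* NUL between arguments -> space */
        }
    }
    buf[rc] = '\0';
    return rc;
}
```

If argv[0]-only matching is chosen instead, the joining loop can be dropped. `strcmp()` on the raw buffer then already stops at the first NUL.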

🧹 Nitpick comments (2)

plugins/in_mem_metrics/mem_metrics.c (2)

423-433: Use a single timestamp per scrape.

Ensures internal metric points share one ts.

```diff
 static int cb_collector_time(struct flb_input_instance *ins,
                              struct flb_config *config, void *in_context)
 {
     struct mem_metrics *ctx = (struct mem_metrics *)in_context;
-    if (is_chosen_pid_self(ctx)) {
-        mmtx_parse_proc(ctx, cfl_time_now(), "/proc/self");
-    } else {
-    	mmtx_utils_path_scan_procs(ctx, cfl_time_now());
-    }
+    uint64_t ts = cfl_time_now();
+    if (is_chosen_pid_self(ctx)) {
+        mmtx_parse_proc(ctx, ts, "/proc/self");
+    } else {
+        mmtx_utils_path_scan_procs(ctx, ts);
+    }
     flb_input_metrics_append(ins, NULL, 0, ctx->cmt);
     FLB_INPUT_RETURN(0);
 }
```

532-544: Config help strings imply single value; code supports lists and special values.

Please update help to reflect comma‑delimited lists and “self,0”.

```diff
-     "Filter for a single executable"
+     "Comma-delimited allow list of executable paths (matches /proc/<pid>/exe)"
@@
-     "Filter by the command line"
+     "Comma-delimited allow list; match command (argv[0]) or full cmdline (see docs)"
@@
-      "Filter by PID"
+      "Comma-delimited allow list of PIDs; special: self, 0"
```

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27d1306 and 19ecddc.

📒 Files selected for processing (1)

plugins/in_mem_metrics/mem_metrics.c (1 hunks)

coderabbitai · 2025-10-22T23:48:25Z

plugins/in_mem_metrics/mem_metrics.c

```diff
+struct flb_input_plugin in_mem_metrics_plugin = {
+    .name         = "mem_metrics",
+    .description  = "Full Memory Metrics for Linux",
+    .cb_init      = cb_mem_metrics_init,
+    .cb_pre_run   = NULL,
+    .cb_collect   = NULL,
+    .cb_flush_buf = NULL,
+    .cb_pause     = NULL,
+    .cb_resume    = NULL,
+    .cb_exit      = cb_mem_metrics_exit,
+    .config_map   = config_map,
+    .flags        = FLB_INPUT_CORO | FLB_INPUT_THREADED
+};
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Operational: unbounded PID label cardinality can grow memory over time.

Each new pid creates a new time series and persists beyond process exit. Long‑running agents with churn (e.g., k8s nodes) will grow cmetrics state.

Mitigations:

Prune metrics for pids not seen this scrape (if cmetrics supports deletion).

Or keep a short TTL for unseen pids, then delete.

Or snapshot per collect: build a temporary cmt each cycle, append, then destroy (costlier but bounded).
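The TTL mitigation can be modeled without cmetrics: remember when each pid was last seen, and drop series that have not been refreshed within the TTL. The table, `series_touch()` and `series_expire()` below are hypothetical stand-ins. A real plugin would delete the corresponding cmetrics series at the marked spot. The sketch assumes timestamps never go backwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size table standing in for per-pid metric series. */
#define MAX_SERIES 64

struct series {
    int      in_use;
    long     pid;
    uint64_t last_seen;   /* timestamp of the last scrape that saw the pid */
};

/* Record that `pid` was observed at `now`. Returns 0, or -1 if full. */
static int series_touch(struct series *tab, long pid, uint64_t now)
{
    size_t i;
    size_t free_slot = MAX_SERIES;

    for (i = 0; i < MAX_SERIES; i++) {
        if (tab[i].in_use && tab[i].pid == pid) {
            tab[i].last_seen = now;
            return 0;
        }
        if (!tab[i].in_use && free_slot == MAX_SERIES) {
            free_slot = i;
        }
    }
    if (free_slot == MAX_SERIES) {
        return -1;
    }
    tab[free_slot].in_use = 1;
    tab[free_slot].pid = pid;
    tab[free_slot].last_seen = now;
    return 0;
}

/* Drop every series not refreshed within `ttl` of `now`.
 * Returns the number of series removed. */
static size_t series_expire(struct series *tab, uint64_t now, uint64_t ttl)
{
    size_t i;
    size_t removed = 0;

    for (i = 0; i < MAX_SERIES; i++) {
        if (tab[i].in_use && now - tab[i].last_seen > ttl) {
            tab[i].in_use = 0;   /* a real plugin would delete the series here */
            removed++;
        }
    }
    return removed;
}
```

With a TTL of 0, this reduces to the first option: pruning every pid not seen in the current scrape.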

Please check whether cmetrics exposes deletion/clear APIs:

🏁 Script executed:

```shell
#!/bin/bash
# Search for deletion APIs in cmetrics
rg -nP 'cmt_.*(delete|remove|clear|unset)' lib/ src/ plugins/ -C2
rg -nP 'cmt_map_metric_get\(|cmt_metric_set\(' lib/cmetrics -C3
```

Length of output: 16268

I need to dig deeper into the cmetrics public API to verify whether deletion/cleanup mechanisms exist beyond destroying entire contexts.

Let me search the codebase directly for the complete cmt_map header and any related APIs for metric deletion or cleanup.

🏁 Script executed:

```shell
#!/bin/bash
# Extract full cmt_map.h header to see all public functions
cat lib/cmetrics/include/cmetrics/cmt_map.h
```

Length of output: 2285

🏁 Script executed:

```shell
#!/bin/bash
# Search for all functions that operate on cmt_map or perform cleanup
rg -nP 'cmt.*\(' lib/cmetrics/include/cmetrics/cmt_map.h -A2
```

Length of output: 719

🏁 Script executed:

```shell
#!/bin/bash
# Check if mem_metrics plugin actually uses cmetrics
rg -nP 'cmt_|cmetrics' plugins/in_mem_metrics/ -C2
```

Length of output: 10195

Let me verify whether the mem_metrics plugin implements any cleanup logic in its collect function:

🏁 Script executed:

```shell
#!/bin/bash
# Find the collect and pre_run callback implementations
rg -nP 'cb_mem_metrics_collect|cb_mem_metrics_pre_run' plugins/in_mem_metrics/ -A30
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any tracking of previous PIDs or cleanup logic
rg -nP 'prev|last|old|stale|clear|reset' plugins/in_mem_metrics/mem_metrics.c -B2 -A2
```

Length of output: 87

🏁 Script executed:

```shell
#!/bin/bash
# Search for the main function that scans /proc
rg -nP 'status|proc|collect_mem|/proc' plugins/in_mem_metrics/mem_metrics.c -B3 -A3
```

Length of output: 3962

Address unbounded PID label cardinality by pruning stale metrics.

The concern is valid: each new pid creates a new time series and persists beyond process exit. The plugin calls cmt_gauge_set() with pid as a label for each collection cycle but never removes metrics for processes that have exited. In environments with process churn (e.g., Kubernetes nodes with frequent pod restarts), this causes unbounded cardinality growth.

cmetrics exposes cmt_map_metric_destroy() in its public API (lib/cmetrics/include/cmetrics/cmt_map.h), so deletion is feasible. The recommended approach:

Track PIDs seen during the current scan and call cmt_map_metric_destroy() for any metric whose PID is no longer present in /proc. This requires either maintaining a previous PID set or iterating cmetrics' internal metric list to identify stale entries.

Alternatively, destroy and recreate the cmt context per collection cycle (cmt_destroy() followed by cmt_create()), which bounds memory at the cost of higher CPU per scrape.

pwhelan · 2025-11-04T18:35:56Z

@eschabell I have been working on a new cmetrics function that can expire labels that have not been updated. This fixes the potential memory leak when processes are created and later stopped.

pwhelan requested review from edsiper, fujimotos, koleini, leonardo-albertovich, niedbalski and patrick-stephens as code owners June 27, 2023 15:28

github-actions bot added the docs-required label Jun 27, 2023

patrick-stephens reviewed Jun 28, 2023

plugins/in_mem_metrics/mem_metrics.c (outdated, resolved)

pwhelan self-assigned this Jun 28, 2023

pwhelan force-pushed the pwhelan-in_mem_metrics branch from ca69541 to 5145b57 June 29, 2023 15:17

pwhelan requested a review from patrick-stephens June 29, 2023 15:29

pwhelan force-pushed the pwhelan-in_mem_metrics branch from 5145b57 to e5f647f July 26, 2023 14:14

pwhelan added a commit to fluent/fluent-bit-docs that referenced this pull request Jul 26, 2023

in_mem_metrics: initial documentation.

4b7e3dd

This adds documentation for the new `in_mem_metrics` plugin: fluent/fluent-bit#7615. Signed-off-by: Phillip Whelan <[email protected]>

pwhelan mentioned this pull request Jul 26, 2023

in_mem_metrics: initial documentation. fluent/fluent-bit-docs#1170

Open

pwhelan added 2 commits October 22, 2025 20:14

in_mem_metrics: add metrics plugin that tracks RSS and PSS memory for…

6c08c6b

… linux. Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS, PSS and shared memory usage of all or some processes on linux. Signed-off-by: Phillip Whelan <[email protected]>

pwhelan force-pushed the pwhelan-in_mem_metrics branch from e5f647f to 27d1306 October 22, 2025 23:15

coderabbitai bot reviewed Oct 22, 2025

pwhelan added 3 commits October 22, 2025 20:29

in_mem_metrics: add missing headers to avoid implicit declarations.

d76884a

Signed-off-by: Phillip Whelan <[email protected]>

in_mem_metrics: NULL terminate exe_path with return value from readlink.

e693f65

Signed-off-by: Phillip Whelan <[email protected]>

in_mem_metrics: document is_chosen_cmd.

63e0b04

Signed-off-by: Phillip Whelan <[email protected]>

pwhelan added 3 commits October 22, 2025 20:38

in_mem_metrics: convert raw units (kB) to bytes.

38a9a16

Signed-off-by: Phillip Whelan <[email protected]>

in_mem_metrics: free ctx in case of failure in flb_input_config_map_set.

2122f8d

Signed-off-by: Phillip Whelan <[email protected]>

in_mem_metrics: return error and free ctx in case of failure in cmt_c…

19ecddc

…reate. Signed-off-by: Phillip Whelan <[email protected]>

coderabbitai bot reviewed Oct 22, 2025

5 participants
