in_mem_metrics: add metrics plugin that tracks RSS and PSS memory for Linux. #7615
This adds documentation for the new `in_mem_metrics` plugin: fluent/fluent-bit#7615. Signed-off-by: Phillip Whelan <[email protected]>
process-exporter has since been (mostly) rewritten and implemented, but it sadly lacks smaps reading for accurate memory usage reporting. The Go process-exporter supports it. Is this planned to be supported in Fluent Bit? Edit:
Not every system is built around Kubernetes, Podman, or Docker. And as far as I know, node-exporter is not capable of exporting per-process statistics. There is nothing more universal than /proc stats; why would you not use it (or at least support it) over anything else? That is probably the main answer I can give.
@pwhelan would you mind fixing the conflicts here?
Yeah, I'll take a look today and make sure it's all working. Edit: Done.
… Linux. Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS, PSS, and shared memory usage of all or some processes on Linux. Signed-off-by: Phillip Whelan <[email protected]>
Allow using comma delimited values for the PID, exec(utable) and cmdline filters to allow for multiple values. All these filters are non-exclusive so a process that matches any one of them is chosen to be polled for metrics. Signed-off-by: Phillip Whelan <[email protected]>
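A minimal sketch of the non-exclusive matching described in that commit, assuming each filter is kept as a raw comma-delimited string. The names `struct proc_filters`, `list_contains`, and `is_chosen` are hypothetical and are not the plugin's actual functions; the sketch also assumes that an unconfigured filter set means "poll every process", and it does not trim whitespace around list entries.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter set: each field is a comma-delimited list or NULL. */
struct proc_filters {
    const char *pids;      /* e.g. "42,1001" */
    const char *execs;     /* e.g. "/usr/bin/fluent-bit" */
    const char *cmdlines;  /* e.g. "fluent-bit -c /etc/fb.conf" */
};

/* True if `value` equals (not merely prefixes) one entry of `list`. */
static bool list_contains(const char *list, const char *value)
{
    const char *p = list;
    size_t vlen;

    if (list == NULL || *list == '\0' || value == NULL) {
        return false;
    }
    vlen = strlen(value);
    while (p != NULL) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == vlen && strncmp(p, value, len) == 0) {
            return true;
        }
        p = comma ? comma + 1 : NULL;
    }
    return false;
}

/* Filters are non-exclusive: a process is chosen if it matches ANY list.
 * Assumption: with no filter configured at all, every process is chosen. */
static bool is_chosen(const struct proc_filters *f, int pid,
                      const char *exe, const char *cmdline)
{
    char pidbuf[16];

    if (f->pids == NULL && f->execs == NULL && f->cmdlines == NULL) {
        return true;
    }
    snprintf(pidbuf, sizeof(pidbuf), "%d", pid);
    return list_contains(f->pids, pidbuf) ||
           list_contains(f->execs, exe) ||
           list_contains(f->cmdlines, cmdline);
}
```

Comparing the entry length before `strncmp()` keeps a PID filter of `42` from matching PID `4`. One consequence of this format is that a command line containing a comma cannot be expressed as a single list entry.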
Walkthrough
Adds a new Linux input plugin, "mem_metrics", that reads process memory stats from procfs (smaps_rollup), exposes many cmt gauges, and introduces CMake build flags to enable or disable the plugin across platforms.
Sequence Diagram(s)
sequenceDiagram
participant Core as Fluent Bit Core
participant Plugin as mem_metrics
participant Proc as /proc
Core->>Plugin: cb_mem_metrics_init()
activate Plugin
Plugin->>Plugin: create context & CMT gauges
Plugin->>Plugin: register collector (timer)
deactivate Plugin
loop every interval (default 5s)
Core->>Plugin: cb_collector_time()
activate Plugin
alt pid filter is self
Plugin->>Proc: read /proc/self/smaps_rollup
else full scan
Plugin->>Proc: glob /proc/[0-9]*
loop per pid
Plugin->>Proc: read /proc/[pid]/smaps_rollup
Plugin->>Plugin: parse values -> bytes
Plugin->>Plugin: update CMT gauges (tag: pid/type)
end
end
Plugin->>Core: append collected metrics
deactivate Plugin
end
Core->>Plugin: cb_mem_metrics_exit()
activate Plugin
Plugin->>Plugin: cleanup CMT & context
deactivate Plugin
Actionable comments posted: 7
🧹 Nitpick comments (3)
plugins/in_mem_metrics/mem_metrics.c (3)
261-263: Avoid truncating smaps_rollup parsing with a hard split limit. Using max_split=21 risks truncation if kernels add lines. Prefer unlimited or a safe upper bound.
- lines = flb_utils_split(buf, '\n', 21);
+ lines = flb_utils_split(buf, '\n', 0); /* 0 => no limit (unbounded) */
If 0 isn't supported by split(), use a generous bound (e.g., 64).
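The reviewer's point can be illustrated with a standalone sketch that walks every line of an smaps_rollup buffer until the buffer ends, instead of splitting it into a fixed number of pieces. It does not use `flb_utils_split()`; `smaps_rollup_get` is a hypothetical helper, and it assumes every field line has the `Key:   <n> kB` shape, converting the value from kB to bytes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Looks up `key` (without the trailing ':') in an smaps_rollup buffer and
 * stores its value, converted from kB to bytes, in *bytes.
 * Returns 0 on success, -1 if the key is not present. */
static int smaps_rollup_get(const char *buf, const char *key, uint64_t *bytes)
{
    const char *line = buf;
    char tmp[256];
    char name[64];

    /* No fixed line count: stop only at the end of the buffer. */
    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        unsigned long long kb;
        int consumed = 0;

        if (len >= sizeof(tmp)) {
            len = sizeof(tmp) - 1;   /* field lines are short; truncate others */
        }
        memcpy(tmp, line, len);
        tmp[len] = '\0';

        /* %n is only stored if the whole "Key: <n> kB" pattern matched,
         * which rejects the address-range header line. strcmp() gives an
         * exact key match, so "Pss" never matches "Pss_Dirty". */
        if (sscanf(tmp, "%63[^:]: %llu kB%n", name, &kb, &consumed) == 2 &&
            consumed > 0 && strcmp(name, key) == 0) {
            *bytes = (uint64_t)kb * 1024;
            return 0;
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

This rescans the buffer for each key to keep the example small; a collector would more likely scan once and dispatch every field it finds.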
370-383: Stat/access the procfs root, not the glob pattern, on GLOB_NOMATCH. stat("/proc/[0-9]*") treats the pattern as a literal path and will always fail; check ctx->procfs_path for existence and permissions to produce clearer diagnostics.
- ret = stat(real_path, &st);
+ ret = stat(ctx->procfs_path, &st);
  if (ret == -1) {
-     flb_plg_debug(ctx->ins, "cannot read info from: %s", real_path);
+     flb_plg_debug(ctx->ins, "cannot read info from: %s", ctx->procfs_path);
  } else {
-     ret = access(real_path, R_OK);
+     ret = access(ctx->procfs_path, R_OK);
      if (ret == -1 && errno == EACCES) {
-         flb_plg_error(ctx->ins, "NO read access for path: %s", real_path);
+         flb_plg_error(ctx->ins, "no read access for path: %s", ctx->procfs_path);
      } else {
-         flb_plg_debug(ctx->ins, "NO matches for path: %s", real_path);
+         flb_plg_debug(ctx->ins, "no matches for pattern under: %s", ctx->procfs_path);
      }
  }
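A minimal POSIX sketch of the suggested diagnostics, assuming the procfs root is passed in as a plain directory path (the plugin keeps it in `ctx->procfs_path`). `scan_pid_dirs` and `enum scan_diag` are hypothetical; the point is that on `GLOB_NOMATCH` the root directory, not the pattern, is checked with `stat()` and `access()`.

```c
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for a scan of <root>/[0-9]* */
enum scan_diag {
    SCAN_OK,             /* at least one PID directory matched */
    SCAN_ROOT_MISSING,   /* stat() on the root failed */
    SCAN_ROOT_NO_ACCESS, /* root exists but is not readable */
    SCAN_NO_MATCH,       /* root is fine, there are simply no matches */
    SCAN_ERROR           /* glob() failed for another reason */
};

static enum scan_diag scan_pid_dirs(const char *procfs_root, size_t *count)
{
    char pattern[4096];
    glob_t gl;
    struct stat st;
    int ret;

    *count = 0;
    snprintf(pattern, sizeof(pattern), "%s/[0-9]*", procfs_root);

    ret = glob(pattern, GLOB_NOSORT, NULL, &gl);
    if (ret == 0) {
        *count = gl.gl_pathc;
    }
    globfree(&gl);

    if (ret == 0) {
        return SCAN_OK;
    }
    if (ret != GLOB_NOMATCH) {
        return SCAN_ERROR;
    }
    /* Diagnose the root directory itself: calling stat() on the pattern
     * would treat "[0-9]*" as a literal file name and always fail. */
    if (stat(procfs_root, &st) == -1) {
        return SCAN_ROOT_MISSING;
    }
    if (access(procfs_root, R_OK) == -1 && errno == EACCES) {
        return SCAN_ROOT_NO_ACCESS;
    }
    return SCAN_NO_MATCH;
}
```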
511-524: Update config descriptions: values support comma-separated lists. The descriptions still say “single”, but the code and CLIST_1 support lists (including “self” and “0” for PID). Align the help text.
- "Filter for a single executable"
+ "Filter by executable path (comma-separated list supported)"
@@
- "Filter by the command line"
+ "Filter by full command line (space-joined argv; comma-separated list supported)"
@@
- "Filter by PID"
+ "Filter by PID (supports comma-separated list and special values: self, 0)"
📒 Files selected for processing (6)
- CMakeLists.txt (1 hunk)
- cmake/plugins_options.cmake (1 hunk)
- cmake/windows-setup.cmake (1 hunk)
- plugins/CMakeLists.txt (1 hunk)
- plugins/in_mem_metrics/CMakeLists.txt (1 hunk)
- plugins/in_mem_metrics/mem_metrics.c (1 hunk)
🔇 Additional comments (6)
cmake/plugins_options.cmake (1)
39-39: LGTM: option defined and aligned with Linux-only registration.
No action needed.
cmake/windows-setup.cmake (1)
57-57: LGTM: disabled by default on Windows.
CMakeLists.txt (1)
276-283: LGTM: enabled under FLB_ALL.
No further tweaks needed.
plugins/in_mem_metrics/CMakeLists.txt (1)
1-4: LGTM: plugin target definition is correct.
plugins/CMakeLists.txt (1)
251-267: LGTM: Linux-only registration added in the right block.
plugins/in_mem_metrics/mem_metrics.c (1)
408-419: Review comment is accurate. The unbounded metric series growth issue is real and the proposed solution is sound.
I've verified the core claims:
- Persistent reuse of ctx->cmt: Confirmed. It is created once at initialization (line 458) and never recreated across collection ticks.
- Metric series accumulate permanently: Confirmed. Each scrape calls cmt_gauge_set() with dynamic pid labels, and every unique pid becomes a permanent series in the cmt object.
- cmetrics lacks metric deletion: Confirmed. No cmt_gauge_remove, cmt_*_reset, or prune functions are exposed. Only cmt_counter_allow_reset() exists for counters; gauges have no equivalent deletion mechanism.
- Proposed solution is safe: The pattern of recreating cmt per scrape is valid:
  - Gauge pointers in ctx are reassigned to new gauge objects from the new cmt
  - Macros can be called multiple times to re-initialize gauges
  - Old cmt is properly destroyed
  - This pattern has precedent in the codebase (processor_labels and processor_metrics_selector both create fresh cmt objects per operation)
The review comment correctly identifies a real cardinality leak and proposes a practical workaround. The suggested diff pattern aligns with how other components handle cmetrics lifecycle management in this codebase.
if (strncasecmp(cur->value, "Pss:", sep - cur->value) == 0) {
    cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
}
if (strncasecmp(cur->value, "Pss_Dirty:", sep - cur->value) == 0) {
    cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "dirty" });
}
PSS “base” type label is misnamed “clean”.
For the aggregate “Pss:” line, label should reflect “base” (or “total”), not “clean”. Current label collides semantically with “Pss_Dirty”.
- cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+ cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
Also update the comment at line 51 to match (“types=base,dirty,anon,file,shmem”).
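One way to keep the labels consistent with that types list is a small exact-match lookup table. This is a hypothetical helper, not code from the PR; mapping Pss_Anon, Pss_File, and Pss_Shmem to anon, file, and shmem is an assumption inferred from the types list above. It matches keys exactly and case-sensitively, since smaps_rollup keys have a fixed spelling, which also rules out "Pss" matching a prefix of "Pss_Dirty".

```c
#include <stddef.h>
#include <string.h>

/* Maps an smaps_rollup key (with or without the trailing ':') to the
 * hypothetical PSS "type" label. Returns NULL for non-PSS keys. */
static const char *pss_type_label(const char *key, size_t key_len)
{
    static const struct {
        const char *key;
        const char *type;
    } map[] = {
        { "Pss",       "base"  },  /* aggregate line, not "clean" */
        { "Pss_Dirty", "dirty" },
        { "Pss_Anon",  "anon"  },  /* assumed mapping */
        { "Pss_File",  "file"  },  /* assumed mapping */
        { "Pss_Shmem", "shmem" },  /* assumed mapping */
    };
    size_t i;

    if (key_len > 0 && key[key_len - 1] == ':') {
        key_len--;   /* accept "Pss:" as well as "Pss" */
    }
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        /* Exact match: the length check prevents prefix collisions. */
        if (strlen(map[i].key) == key_len &&
            strncmp(map[i].key, key, key_len) == 0) {
            return map[i].type;
        }
    }
    return NULL;
}
```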
…reate. Signed-off-by: Phillip Whelan <[email protected]>
Actionable comments posted: 4
♻️ Duplicate comments (3)
plugins/in_mem_metrics/mem_metrics.c (3)
92-101: readlink handling LGTM.
Properly checks the return value and NUL-terminates.
293-307: PSS base label should not be “clean”.
“Pss:” is the aggregate; label it “base” (or “total”) to avoid collision with “dirty” and to match the comment.
- cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "clean" });
+ cmt_gauge_set(ctx->pss, ts, val, 2, (char *[]){ pid, "base" });
126-146: Fix rc check and clarify cmdline filter semantics.
Two issues in is_chosen_cmd():
rc check (line 135): rc <= -1 catches read() errors (read() returns -1 on failure) but not an empty read (rc == 0), which happens for processes with an empty cmdline such as kernel threads. Use rc <= 0 to catch both.
cmdline semantics mismatch: the comment at line 112 says "actual command, not any of the arguments" (argv[0] only), but the config help at line 538 says "Filter by the command line" (implying the full cmdline). The code matches only argv[0] because /proc/<pid>/cmdline uses NUL separators and strcmp() stops at the first NUL. Either document the filter as argv[0]-only or normalize NULs to spaces for full-line matching. Choose one and keep the docs consistent.
- if (rc <= -1) {
+ if (rc <= 0) {
      return FLB_FALSE;
  }
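If the full-line option is chosen, the read could look like the following sketch. `read_cmdline` is a hypothetical helper; it assumes a single `read()` is enough (longer command lines are truncated to the buffer) and treats both errors and empty reads as "no command line", following the rc <= 0 fix.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Reads a /proc/<pid>/cmdline-style file into buf and joins the
 * NUL-separated argv entries with spaces, so the whole command line can be
 * compared with strcmp(). Returns the string length, or -1 on a read error
 * or an empty file (e.g. kernel threads). */
static ssize_t read_cmdline(const char *path, char *buf, size_t size)
{
    int fd;
    ssize_t rc;
    ssize_t i;

    if (size < 2) {
        return -1;
    }
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    rc = read(fd, buf, size - 1);
    close(fd);
    if (rc <= 0) {            /* -1: error, 0: empty cmdline */
        return -1;
    }
    /* Drop trailing NULs, then turn the argv separators into spaces. */
    while (rc > 0 && buf[rc - 1] == '\0') {
        rc--;
    }
    for (i = 0; i < rc; i++) {
        if (buf[i] == '\0') {
            buf[i] = ' ';
        }
    }
    buf[rc] = '\0';
    return rc;
}
```

Joining with spaces makes "fluent-bit -c /etc/fb.conf" comparable as one string, at the cost of not distinguishing an argument that itself contains a space.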
🧹 Nitpick comments (2)
plugins/in_mem_metrics/mem_metrics.c (2)
423-433: Use a single timestamp per scrape.
Ensures internal metric points share one ts.
 static int cb_collector_time(struct flb_input_instance *ins,
                              struct flb_config *config, void *in_context)
 {
     struct mem_metrics *ctx = (struct mem_metrics *)in_context;
-    if (is_chosen_pid_self(ctx)) {
-        mmtx_parse_proc(ctx, cfl_time_now(), "/proc/self");
-    } else {
-        mmtx_utils_path_scan_procs(ctx, cfl_time_now());
-    }
+    uint64_t ts = cfl_time_now();
+    if (is_chosen_pid_self(ctx)) {
+        mmtx_parse_proc(ctx, ts, "/proc/self");
+    } else {
+        mmtx_utils_path_scan_procs(ctx, ts);
+    }
     flb_input_metrics_append(ins, NULL, 0, ctx->cmt);
     FLB_INPUT_RETURN(0);
 }
532-544: Config help strings imply a single value; the code supports lists and special values.
Please update the help text to reflect comma-delimited lists and “self”, “0”.
- "Filter for a single executable"
+ "Comma-delimited allow list of executable paths (matches /proc/<pid>/exe)"
@@
- "Filter by the command line"
+ "Comma-delimited allow list; match command (argv[0]) or full cmdline (see docs)"
@@
- "Filter by PID"
+ "Comma-delimited allow list of PIDs; special: self, 0"
struct flb_input_plugin in_mem_metrics_plugin = {
    .name         = "mem_metrics",
    .description  = "Full Memory Metrics for Linux",
    .cb_init      = cb_mem_metrics_init,
    .cb_pre_run   = NULL,
    .cb_collect   = NULL,
    .cb_flush_buf = NULL,
    .cb_pause     = NULL,
    .cb_resume    = NULL,
    .cb_exit      = cb_mem_metrics_exit,
    .config_map   = config_map,
    .flags        = FLB_INPUT_CORO | FLB_INPUT_THREADED
};
Operational: unbounded PID label cardinality can grow memory over time.
Each new pid creates a new time series and persists beyond process exit. Long‑running agents with churn (e.g., k8s nodes) will grow cmetrics state.
Mitigations:
- Prune metrics for pids not seen this scrape (if cmetrics supports deletion).
- Or keep a short TTL for unseen pids, then delete.
- Or snapshot per collect: build a temporary cmt each cycle, append, then destroy (costlier but bounded).
Please check whether cmetrics exposes deletion/clear APIs:
Address unbounded PID label cardinality by pruning stale metrics.
The concern is valid: each new pid creates a new time series and persists beyond process exit. The plugin calls cmt_gauge_set() with pid as a label for each collection cycle but never removes metrics for processes that have exited. In environments with process churn (e.g., Kubernetes nodes with frequent pod restarts), this causes unbounded cardinality growth.
cmetrics exposes cmt_map_metric_destroy() in its public API (lib/cmetrics/include/cmetrics/cmt_map.h), so deletion is feasible. The recommended approach:
Track PIDs seen during the current scan and call cmt_map_metric_destroy() for any metric whose PID is no longer present in /proc. This requires either maintaining a previous PID set or iterating cmetrics' internal metric list to identify stale entries.
Alternatively, destroy and recreate the cmt context per collection cycle (cmt_destroy() followed by cmt_create()), which bounds memory at the cost of higher CPU per scrape.
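A minimal sketch of the bookkeeping behind the first option: given the PIDs seen in the previous scrape and in the current one, find the PIDs that have exited. `find_stale_pids` is hypothetical, and the actual removal of the matching series (via cmt_map_metric_destroy() or by recreating the context, as described above) is deliberately left out so the sketch does not depend on cmetrics.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts both PID arrays in place and writes into `stale` every PID present
 * in `prev` but missing from `cur`. `stale` needs room for n_prev entries.
 * Returns the number of stale PIDs. */
static size_t find_stale_pids(int *prev, size_t n_prev,
                              int *cur, size_t n_cur, int *stale)
{
    size_t i = 0, j = 0, n = 0;

    if (n_prev > 1) {
        qsort(prev, n_prev, sizeof(int), cmp_int);
    }
    if (n_cur > 1) {
        qsort(cur, n_cur, sizeof(int), cmp_int);
    }
    /* Merge-style walk over two sorted lists: O(n log n) overall. */
    while (i < n_prev) {
        if (j >= n_cur || prev[i] < cur[j]) {
            stale[n++] = prev[i++];   /* gone since the last scrape */
        } else if (prev[i] > cur[j]) {
            j++;                      /* new PID, not stale */
        } else {
            i++;                      /* still alive */
            j++;
        }
    }
    return n;
}
```

After each scrape, the current PID list would become the "previous" list for the next one; PID reuse by the kernel means a recycled PID simply continues its old series.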
@eschabell I have been working on a new function for
Summary
Add a new metrics plugin that uses /proc/[0-9]+/smaps_rollup to track the RSS, PSS, and shared memory usage of all or some processes on Linux.
This plugin is being used to track memory usage and memory leaks.
Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.