Skip to content

Harden data races between threaded input workers and the main engine#12007

Open
erain wants to merge 4 commits into
fluent:masterfrom
erain:upstream-pr/threaded-input-data-races
Open

Harden data races between threaded input workers and the main engine#12007
erain wants to merge 4 commits into
fluent:masterfrom
erain:upstream-pr/threaded-input-data-races

Conversation

@erain

@erain erain commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Three data races between threaded input worker threads and the main
engine thread
, all surfaced by ThreadSanitizer while running a threaded
tail input (with multiline) alongside the metrics exporter. They are
undefined under the C memory model and benign on the supported hardware, but
worth closing:

  • metricsflb_metrics_sum() updates a counter from the owning
    input/output worker thread while the metrics exporter reads the same counter
    from the main thread. This one is live whenever metrics are scraped together
    with a threaded input.
  • inputflb_input_collector_fd() runs on the main thread but iterated a
    threaded input's collectors, racing the worker thread that initializes those
    collector fds at startup. A threaded input's collectors are registered and
    dispatched in the input's own thread/event loop and are never matched here,
    so the handler now skips threaded inputs (also a small optimization).
  • engine / binconfig->grace_input is published by the engine thread
    during startup and read by the supervisor on the main thread.

A small include/fluent-bit/flb_atomic.h helper is added (relaxed GCC/Clang
__atomic builtins, plain-access fallback) and used for the metric counter and
grace_input.

Note: these are not the ARM64 SIGSEGV in
flb_input_chunk_ring_buffer_collector. That one is a separate unsynchronized
SPSC hand-off that is already serialized by the flb_ring_buffer mutex on
master. These changes were found with the same ThreadSanitizer setup while
investigating threaded-mode data races (cf. #9835).


Testing

Built with ThreadSanitizer (-fsanitize=thread, jemalloc disabled) and run for
~45s against a threaded tail + multiline.parser cri + null pipeline driven
by a CRI log generator (~6M lines, 6 files):

  • Before: TSan reports exactly these 3 races (flb_metrics_sum,
    flb_input_collector_fd, grace_input).
  • After: 0 ThreadSanitizer reports, clean shutdown, no crash.

Example configuration used:

[SERVICE]
    flush     0.2
    log_level info

[INPUT]
    name              tail
    path              /tmp/repro/logs/*.log
    read_from_head    true
    threaded          on
    multiline.parser  cri
    tag               kube.*

[OUTPUT]
    name   null
    match  *
  • Example configuration file for the change (above)
  • Debug log output from testing the change (ThreadSanitizer before/after summarized above; full logs available on request)
  • [N/A] Valgrind output — ThreadSanitizer is the appropriate tool for these data races; memcheck does not detect them. TSan before/after provided instead.

Documentation

  • [N/A] Documentation required for this feature (no user-facing/behavioral change)

Backporting

  • Backport to latest stable release (safe, self-contained; at maintainers' discretion)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Improved thread-safe handling for shared runtime values using relaxed atomic operations.
    • Metric counters now update and display/export via atomic relaxed fetch/load to reduce data races.
    • Collector discovery during initialization now skips threaded collectors to avoid unstable concurrent access.
    • Supervisor grace-period state updates now use a consistent atomic snapshot across threads.
  • New Features
    • Added a shared atomic utility interface to support safe cross-thread scalar operations across supported compilers.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e4e4169-3d2a-4f98-a3ca-8fb49389bfc4

📥 Commits

Reviewing files that changed from the base of the PR and between 144901f and 6395b8c.

📒 Files selected for processing (5)
  • include/fluent-bit/flb_atomic.h
  • src/flb_engine.c
  • src/flb_input.c
  • src/flb_metrics.c
  • src/fluent-bit.c
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/flb_input.c
  • src/flb_metrics.c
  • src/flb_engine.c

📝 Walkthrough

Walkthrough

Adds relaxed atomic helpers and updates shared scalar reads and writes in engine, metrics, input, and supervisor paths.

Changes

Shared state atomics

Layer / File(s) Summary
Atomic helper header
include/fluent-bit/flb_atomic.h
Defines relaxed load, store, and fetch-add macros with compiler builtins, an MSVC backend, and a compile-time failure for unsupported compilers.
Grace input publication
src/flb_engine.c, src/fluent-bit.c
Stores config->grace_input atomically in the engine and reads the published value atomically in the supervisor update path.
Metric counter atomics
src/flb_metrics.c
Uses atomic fetch-add for accumulation and atomic loads when printing and dumping metric values.
Threaded collector skip
src/flb_input.c
Skips threaded input instances in flb_input_collector_fd() during the collector scan for the main-thread fd path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • cosmo0920

Poem

Hop hop, I found a safer way to share,
relaxed atoms glinting in the air.
Metrics count, and grace hops cleanly by,
while threaded fds drift past my bunny eye.
Thump thump, concurrency’s carrot feels fair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the PR’s main goal of hardening cross-thread data races in the engine path.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/fluent-bit/flb_atomic.h`:
- Around line 44-53: The fallback in flb_atomic.h currently maps
flb_atomic_load, flb_atomic_store, and flb_atomic_fetch_add to plain accesses,
which is unsafe for shared cross-thread state. Update the `#else` branch to use a
real atomic implementation for non-GCC/Clang compilers, or fail the build with
an explicit error if no atomic backend is available. Keep the fix scoped to the
flb_atomic_* macros so all existing call sites get proper atomic semantics.

In `@src/flb_engine.c`:
- Around line 1138-1139: The grace window update in flb_engine_started setup is
happening too late, so the main thread can observe a stale grace_input value
after FLB_ENGINE_STARTED is published. Move the flb_atomic_store for
config->grace_input to occur before calling flb_engine_started(config), keeping
the update in the startup path around the existing grace_input logic so
src/fluent-bit.c reads the correct supervisor grace window.

In `@src/fluent-bit.c`:
- Around line 1477-1479: The supervisor grace publication is only using an
atomic load in the fixed path, but the later hot-reload paths in the same
function still read ctx->config->grace_input directly and can reintroduce the
race. Update those additional grace publication sites in the function that
handles hot reloads to use flb_atomic_load on ctx->config->grace_input before
calling flb_supervisor_child_update_grace, keeping the access pattern consistent
everywhere in this flow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb82bdf9-9dc4-4eb0-ab1c-9e06c68eaf41

📥 Commits

Reviewing files that changed from the base of the PR and between f317143 and 144901f.

📒 Files selected for processing (5)
  • include/fluent-bit/flb_atomic.h
  • src/flb_engine.c
  • src/flb_input.c
  • src/flb_metrics.c
  • src/fluent-bit.c

Comment thread include/fluent-bit/flb_atomic.h Outdated
Comment thread src/flb_engine.c Outdated
Comment thread src/fluent-bit.c
erain and others added 4 commits June 27, 2026 20:59
flb_metrics_sum() updates a counter from the owning input or output
worker thread while the metrics exporter reads the same counter from
the main engine thread. With a threaded input this is a data race
reported by ThreadSanitizer; it is benign on the supported hardware
but undefined under the C memory model.

Add a small flb_atomic.h helper with relaxed load/store/fetch_add and
use it for the metric value on both the summing and the reading paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yu Yi <yiyu@google.com>
flb_input_collector_fd() runs on the main engine thread but iterated
the collectors of every input, including threaded ones. A threaded
input initializes its collector descriptors from its own worker
thread, so this races with the main thread reading them at startup.

Those collectors are registered and dispatched in the input's own
thread and event loop, so they are never matched here. Skipping
threaded inputs removes the race and avoids needless iteration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yu Yi <yiyu@google.com>
config->grace_input is written once by the engine thread during
startup and read concurrently by the supervisor on the main thread,
which ThreadSanitizer reports as a data race.

Store it with a relaxed atomic; the matching atomic read is done at
the supervisor entry point.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yu Yi <yiyu@google.com>
The supervisor entry point reads config->grace_input, which the engine
thread publishes during startup. Read it with a relaxed atomic so the
cross-thread access is well defined, matching the engine-side store.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yu Yi <yiyu@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant