out_loki: Handle loki tenant_id_key properly#11832
Conversation
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR replaces dynamic per-thread tenant tracking with per-flush tenant grouping: it extracts tenant IDs from records, groups records by tenant during flush, composes tenant-filtered payloads, and sends one HTTP POST per tenant group with configurable error-handling. ChangesLoki Tenant Grouping Refactor
Sequence DiagramsequenceDiagram
participant FlushCB as cb_loki_flush
participant GroupHelper as collect_tenant_groups
participant PayloadCompose as loki_compose_payload
participant SendPayload as send_loki_payload
participant HTTPClient as HTTP Client
FlushCB->>GroupHelper: Check if tenant_id_key configured
alt No tenant_id_key
GroupHelper-->>FlushCB: Single payload (no grouping)
FlushCB->>PayloadCompose: Compose with tenant_filter=NULL
PayloadCompose-->>FlushCB: Payload buffer
FlushCB->>SendPayload: Send with default tenant_id
SendPayload->>HTTPClient: POST with X-Scope-OrgID (default)
else tenant_id_key configured
GroupHelper->>GroupHelper: Iterate records, extract effective tenant IDs
GroupHelper-->>FlushCB: Array of tenant groups with record counts
loop For each tenant group
FlushCB->>PayloadCompose: Compose with tenant_filter=group_tenant_id
PayloadCompose-->>FlushCB: Tenant-filtered payload
FlushCB->>SendPayload: Send with group_tenant_id
SendPayload->>HTTPClient: POST with X-Scope-OrgID (group tenant)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 47a2f77672
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/runtime/out_loki.c (1)
76-101: 💤 Low valueThe callback ordering dependency is implicit.
The headers callback captures data at
slot = tenant_request_count, then the payload callback captures at the same slot and increments the count. This assumes headers callback is always invoked before payload callback for each request. While this matches Fluent Bit's current HTTP client behavior, consider adding a brief comment documenting this assumption.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runtime/out_loki.c` around lines 76 - 101, The callbacks cb_loki_debug_headers and cb_loki_debug_payload rely on an implicit ordering (headers callback runs before payload callback) because cb_loki_debug_headers records tenant_headers at slot = tenant_request_count while cb_loki_debug_payload writes tenant_payloads at the same slot and then increments tenant_request_count; add a concise comment above these functions (or at their shared mutex/slot logic) explicitly documenting this assumption and referencing the dependency between tenant_request_count, cb_loki_debug_headers, and cb_loki_debug_payload so future readers know the ordering is relied upon and why.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/runtime/out_loki.c`:
- Around line 76-101: The callbacks cb_loki_debug_headers and
cb_loki_debug_payload rely on an implicit ordering (headers callback runs before
payload callback) because cb_loki_debug_headers records tenant_headers at slot =
tenant_request_count while cb_loki_debug_payload writes tenant_payloads at the
same slot and then increments tenant_request_count; add a concise comment above
these functions (or at their shared mutex/slot logic) explicitly documenting
this assumption and referencing the dependency between tenant_request_count,
cb_loki_debug_headers, and cb_loki_debug_payload so future readers know the
ordering is relied upon and why.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ea6b796-3361-4827-9545-b2c16a9980cb
📒 Files selected for processing (3)
plugins/out_loki/loki.cplugins/out_loki/loki.htests/runtime/out_loki.c
💤 Files with no reviewable changes (1)
- plugins/out_loki/loki.h
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/runtime/out_loki.c (1)
278-283: 💤 Low valueMissing
flb_http_server_stopcall on pthread_create failure.If
pthread_createfails after the HTTP server was successfully started on line 268, the server is destroyed without being stopped first. This could leak listening resources.Suggested fix
ret = pthread_create(&mock->thread, NULL, tenant_policy_server_loop, mock); if (ret != 0) { + flb_http_server_stop(&mock->server); flb_http_server_destroy(&mock->server); mk_event_loop_destroy(mock->event_loop); mock->event_loop = NULL; return -1; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runtime/out_loki.c` around lines 278 - 283, On pthread_create failure path where ret != 0, stop the HTTP server before destroying it to avoid leaking listening resources: call flb_http_server_stop(&mock->server) prior to flb_http_server_destroy(&mock->server) (same block that clears mock->event_loop and returns -1), ensuring the server is cleanly stopped when pthread_create fails after a successful start.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/runtime/out_loki.c`:
- Around line 278-283: On pthread_create failure path where ret != 0, stop the
HTTP server before destroying it to avoid leaking listening resources: call
flb_http_server_stop(&mock->server) prior to
flb_http_server_destroy(&mock->server) (same block that clears mock->event_loop
and returns -1), ensuring the server is cleanly stopped when pthread_create
fails after a successful start.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fdc3dad1-a832-49c3-a11e-c1fa89c7d1e6
📒 Files selected for processing (3)
plugins/out_loki/loki.cplugins/out_loki/loki.htests/runtime/out_loki.c
607c3ae to
7a14b67
Compare
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
7a14b67 to
4146c30
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/runtime/out_loki.c`:
- Around line 47-53: start_tenant_policy_server can fail before the thread is
created but stop_tenant_policy_server unconditionally joins/destroys fields and
also writes to mock->stop without synchronization; modify teardown to check
whether the thread was successfully started before joining or destroying (guard
joins/destroys on a flag such as server->thread_started or thread != 0),
initialize and atomically access/mock->stop (use atomic or mutex) or set stop
only if thread exists, and ensure start_tenant_policy_server sets the guard on
successful thread creation so stop_tenant_policy_server safely handles partially
initialized servers (update references to struct tenant_policy_server,
start_tenant_policy_server, stop_tenant_policy_server, mock->stop, and thread).
- Around line 1069-1072: The test is nondeterministic because
flb_service_set(...) doesn't restrict the scheduler window, so the
partial_success branch may schedule a retry after the 2.5s assertion; update the
helper that creates ctx (where flb_service_set is called) to explicitly set the
scheduler/retry window to a value that covers the full retry backoff (e.g., >=
expected retry interval, such as 3000 ms) by passing the appropriate
scheduler/retry window option into flb_service_set for ctx so the assertion
reliably covers a full retry interval.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/runtime/out_loki.c (1)
1042-1045:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake the mixed-tenant input deterministic.
These tests send
tenant-aandtenant-bin two separateflb_lib_push()calls, but the regression here only exists when different tenants share the same chunk. If those pushes end up in separate chunks, the old implementation would still pass and these tests won't prove the fix. Please force both records into one chunk/batch, or add an assertion that this path is exercising a mixed-tenant chunk.Also applies to: 1115-1118
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runtime/out_loki.c` around lines 1042 - 1045, The test is non-deterministic because two separate flb_lib_push() calls (using ctx, in_ffd, tenant_a and tenant_b) may create two chunks; change the test to ensure both tenant_a and tenant_b records are pushed into the same chunk by sending them in a single flb_lib_push() payload (e.g. concatenate the two record lines into one buffer separated by the expected record delimiter or wrap them in a single JSON array) or otherwise force batch/chunking (e.g. adjust the input to emit both records in one call) and keep the same TEST_CHECK(ret >= 0) assertion; update both occurrences (the block around the shown calls and the similar block at lines 1115-1118) and reference ctx and in_ffd when making the single push so the mixed-tenant path is deterministically exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/runtime/out_loki.c`:
- Around line 1042-1045: The test is non-deterministic because two separate
flb_lib_push() calls (using ctx, in_ffd, tenant_a and tenant_b) may create two
chunks; change the test to ensure both tenant_a and tenant_b records are pushed
into the same chunk by sending them in a single flb_lib_push() payload (e.g.
concatenate the two record lines into one buffer separated by the expected
record delimiter or wrap them in a single JSON array) or otherwise force
batch/chunking (e.g. adjust the input to emit both records in one call) and keep
the same TEST_CHECK(ret >= 0) assertion; update both occurrences (the block
around the shown calls and the similar block at lines 1115-1118) and reference
ctx and in_ffd when making the single push so the mixed-tenant path is
deterministically exercised.
In the previous implementation, we just peek the first attempted value of tenant_id_key in the same chunk contents.
Instead, we have to peek inside of the chunks. This is because some of the chunks include the different tenant_id in the same chunk.
Closes #11824.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
fluent/fluent-bit-docs#2581
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Improvements
New Features
Tests