Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Iceberg Output — Benchmarking Suite
Adds a self-contained benchmarking suite for the iceberg output component, covering write throughput across CPU core counts, batch sizes, and
max_in_flight concurrency.
What's included
and a Taskfile with parameterized bench tasks
Key findings
Batch size is the dominant factor at low concurrency. At 1 core, throughput scales from 757 msg/sec (batch=1000) to 5,442 msg/sec
(batch=10000) — a 7x gain. Each batch is one catalog commit round-trip, so fewer commits directly translates to higher throughput.
max_in_flight is the most impactful knob. At GOMAXPROCS=4 and batch=10000, increasing max_in_flight from 4 → 32 yields a 4x throughput gain
(8,483 → 34,835 msg/sec), scaling linearly until MinIO saturates at ~34K msg/sec / 5 MB/sec.
The connector is not the bottleneck. The throughput ceiling is MinIO (local Docker), not the Iceberg writer. With enough concurrent commits
(max_in_flight=32, batch=10000), the connector fully saturates the storage layer.
Sweet spot: batch=10000, max_in_flight=32 — reaches maximum throughput with the least concurrency overhead.