iceberg benchmarking#4226

Open
ness-david-dedu wants to merge 3 commits into redpanda-data:main from ness-david-dedu:feature/iceberg-benchmarking

@ness-david-dedu (Contributor)

Iceberg Output — Benchmarking Suite

Adds a self-contained benchmarking suite for the iceberg output component, covering write throughput across CPU core counts, batch sizes, and
max_in_flight concurrency.

What's included

  • internal/impl/iceberg/bench/ — Docker Compose setup (MinIO + Iceberg REST catalog), benchmark_config.yaml with a synthetic event generator,
    and a Taskfile with parameterized bench tasks
  • docs/benchmark-results/iceberg.md — Full results matrix with observations
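
For readers unfamiliar with the component, the two knobs the suite sweeps map directly onto the output config. A minimal sketch of a benchmark pipeline is below; the field names under `iceberg` are illustrative assumptions, not copied verbatim from `benchmark_config.yaml`:

```yaml
# Hypothetical sketch of a benchmark pipeline. The actual generator and
# iceberg output settings live in internal/impl/iceberg/bench/benchmark_config.yaml.
input:
  generate:
    interval: ""          # emit synthetic events as fast as possible
    mapping: |
      root.id = uuid_v4()
      root.ts = now()

output:
  iceberg:                # field names below are assumptions for illustration
    max_in_flight: 32     # concurrent in-flight commits
    batching:
      count: 10000        # messages per catalog commit round-trip
```

Each batch flush becomes one catalog commit, which is why `batching.count` and `max_in_flight` are the dimensions worth sweeping.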

Key findings

Batch size is the dominant factor at low concurrency. At 1 core, throughput scales from 757 msg/sec (batch=1000) to 5,442 msg/sec (batch=10000), a 7x gain. Each batch is one catalog commit round-trip, so fewer commits translate directly into higher throughput.

max_in_flight is the most impactful knob. At GOMAXPROCS=4 and batch=10000, increasing max_in_flight from 4 to 32 yields a 4x throughput gain
(8,483 → 34,835 msg/sec), scaling near-linearly until MinIO saturates at roughly 34K msg/sec (~5 MB/sec).

The connector is not the bottleneck. The throughput ceiling is MinIO (local Docker), not the Iceberg writer. With enough concurrent commits
(max_in_flight=32, batch=10000), the connector fully saturates the storage layer.

Sweet spot: batch=10000, max_in_flight=32 — reaches maximum throughput with the least concurrency overhead.
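
Assuming the parameterized bench tasks expose the swept dimensions as Taskfile variables (the task and variable names here are assumptions; the real ones are in the Taskfile under internal/impl/iceberg/bench/), reproducing a single run might look like:

```yaml
# Hypothetical Taskfile fragment; real task/variable names may differ.
version: '3'
tasks:
  bench:
    cmds:
      # Pin the core count for the run; batch size and max_in_flight
      # are set in benchmark_config.yaml (see the config sketch above).
      - GOMAXPROCS={{.CORES | default "4"}} redpanda-connect run benchmark_config.yaml
```

Invoked as e.g. `task bench CORES=4` once `docker compose up` has brought up MinIO and the Iceberg REST catalog.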
