
Improve benchmark docs page coverage and formatting#15623

Open
kuldeep27396 wants to merge 3 commits into apache:main from kuldeep27396:fix/benchmarks-docs-page

Conversation


@kuldeep27396 kuldeep27396 commented Mar 13, 2026

Why this change

The benchmarks page had a few separate problems that were all tied together:

  • the markdown had rendering issues from inconsistent list and code-block formatting
  • the page still described the Spark benchmarks as if they were generic "spark-2 or spark-3" commands, while the current build uses versioned Gradle modules
  • the page only documented a small subset of the JMH benchmarks that exist in the repository today

Because of that, a narrow formatting-only edit would still have left the page inaccurate and incomplete. The page needed to be cleaned up structurally and rewritten around the current benchmark layout so the commands and benchmark inventory stayed aligned with the repository.

What changed

  • fixed the markdown formatting so the page renders consistently
  • replaced the repeated per-benchmark command blocks with grouped command templates for core, data, Spark, Spark extensions, and Flink benchmarks
  • documented the current default benchmark modules and version properties used by the build
  • removed stale wording that implied the documented commands applied to older Spark generations without module/version differences
  • added the missing benchmark groups and benchmark names from the current default JMH modules in the repository

How this was validated

  • compared the page content against the current JMH benchmark sources under core, data, spark/v4.1/spark, spark/v4.1/spark-extensions, and flink/v2.1/flink
  • verified the documented Gradle task paths with:
./gradlew :iceberg-core:help --task jmh

That command resolved successfully and confirmed the task paths referenced by the updated page, including:

  • :iceberg-core:jmh
  • :iceberg-data:jmh
  • :iceberg-flink:iceberg-flink-2.1:jmh
  • :iceberg-spark:iceberg-spark-4.1_2.13:jmh
  • :iceberg-spark:iceberg-spark-extensions-4.1_2.13:jmh
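As a sketch of how the grouped command templates are meant to be used, a single benchmark class can typically be selected with an include regex and the report locations overridden with the Gradle properties quoted from the page later in this review. The module, benchmark name, and output paths below are illustrative examples, not commands taken from the page:

```shell
# Hypothetical example: run one benchmark class from the core module.
# Adjust the include regex and output paths to your own run.
./gradlew :iceberg-core:jmh \
  -PjmhIncludeRegex=LocationProviderBenchmark \
  -PjmhOutputPath=build/reports/jmh/core-human.txt \
  -PjmhJsonOutputPath=build/reports/jmh/core-results.json
```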

Closes #15556

@github-actions github-actions bot added the docs label Mar 13, 2026

kuldeep27396 commented Mar 13, 2026

AI-generated summary of this PR:

  • fixed the markdown formatting so the benchmarks page renders consistently
  • updated the benchmark commands to match the current versioned Gradle modules
  • expanded the page to cover the benchmark groups and benchmark names present in the repository

I reviewed and posted this summary, but the text above was generated with AI.


@steveloughran steveloughran left a comment


This is a really good rework.

Note that I do think the lists of benchmarks for each module would be best as tables, though that's very subjective.


Below are the existing benchmarks shown with the actual commands on how to run them locally.
JMH writes human-readable output to `build/reports/jmh/human-readable-output.txt` and JSON output to `build/reports/jmh/results.json` by default. Override them with `-PjmhOutputPath=<path>` and `-PjmhJsonOutputPath=<path>` if needed.


probably worth mentioning https://jmh.morethan.io/ as a way to display json results: you can share the results.json with others for them to view.
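Since the JSON file is what a viewer like jmh.morethan.io consumes, a quick way to sanity-check results.json before sharing it is to summarize the primary metrics. This is a minimal sketch assuming JMH's standard JSON output layout; the benchmark name in the sample data is made up for illustration:

```python
import json

# Hypothetical results.json content in JMH's standard JSON format
# (the real file is written to build/reports/jmh/results.json).
sample = """
[
  {"benchmark": "org.apache.iceberg.LocationProviderBenchmark.hashedLocation",
   "mode": "thrpt",
   "primaryMetric": {"score": 1234.5, "scoreError": 67.8, "scoreUnit": "ops/s"}}
]
"""

def summarize(results_json: str) -> list[str]:
    """Return one 'name: score ± error unit' line per benchmark result."""
    lines = []
    for result in json.loads(results_json):
        metric = result["primaryMetric"]
        lines.append(
            f"{result['benchmark']}: "
            f"{metric['score']:.1f} ± {metric['scoreError']:.1f} {metric['scoreUnit']}"
        )
    return lines

for line in summarize(sample):
    print(line)
```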

Below are the existing benchmarks shown with the actual commands on how to run them locally.
JMH writes human-readable output to `build/reports/jmh/human-readable-output.txt` and JSON output to `build/reports/jmh/results.json` by default. Override them with `-PjmhOutputPath=<path>` and `-PjmhJsonOutputPath=<path>` if needed.

The default versions in this repository are:

listing this creates one more maintenance point when versions are upgraded...or somewhere where they get out of date. I think it's best to not enumerate.

@manuzhang (Member)

@kuldeep27396 Please explicitly mark the message generated by AI.

@kuldeep27396 (Author)

Updated in efb5a2d: converted the benchmark inventories on the docs page to tables for readability, and I also edited the earlier summary comment to explicitly mark it as AI-generated.


@steveloughran steveloughran left a comment


commented on (existing) text. TL;DR: your MacBook with an IDE and Claude using 3/4 of its RAM doesn't resemble a 32-core x86 server with 128 GB of RAM, so try to set up realistic test conditions, or at least close the IDE before running the tests overnight.

Benchmarks are located under `<project-name>/jmh`. It is generally favorable to only run the tests of interest rather than running all available benchmarks.
Also note that JMH benchmarks run within the same JVM as the system-under-test, so results might vary between runs.
Benchmarks are located under `<module>/src/jmh`. It is generally better to run only the benchmarks you are investigating instead of the full suite.
Also note that JMH benchmarks run in the same JVM as the system under test, so results may vary between runs.

If you look for uses of @Fork(1) in the benchmarks you'll see this isn't true. The only @Fork(0) right now is in the PR I'm working on, as I want fast iterations (#15629).
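For anyone wanting to check the fork settings themselves, an ordinary search over the JMH sources is enough; this one-liner is a sketch assuming the `<module>/src/jmh` layout described above and is meant to be run from the repository root:

```shell
# List fork settings across all JMH benchmark sources (run from the repo root).
grep -rn "@Fork" --include="*.java" . | grep "/src/jmh/"
```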

But even with the forking, performance varies a lot on a local system because of what else you are doing on the same system, memory consumption, etc. If you kick off a benchmark and then have a conversation with an AI agent, you will get bad numbers.

Better to say


Iceberg benchmarks execute in a new JVM for each test, for isolation and reproducibility.
Other work taking place on the same computer may create demand for CPU, memory or other system resources, and so produce inconsistent results.

Benchmark results should be more reliable if executed on a system without other CPU-, RAM- or IO-intensive processes active at the same time. Even better: run them on a host dedicated exclusively to the benchmark.

In-cloud testing with servers similar to production systems is ideal here, though the "noisy neighbour" problem still applies.



Successfully merging this pull request may close these issues: Update Benchmarks doc

3 participants