Add JSON output for benchmark results by dantegd · Pull Request #8027 · rapidsai/cuml

dantegd · 2026-04-28T23:17:09Z

Builds on top of #7980

This PR adds a structured JSON output path for cuML benchmarks. Benchmark runs can now write a single canonical JSON artifact containing grouped benchmark results and run metadata, including backend timings, accuracy metrics, benchmark shape, selected config/profile information, hardware metadata, package environment snapshots, and estimator parameters when available.

This PR also improves benchmark console output for long runs. Instead of repeated per-run DataFrame dumps, the runner now emits a concise aligned progress table with algorithm, shape, estimated data size, backend timings, and compact details like accuracy and speedup.

Why JSON Output

I think YAML remains the right format for benchmark configuration because it is human-authored, reviewable, and easy to maintain. JSON is the right format for benchmark output because it is machine-produced, schema-versioned, and easy for CI, dashboards, and future baseline comparison tooling to consume.

YAML in: human-authored benchmark intent
JSON out: machine-readable benchmark facts

JSON Schema

The JSON artifact has two top-level fields:

{
  "results": [],
  "metadata": {}
}
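
A consumer could check this top-level shape before reading anything else. The snippet below is a minimal sketch using the standard-library `json` module; the helper name `load_artifact` is hypothetical and is not part of this PR or of cuML.

```python
import json


def load_artifact(text: str) -> dict:
    """Parse a benchmark artifact and check its two top-level fields.

    Hypothetical helper for illustration only; not a cuML API.
    """
    artifact = json.loads(text)
    if not isinstance(artifact, dict):
        raise ValueError("artifact must be a JSON object")
    if not isinstance(artifact.get("results"), list):
        raise ValueError("'results' must be a list")
    if not isinstance(artifact.get("metadata"), dict):
        raise ValueError("'metadata' must be an object")
    return artifact


# The empty skeleton shown above passes the check.
artifact = load_artifact('{"results": [], "metadata": {}}')
print(len(artifact["results"]), "results loaded")
```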

Each result represents one logical benchmark variation:

{
  "benchmark_id": "logreg_fit_narrow_default",
  "algorithm": "LogisticRegression",
  "dataset": "classification",
  "operation": "fit",
  "shape": {
    "rows": 84000000,
    "features": 16,
    "estimated_input_size_bytes": 5376000000,
    "estimated_input_size_gb": 5.376
  },
  "data": {
    "input_type": "cupy",
    "dtype": "fp32",
    "n_reps": 1
  },
  "params": {
    "declared": {},
    "effective": {
      "cpu": null,
      "gpu": {},
      "accel": null
    }
  },
  "backends": {
    "gpu": {
      "status": "success",
      "time_sec": 0.91,
      "accuracy": 0.995
    }
  }
}
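
The `shape` numbers in this example are consistent with rows × features × bytes per element, reported in decimal gigabytes. The sketch below reproduces that arithmetic; the `DTYPE_BYTES` mapping and the function name are assumptions for illustration, and the runner may compute its estimate differently (for example, by also counting labels).

```python
# Assumed dtype-to-width mapping; only "fp32" appears in the example above.
DTYPE_BYTES = {"fp32": 4, "fp64": 8}


def estimated_input_size(rows: int, features: int, dtype: str) -> tuple[int, float]:
    """Return (bytes, decimal gigabytes) for a dense rows x features array."""
    size_bytes = rows * features * DTYPE_BYTES[dtype]
    return size_bytes, size_bytes / 1e9


size_bytes, size_gb = estimated_input_size(84_000_000, 16, "fp32")
print(size_bytes, size_gb)
```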

Run metadata includes schema version, command, Python/platform information, cuML/git identity, runtime availability, config/profile selection, hardware details, and a compact conda or pip package snapshot.

Backend records currently use:

status: success or skipped
time_sec: benchmark time in seconds for successful runs
accuracy: accuracy or score when available
reason: explanation for skipped backends
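
Because a backend may be skipped, a consumer should look at `status` before reading `time_sec`. The following is a rough sketch of that handling. The backend names `cpu`, `gpu`, and `accel` come from the `params.effective` keys above; the function name, the sample `reason` string, and the definition of speedup as CPU time divided by GPU time are assumptions made for this example.

```python
def summarize_backends(backends: dict) -> dict:
    """Split backend records into timings and skip reasons.

    Hypothetical helper; speedup is assumed here to mean cpu time / gpu time.
    """
    times = {}
    skipped = {}
    for name, record in backends.items():
        if record.get("status") == "success" and record.get("time_sec") is not None:
            times[name] = record["time_sec"]
        elif record.get("status") == "skipped":
            skipped[name] = record.get("reason", "")

    speedup = None
    if "cpu" in times and "gpu" in times and times["gpu"] > 0:
        speedup = times["cpu"] / times["gpu"]
    return {"times": times, "skipped": skipped, "speedup": speedup}


# Made-up records for illustration; these are not measured results.
example = {
    "cpu": {"status": "success", "time_sec": 1.82, "accuracy": 0.995},
    "gpu": {"status": "success", "time_sec": 0.91, "accuracy": 0.995},
    "accel": {"status": "skipped", "reason": "example skip reason"},
}
summary = summarize_backends(example)
print(summary["times"], summary["skipped"], summary["speedup"])
```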

Highlights

Adds --output results.json as the canonical benchmark artifact.
Writes a single JSON file with top-level results followed by metadata.
Groups backend results under each logical benchmark variation.
Captures backend status for successful and skipped runs.
Captures declared parameters and effective estimator parameters when available.
Captures run metadata including command, config/profile, Python/platform, cuML/git info, GPU availability, hardware details, and compact conda/pip package snapshots.
Adds hardware metadata overrides for cleaner reporting labels.
Keeps --csv as a legacy flat export.
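
Since results are grouped per `benchmark_id`, a downstream tool could compare two artifacts by joining on that field. The sketch below shows one way this might look; it is not part of this PR, and the function names, the 10% tolerance, and the sample timings are all hypothetical.

```python
def index_times(artifact: dict, backend: str) -> dict:
    """Map benchmark_id -> time_sec for successful runs of one backend."""
    times = {}
    for result in artifact["results"]:
        record = result.get("backends", {}).get(backend)
        if record and record.get("status") == "success":
            times[result["benchmark_id"]] = record["time_sec"]
    return times


def find_regressions(baseline: dict, current: dict,
                     backend: str = "gpu", tolerance: float = 0.10) -> list:
    """Return (benchmark_id, baseline_time, current_time) for slower runs.

    A run counts as a regression when it exceeds the baseline by more than
    `tolerance` (a fraction). Benchmarks missing from either side are ignored.
    """
    base = index_times(baseline, backend)
    cur = index_times(current, backend)
    regressions = []
    for benchmark_id in sorted(base.keys() & cur.keys()):
        if cur[benchmark_id] > base[benchmark_id] * (1 + tolerance):
            regressions.append((benchmark_id, base[benchmark_id], cur[benchmark_id]))
    return regressions


def make_artifact(time_sec: float) -> dict:
    """Build a tiny artifact with one result, for the usage example only."""
    return {
        "results": [{
            "benchmark_id": "logreg_fit_narrow_default",
            "backends": {"gpu": {"status": "success", "time_sec": time_sec}},
        }],
        "metadata": {},
    }


# Made-up timings for illustration; these are not measured results.
regressions = find_regressions(make_artifact(0.91), make_artifact(1.20))
print(regressions)
```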

copy-pr-bot · 2026-04-28T23:17:14Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

dantegd added 18 commits April 21, 2026 18:09

FEA first commit for YAML-driven benchmark suite configuration and execution

cda21e7

ENH improve validation of required runtime fields after resolving benchmark defaults

1b5625c

FIX precommit fixes

ea3c199

Merge branch 'main' into fea-bench-yaml

b9030fc

FIX PR review comments

45428aa

Merge branch 'fea-bench-yaml' of github.com:dantegd/cuml into fea-bench-yaml

7970bb9

Merge branch 'main' into fea-bench-yaml

696d5bb

FIX PR review comments

c9e9c66

ENH use backends instead of skip flags for better expressiveness

3c15b4a

FEA Add cuml.accel integration to benchmarks

1e380c6

FIX bug when using backend cpu,gpu and add pytests for cli args

a00bec5

FIX merge conflicts

e2237f4

ENH Improve benchmark progress output formatting

6d56116

FEA Add canonical JSON benchmark output with metadata

9fc2851

ENH Default CPU benchmark estimators to max parallelism

fec9629

FIX style fixes

6b11c2c

ENH Add package environment snapshot to benchmark JSON

1797210

DOC document json schema

5671d7b

github-actions Bot added the conda and Cython / Python labels Apr 28, 2026

github-actions Bot assigned dantegd Apr 28, 2026

dantegd changed the title ~~Add canonical JSON output for benchmark results~~ Add JSON output for benchmark results Apr 28, 2026

dantegd mentioned this pull request Apr 28, 2026

[TRACKER] cuml.benchmark tool improvements #8028

Open

5 tasks

dantegd added 6 commits April 30, 2026 12:03

Merge branch 'fea-bench-yaml-njobs' into fea-bench-json-output

d4945eb

Merge branch 'release/26.06' into fea-bench-yaml

13ab838

ENH Make YAML optional dependency

115e23f

DOC Improvements and JSON schema

0744368

ENH Add JSON schema and pytest for validation

c0c31ee

FEA Use msgspec

5a5f421

dantegd added 8 commits May 19, 2026 14:54

Merge main

fec927d

Merge branch 'main' into fea-bench-yaml

52d0a52

Merge fea-bench-yaml

59ab820

FIX style fixes

77d628e

Merge branch 'main' into fea-bench-yaml

cc3d92c

FIX small pytest fix

4e0d805

Merge branch 'fea-bench-yaml' of github.com:dantegd/cuml into fea-bench-yaml

c1d9d97

Merge branch 'fea-bench-yaml' into fea-bench-json-output

86f8417

dantegd wants to merge 32 commits into rapidsai:main from dantegd:fea-bench-json-output
