docs(benchmarks): add generation benchmarks #239
Conversation
Hi there! Can you please enhance my benchmarks package with your code and share the tool results? As a hint, the final generation result that gets embedded in
Oh, ok... I opened this PR only because in your comment on issue #207 you wrote: A write‑up or summary table we can link to, and
I see, sorry, missed that. Could you add the generation benchmarks (tho in Python, no problem) to this repo as well? For the sake of reproducibility? Thanks.
Sure, I’ll open a PR adding it under benchmarks/generation.
The doc section is well-written and the results are compelling. Holding the merge for one reason. We'd like benchmarks documented here to be reproducible from this repo, and the harness behind these numbers lives only in your external Python repo (see the close note on #240). Once a TS port of the harness lands, this section ships alongside it. Marking as draft so it doesn't pile up review-ready signals in the meantime – please flip back to ready when the port is in. Thanks for the rigorous methodology on the harness side.
Linked Issue
Closes #207
Description
This PR adds a Generation Benchmarks section to the documentation. It details the performance of TOON compared to JSON and JSON Structured Output (JSO) across 21 different LLMs, focusing on token efficiency, accuracy, and repair capabilities.
Type of Change
Changes Made
Added the "## 2. Generation benchmarks" section to docs/guide/benchmarks.md.
SPEC Compliance
Testing
Pre-submission Checklist
Breaking Changes
Additional Context
Benchmarks were run via the Nebius API.