Docs

This folder documents eval authoring and benchmark execution in this repository.

For methodology details and scoring definitions, see ../paper/benchmark-methodology-whitepaper.tex.

Category status follows the top-level folders under evals/: any category without a folder there is currently considered work in progress (WIP).
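
The folder rule above can be checked mechanically. The following is a minimal sketch, not a tool shipped in this repository: the function name `category_status`, the "active" label, and the category names in the usage note are hypothetical, and it assumes only that each category maps to a directory directly under evals/.

```python
from pathlib import Path


def category_status(evals_dir: Path, categories: list[str]) -> dict[str, str]:
    """Map each category name to "active" or "WIP".

    Assumption: a category counts as active only if a directory with
    that exact name exists directly under evals_dir. Plain files and
    nested folders are ignored.
    """
    if evals_dir.is_dir():
        present = {entry.name for entry in evals_dir.iterdir() if entry.is_dir()}
    else:
        # A missing evals/ directory means no category is present yet.
        present = set()
    return {name: ("active" if name in present else "WIP") for name in categories}
```

Run from the repository root, a call such as `category_status(Path("evals"), ["some-category", "another-category"])` would report which of the hypothetical names already have a top-level folder and which are still WIP.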

If you are starting fresh, read these in order:

  1. adding-new-eval.md for eval directory contract and requirements.yaml behavior.
  2. starter-scaffold-contract.md for baseline app/ starter policy.
  3. testing-your-evals.md for focused verification before opening a PR.
  4. adding-new-category.md for category README and requirement-design workflow.

For contribution workflow, command examples, and PR conventions, see ../CONTRIBUTING.md and ../AGENTS.md.