This folder records benchmark-specific integration contracts that live
outside agent_base so the core harness stays generic, lightweight, and
fair across different evaluations.

| Benchmark | Directory | Tracked contract |
|---|---|---|
| ResearchClawBench | benchmarks/ResearchClawBench/ | README.md + role_prompt.md + adapter.py |
| QA / VQA-style benchmarks | benchmarks/QA/ | README.md + role_prompt.md |
| SGI-DeepResearch | benchmarks/SGI-DeepResearch/ | README.md + role_prompt.md |
| SGI-IdeaGeneration | benchmarks/SGI-IdeaGeneration/ | README.md + role_prompt.md |
| SGI-DryExperiment | benchmarks/SGI-DryExperiment/ | README.md + role_prompt.md |
| SGI-Reasoning | benchmarks/SGI-Reasoning/ | README.md + role_prompt.md |
| SGI-WetExperiment | benchmarks/SGI-WetExperiment/ | README.md + role_prompt.md |

- agent_base/ stays focused on the reusable harness runtime.
- Benchmark-specific prompts, adapters, and integration notes should live under their own benchmark subdirectory.
- Local benchmark helpers may exist for private experimentation, but they do not define the formal external integration contract.
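
The layout above can be checked mechanically. The following is a minimal sketch, not part of the harness: the `CONTRACTS` mapping and the `missing_contract_files` helper are hypothetical names introduced here for illustration, and the file lists are copied from the table in this README. It assumes the script is run from the repository root, so that `benchmarks/` is a relative path.

```python
from pathlib import Path

# Hypothetical mapping of benchmark directory name -> tracked contract files,
# transcribed from the table in this README (not read from the harness itself).
CONTRACTS = {
    "ResearchClawBench": ["README.md", "role_prompt.md", "adapter.py"],
    "QA": ["README.md", "role_prompt.md"],
    "SGI-DeepResearch": ["README.md", "role_prompt.md"],
    "SGI-IdeaGeneration": ["README.md", "role_prompt.md"],
    "SGI-DryExperiment": ["README.md", "role_prompt.md"],
    "SGI-Reasoning": ["README.md", "role_prompt.md"],
    "SGI-WetExperiment": ["README.md", "role_prompt.md"],
}


def missing_contract_files(benchmarks_root):
    """Return {benchmark: [missing files]} for every benchmark whose
    directory lacks one or more of its tracked contract files."""
    root = Path(benchmarks_root)
    missing = {}
    for name, files in CONTRACTS.items():
        absent = [f for f in files if not (root / name / f).is_file()]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for name, absent in missing_contract_files("benchmarks").items():
        print(f"{name}: missing {', '.join(absent)}")
```

A check like this only confirms that the tracked files exist; it says nothing about their content, and local helper files in the same directories are ignored because they are not part of the formal contract.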