Benchmarks

This folder records benchmark-specific integration contracts that live outside agent_base so the core harness stays generic, lightweight, and fair across different evaluations.

| Benchmark | Directory | Tracked contract |
| --- | --- | --- |
| ResearchClawBench | benchmarks/ResearchClawBench/ | README.md + role_prompt.md + adapter.py |
| QA / VQA-style benchmarks | benchmarks/QA/ | README.md + role_prompt.md |
| SGI-DeepResearch | benchmarks/SGI-DeepResearch/ | README.md + role_prompt.md |
| SGI-IdeaGeneration | benchmarks/SGI-IdeaGeneration/ | README.md + role_prompt.md |
| SGI-DryExperiment | benchmarks/SGI-DryExperiment/ | README.md + role_prompt.md |
| SGI-Reasoning | benchmarks/SGI-Reasoning/ | README.md + role_prompt.md |
| SGI-WetExperiment | benchmarks/SGI-WetExperiment/ | README.md + role_prompt.md |
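
As a rough illustration, the tracked contract listed above could be checked with a small script like the one below. This is only a sketch and not part of the repository: the names `TRACKED_CONTRACT` and `missing_contract_files` are hypothetical, and the script assumes it is run from the repository root so that `benchmarks/` resolves correctly.

```python
from pathlib import Path

# Tracked contract files per benchmark directory, copied from the table above.
# The dictionary name and layout are illustrative assumptions, not repository code.
BASE_FILES = ["README.md", "role_prompt.md"]
TRACKED_CONTRACT = {
    "ResearchClawBench": BASE_FILES + ["adapter.py"],
    "QA": BASE_FILES,
    "SGI-DeepResearch": BASE_FILES,
    "SGI-IdeaGeneration": BASE_FILES,
    "SGI-DryExperiment": BASE_FILES,
    "SGI-Reasoning": BASE_FILES,
    "SGI-WetExperiment": BASE_FILES,
}


def missing_contract_files(benchmarks_root):
    """Return {benchmark_dir: [missing files]} for directories with gaps."""
    root = Path(benchmarks_root)
    missing = {}
    for name, files in TRACKED_CONTRACT.items():
        # A file counts as present only if it exists as a regular file.
        absent = [f for f in files if not (root / name / f).is_file()]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for name, absent in missing_contract_files("benchmarks").items():
        print(f"{name}: missing {', '.join(absent)}")
```

Running the script prints one line per benchmark directory that lacks a tracked file and prints nothing when every contract is complete.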

Notes

  • agent_base/ stays focused on the reusable harness runtime.
  • Benchmark-specific prompts, adapters, and integration notes should live under their own benchmark subdirectory.
  • Local benchmark helpers may exist for private experimentation, but they do not define the formal external integration contract.