Benchmarks

This folder records benchmark-specific integration contracts that live outside agent_base so the core harness stays generic, lightweight, and fair across different evaluations.

| Benchmark | Directory | Tracked contract |
| --- | --- | --- |
| ResearchClawBench | `benchmarks/ResearchClawBench/` | `README.md` + `role_prompt.md` + `adapter.py` |
| QA / VQA-style benchmarks | `benchmarks/QA/` | `README.md` + `role_prompt.md` |
| SGI-DeepResearch | `benchmarks/SGI-DeepResearch/` | `README.md` + `role_prompt.md` |
| SGI-IdeaGeneration | `benchmarks/SGI-IdeaGeneration/` | `README.md` + `role_prompt.md` |
| SGI-DryExperiment | `benchmarks/SGI-DryExperiment/` | `README.md` + `role_prompt.md` |
| SGI-Reasoning | `benchmarks/SGI-Reasoning/` | `README.md` + `role_prompt.md` |
| SGI-WetExperiment | `benchmarks/SGI-WetExperiment/` | `README.md` + `role_prompt.md` |
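
The tracked contracts listed above could be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `missing_contract_files` and the `TRACKED_CONTRACTS` mapping are hypothetical, and the script assumes it is run from the repository root so that `benchmarks/` resolves correctly.

```python
from pathlib import Path

# Expected contract files per benchmark directory, copied from the table above.
# This mapping is illustrative; the repository does not ship it.
TRACKED_CONTRACTS = {
    "ResearchClawBench": ("README.md", "role_prompt.md", "adapter.py"),
    "QA": ("README.md", "role_prompt.md"),
    "SGI-DeepResearch": ("README.md", "role_prompt.md"),
    "SGI-IdeaGeneration": ("README.md", "role_prompt.md"),
    "SGI-DryExperiment": ("README.md", "role_prompt.md"),
    "SGI-Reasoning": ("README.md", "role_prompt.md"),
    "SGI-WetExperiment": ("README.md", "role_prompt.md"),
}


def missing_contract_files(benchmarks_root):
    """Return {benchmark: [missing files]} for directories lacking tracked files."""
    root = Path(benchmarks_root)
    missing = {}
    for name, files in TRACKED_CONTRACTS.items():
        # A file counts as present only if it exists as a regular file.
        absent = [f for f in files if not (root / name / f).is_file()]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for name, absent in missing_contract_files("benchmarks").items():
        print(f"{name}: missing {', '.join(absent)}")
```

An empty result means every benchmark directory contains its tracked files. The check says nothing about the content of those files, only that they exist.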

Notes

- `agent_base/` stays focused on the reusable harness runtime.
- Benchmark-specific prompts, adapters, and integration notes should live under their own benchmark subdirectory.
- Local benchmark helpers may exist for private experimentation, but they do not define the formal external integration contract.