See `docs/PRD.md` §3 for the full system architecture diagram and rationale. See `docs/DECISIONS.md` for Architecture Decision Records (ADRs).
screw-agents is a modular, AI-powered secure code review system. It provides dedicated, vulnerability-specific agents that carry deeply researched security knowledge and are invocable from Claude Code, Codex, Gemini, local assistants, Neovim (via screw.nvim), web application workers, or CI/CD pipelines through a shared MCP server backbone.
    +------------------------------------------------------------------------+
    | Consumers: Claude Code | Codex/Gemini/local | screw.nvim | CI/CD       |
    |                              MCP Protocol                              |
    | +--------------------------------------------------------------------+ |
    | | screw-agents-mcp (MCP Server)                                      | |
    | |                                                                    | |
    | | Agent Registry <- YAML definitions (domains/)                      | |
    | | Target Resolver (tree-sitter, 10 languages)                        | |
    | | Output Formatter (JSON / SARIF / Markdown)                         | |
    | +--------------------------------------------------------------------+ |
    |                                                                        |
    | Benchmark Evaluator (benchmarks/runner/)                               |
    | Autoresearch Loop (Phase 4)                                            |
    +------------------------------------------------------------------------+
| Decision | Why |
|---|---|
| MCP server, not embedded prompts | One source of truth: improve an agent once, every client benefits |
| CWE-1400 taxonomy backbone | Only classification with completeness + mutual exclusivity + practical granularity (ADR-002) |
| YAML agent definitions | New vulnerability types via YAML, no Python code changes. Community-extensible |
| CWE-1400-native benchmark evaluator | Score in CWE-1400 directly, not CWE-1000 with translation (ADR-013) |
| PrimeVul methodology | Without dedup + chrono splits, scores are inflated: once both are applied, measured LLM F1 drops from 68% to 3% |
This is the most important architectural concept to understand. The system is designed so that adding a new vulnerability type is a content operation, not an infrastructure operation.
    PER-VULNERABILITY (repeat for each new vuln):
    |-- Phase 0: Knowledge Research - research, synthesize, write agent YAML
    `-- Phase 2-3: Agent authoring - subagent wrappers, skills, testing
        (one-line _active_cwes.py edit to light up benchmarks)

    ONE-TIME INFRASTRUCTURE (build once, benefits all vulns):
    |-- Phase 0.5: Benchmark infrastructure - evaluator, ingest harness, datasets
    |-- Phase 1: MCP server - registry, resolver, formatter
    |-- Phase 4: Autoresearch loop - self-improvement, experiment logging
    |-- Phase 5: Multi-LLM challenger - provider-agnostic disagreement analysis
    `-- Phase 6: Agent expansion tooling - CI/CD, community workflow
This is the realized workflow, demonstrated by the path_traversal / CWE-22
pilot shipped in Move 0 PR2 (branch move0-pr2-path-traversal-pilot,
2026-05-26).
Step 1 - Research (per-vuln knowledge work):
Author the agent YAML in domains/<cwe-1400-category>/<agent>.yaml following
the same Tier 1-4 research methodology used for the original four agents. For
the pilot, this produced domains/file-handling/path_traversal.yaml plus the
new domains/file-handling/_domain.yaml carrying the CWE-1404 / OWASP A01
domain metadata that get_capabilities reports.
Step 2 - Register with benchmarks (one-line edit, deferred for the pilot):
Add the agent's primary CWE (e.g., "CWE-22") to the ACTIVE_CWES frozenset
in benchmarks/scripts/_active_cwes.py. Re-run the existing ingest scripts:
CrossVul, MoreFixes, reality-check, etc. already contain CWE-22 data; it is
just filtered out until the CWE is in the active set. For Move 0 PR2 this
step is intentionally deferred to Move 1: the agent ships with a
not-yet-benchmarked marker (same precedent as ADR-014 Rust deferral), and
Move 1 closes the calibration. See docs/DECISIONS.md
ADR-PILOT-BENCHMARK-DEFERRED.
Step 3 - MCP registration (automatic):
The agent registry (Phase 1) discovers YAML files in domains/ and registers
them. Catalog discovery via get_capabilities (Move 0 PR1) surfaces the new
agent and (if newly populated) the new domain. No Python code changes: drop
the YAML pair, restart the server. Move 0 PR2 proved this end-to-end:
get_capabilities now lists path_traversal and the file-handling domain
with zero Python changes, and list_agents / list_domains / Finding /
findings-JSON shapes are unchanged.
Step 4 - Validation (unified zero-touch CLI, PR-D5 D5.7):
Run uv run screw-agents onboard-agent --id <agent> from the repository root.
This single command validates the YAML, checks ACTIVE_CWES membership, runs
the four registry-driven ingest scripts (which auto-materialize upstream
sources for new CWEs per PR-D3), executes the zero-touch smoke suite, and
prints a per-language coverage report with meta.benchmark_status
suggestions. See AGENT_ONBOARDING_RUNBOOK.md
for the full 7-section procedure including coverage thresholds and common
errors. The runner, metrics, dedup, splits, and report generator all work
without modification because they are CWE-agnostic by design.
The autoresearch planner discovers the new agent automatically. Since Move 1
Task 3.5 (docs/specs/2026-05-26-move1-task3.5-registry-driven-planner-design.md),
screw_agents.autoresearch.planner.build_run_plan enumerates
(agent, dataset, primary_cwe) tuples from two sources running side-by-side:
the legacy hardcoded G5_GATES list in benchmarks/runner/gate_checker.py
(Phase-4 retrospective closure criteria for the original 4 agents) AND a new
registry-driven path that reads AgentRegistry.agents.values() +
ACTIVE_CWES + per-case cwe_ids[] from manifests. Dedupe rule:
gate-derived wins on (agent, dataset) overlap (preserves Phase-4 semantics).
For agents WITHOUT a G5 gate (path_traversal and every future catalog
agent), the registry path supplies the case enumeration. No G5_GATES
entry is required for Phase-6+ agents; their YAML's meta.cwes.primary plus
the ACTIVE_CWES flip is sufficient. Selection strategies
priority-stratified and expanded-stratified consume the registry-derived
enumeration via pseudo-gates synthesized in
controlled_run._registry_pseudo_gates. gate-order and
required-dataset-smoke stay gate-only (Phase-4-pure).
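The dedupe rule above can be illustrated with a minimal sketch. It is not the real `build_run_plan`: the `PlanEntry` type, the `merge_plan_entries` helper, and the dataset name `"ossf"` used as a plan key are assumptions made only for this example.

```python
from typing import NamedTuple


class PlanEntry(NamedTuple):
    """Hypothetical plan row; the real planner's types may differ."""
    agent: str
    dataset: str
    primary_cwe: str
    source: str  # "gate" or "registry"


def merge_plan_entries(gate_entries, registry_entries):
    """Merge both enumerations; gate-derived wins on (agent, dataset) overlap."""
    merged = {}
    for entry in gate_entries:
        merged[(entry.agent, entry.dataset)] = entry
    for entry in registry_entries:
        # setdefault keeps the gate-derived entry when the key already exists.
        merged.setdefault((entry.agent, entry.dataset), entry)
    return list(merged.values())


gates = [PlanEntry("sqli", "ossf", "CWE-89", "gate")]
registry = [
    PlanEntry("sqli", "ossf", "CWE-89", "registry"),
    PlanEntry("path_traversal", "ossf", "CWE-22", "registry"),
]
plan = merge_plan_entries(gates, registry)
```

In this sketch `sqli` keeps its gate-derived entry, while `path_traversal`, which has no gate, is supplied by the registry side.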
The benchmark evaluator's case-to-agent routing is registry-driven too.
Since Move 1 Task 3.6
(docs/specs/2026-05-26-move1-task3.6-registry-driven-evaluator-design.md),
benchmarks/runner/evaluator.py::map_case_to_agent uses an
lru_cache(maxsize=1)-decorated _build_cwe_to_agent_map() two-pass
builder reading agent.meta.cwes.primary (Pass 1, authoritative: registry
invariants ensure primaries don't collide between agents) and
agent.meta.cwes.related (Pass 2, gap-fill via dict.setdefault). The
old hardcoded _CWE_TO_AGENT dict that covered only the 4 original agents
is deleted; Phase-4 routing behavior (CWE-94 → ssti) is preserved
because CWE-94 is in ssti.yaml::meta.cwes.related and no agent
declares it as primary, so Pass 2 fills the gap. Any future catalog agent
auto-routes via its YAML's meta.cwes.primary. Same scale-proof
fixture-based test pattern as the planner.
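A minimal sketch of the two-pass idea follows. The `AGENTS` dict is a hypothetical stand-in for the registry (the real builder reads agent YAML), a single primary CWE per agent is assumed, and the `related` lists other than CWE-94 under `ssti` are contrived for illustration.

```python
from functools import lru_cache

# Hypothetical registry contents; only ssti -> CWE-94 (related) comes from the text.
AGENTS = {
    "sqli": {"primary": "CWE-89", "related": []},
    "ssti": {"primary": "CWE-1336", "related": ["CWE-94"]},
    "xss": {"primary": "CWE-79", "related": ["CWE-89"]},  # contrived overlap
}


@lru_cache(maxsize=1)
def build_cwe_to_agent_map():
    """Build the CWE -> agent routing table once and cache it."""
    mapping = {}
    # Pass 1: primary CWEs are authoritative.
    for name, cwes in AGENTS.items():
        mapping[cwes["primary"]] = name
    # Pass 2: related CWEs only fill gaps; setdefault never overrides Pass 1.
    for name, cwes in AGENTS.items():
        for cwe in cwes["related"]:
            mapping.setdefault(cwe, name)
    return mapping


def map_case_to_agent(cwe_id):
    """Return the agent name for a CWE, or None when nothing claims it."""
    return build_cwe_to_agent_map().get(cwe_id)
```

Here CWE-94 routes to `ssti` through Pass 2, while the contrived CWE-89 entry under `xss` is ignored because `sqli` already claimed it as primary in Pass 1.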
The 8-stage per-agent lifecycle (research → YAML → benchmarks → refinement → status promotion) is documented end-to-end in
`AGENT_LIFECYCLE.md` (conceptual) and `AGENT_AUTHORING.md` (developer procedure, stage by stage). The diagram and flow below describe the infrastructure that makes Stages 3-4 of that lifecycle zero-touch.
Move 1.5 PREP (docs/PHASE_6_MOVE_1_5_PREP_PLAN.md, PR-D1 through PR-D6)
closed every remaining structural blocker so a new agent reaches a passing
end-to-end scan via YAML edits + a one-line ACTIVE_CWES append, with NO
other Python source edit, plugin markdown edit, or benchmark script edit
required. The realized flow is:
1. YAML drop. Agent author creates `domains/<domain-id>/<agent-id>.yaml` (and `domains/<domain-id>/_domain.yaml` if the domain is new). The YAML declares `meta.cwes.primary`, supported languages on each `HeuristicEntry`, and optional declarative fields (`meta.aliases`, `meta.benchmark.reviewer_flags`, `meta.benchmark.priority_thresholds`, `meta.benchmark.needs_related_context`, plus the 4 catalog metadata fields `supported_target_types` / `provider_mode_support` / `source_egress` / `ui_hints`).
2. ACTIVE_CWES append. Author adds the agent's primary CWE to the `ACTIVE_CWES` frozenset in `benchmarks/scripts/_active_cwes.py`. This is the single Python edit a new agent author MUST perform (it is the project-wide join point between agent content and benchmark infrastructure).
3. `onboard-agent` unified runner. Author runs `uv run screw-agents onboard-agent --id <agent-id>` from the repository root. The CLI subcommand (PR-D5 D5.7) chains six steps: (a) YAML validation, (b) ACTIVE_CWES membership check, (c) the four registry-driven ingest scripts (`ingest_ossf`, `ingest_reality_check_{python,java,csharp}`), which auto-materialize upstream sources for new CWEs (PR-D3), (d) the zero-touch smoke suite (`tests/test_zero_touch_*_e2e.py`), (e) per-language coverage report with `meta.benchmark_status` suggestions, (f) exit code propagation. See `docs/AGENT_ONBOARDING_RUNBOOK.md` for the full 7-section procedure.
4. Registry auto-discovery. `AgentRegistry._load()` alphabetically walks `domains/**/*.yaml`, skips `_*.yaml` (reserved for domain metadata), and constructs an `Agent` per YAML. The new agent is immediately discoverable.
5. Discovery aliases. `meta.aliases` surfaces in `get_capabilities()["agents"][i]["aliases"]` (PR-D4). Plugin-layer skills (Claude Code `screw-review` + Codex mirror) route user phrasing (e.g. "SQLi", "directory traversal", "zip-slip") against this list via `mcp__screw-agents__list_agents`, with no hardcoded mapping tables.
6. Catalog surface auto-expand. `get_capabilities()` returns the new agent in its `agents[]` array; `list_agents` / `list_domains` continue to report the new entry (with shape preserved). The catalog is registry-driven end-to-end.
7. Plugin layer auto-expand. Claude Code skills + Codex skills render the updated agent list dynamically from `mcp__screw-agents__list_agents`. The universal `screw-scan` subagent handles all registered agents, so no new subagent .md file is required (T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into one universal runner).
8. No other Python edit required. The only Python touch in the entire flow is the one-line `ACTIVE_CWES` append (step 2). All other behavior derives from YAML declarations + registry-driven helpers.
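The registry auto-discovery walk in the flow above can be sketched as follows. This is an illustration under stated assumptions, not the actual `AgentRegistry._load()`: the `discover_agent_files` helper is hypothetical, and it returns file paths instead of constructing agents.

```python
from pathlib import Path
import tempfile


def discover_agent_files(domains_dir):
    """Return agent YAML paths in sorted order, skipping reserved _*.yaml files."""
    return sorted(
        path
        for path in Path(domains_dir).glob("**/*.yaml")
        if not path.name.startswith("_")
    )


# Build a throwaway domains/ tree that mirrors the pilot's YAML pair.
with tempfile.TemporaryDirectory() as tmp:
    domain = Path(tmp) / "file-handling"
    domain.mkdir()
    (domain / "_domain.yaml").write_text("id: file-handling\n")
    (domain / "path_traversal.yaml").write_text("meta: {}\n")
    found = [p.name for p in discover_agent_files(tmp)]
```

Only `path_traversal.yaml` is picked up; `_domain.yaml` is skipped because of its leading underscore.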
The acceptance gate (PR-D5 first-time-green; PR-D6 reconfirmed) is the
18/18 smoke-test assertions in tests/test_zero_touch_agent_add_e2e.py
and tests/test_zero_touch_domain_add_e2e.py (9 pipeline steps × 2
synthetic fixtures = 18 assertions). The gate runs in CI on every PR; if
any step regresses to require manual intervention, CI fails loudly.
The full Move 1.5 PREP cumulative deliverable set is captured in
docs/PROJECT_STATUS.md "Phase 6 / Move 1.5 PREP - CLOSED" and the
SynApSec re-handover at docs/SYNAPSEC_HANDOVER.md (2026-05-28 PR-D6
entry).
The central active-CWE registry (benchmarks/scripts/_active_cwes.py) is the single join point between one-time infrastructure and per-vuln content:
    # benchmarks/scripts/_active_cwes.py
    ACTIVE_CWES: frozenset[str] = frozenset({
        "CWE-78",    # OS Command Injection
        "CWE-79",    # Cross-Site Scripting
        "CWE-89",    # SQL Injection
        "CWE-94",    # Code Injection
        "CWE-1336",  # SSTI
        # Phase 6+ additions (Move 1 onward):
        "CWE-22",    # Path Traversal: YAML shipped Move 0 PR2; benchmark wired in Move 1 PR-B
        # "CWE-918", # SSRF: Move 2 candidate
        # ...
    })

Every ingest script, the dedup pipeline, and the MoreFixes extractor import from this single module. Adding a CWE here unlocks it across the entire benchmark system. The Move 0 PR2 path_traversal pilot intentionally shipped ahead of this step: the agent YAML landed first (so the catalog and get_capabilities reflected it), and the matching CWE-22 line was added in Move 1 alongside focused calibration.
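To make the "filtered out until the CWE is in the active set" behavior concrete, here is a minimal sketch. The `filter_active` helper and the case dictionaries are assumptions for illustration; only the `ACTIVE_CWES` name and the per-case `cwe_ids` field come from the text above.

```python
ACTIVE_CWES = frozenset({"CWE-78", "CWE-79", "CWE-89", "CWE-94", "CWE-1336", "CWE-22"})


def filter_active(cases, active=ACTIVE_CWES):
    """Keep cases that carry at least one CWE from the active set."""
    return [case for case in cases if active.intersection(case["cwe_ids"])]


# Hypothetical benchmark cases; real ones come from dataset manifests.
cases = [
    {"id": "case-1", "cwe_ids": ["CWE-22"]},
    {"id": "case-2", "cwe_ids": ["CWE-918"]},
    {"id": "case-3", "cwe_ids": ["CWE-20", "CWE-89"]},
]
kept = [case["id"] for case in filter_active(cases)]
```

The CWE-918 case is dropped until "CWE-918" is added to the set, at which point the same data becomes visible with no change to the filtering code.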
Scan tools:
- `scan_agents(agents, target, ...)`: paginated multi-agent primitive. Cursor binding `(target_hash, agents_hash)`. Returns an init page with `agents_excluded_by_relevance` plus code pages with per-agent prompts.
- `scan_domain(domain, target, ...)`: convenience wrapper over `scan_agents`. Resolves all agents in a CWE-1400 domain.
Discovery tools:
- `list_agents(domain=None)`: enumerate registered agents (optionally filtered by domain). Shape frozen (SynApSec compatibility guard).
- `list_domains()`: enumerate domains. Shape frozen (SynApSec compatibility guard).
- `get_agent_prompt(agent_name)`: fetch the per-agent core prompt on demand (lazy fetch from subagents).
- `get_capabilities()`: versioned, machine-readable capability catalog (Move 0, 2026-05-25). No args. The single rich discovery source; additive to the frozen `list_agents` / `list_domains`. See "Capability Catalog" below.
Capability Catalog (Move 0): get_capabilities returns a versioned envelope
(catalog_schema_version + artifact_schema_version + informational
screw_agents_version) wrapping a global_capabilities block (target types,
output formats, provider modes with source-egress/billing facts) and enriched
domains[] / agents[] catalogs. Per-domain metadata (display name, description,
CWE-1400 category, OWASP 2025) is sourced from domains/<domain>/_domain.yaml
(the registry skips all _*.yaml from agent loading and, at serve time via
create_server, refuses to start if a populated domain lacks _domain.yaml).
This is assembled in src/screw_agents/capabilities.py. list_agents/list_domains
stay byte-stable; get_capabilities is where richer catalog fields grow
additively. New YAML agents/domains appear here with no client code change.
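The serve-time guard described above (a populated domain must ship a `_domain.yaml`) might look roughly like the sketch below. The `domains_missing_metadata` helper is hypothetical and is not the check that `create_server` actually runs; it only shows the rule being enforced.

```python
from pathlib import Path
import tempfile


def domains_missing_metadata(domains_dir):
    """List populated domain directories that have no _domain.yaml."""
    missing = []
    for domain in sorted(p for p in Path(domains_dir).iterdir() if p.is_dir()):
        # A domain counts as populated when it holds at least one agent YAML.
        has_agents = any(not f.name.startswith("_") for f in domain.glob("*.yaml"))
        if has_agents and not (domain / "_domain.yaml").exists():
            missing.append(domain.name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "file-handling").mkdir()
    (root / "file-handling" / "_domain.yaml").write_text("id: file-handling\n")
    (root / "file-handling" / "path_traversal.yaml").write_text("meta: {}\n")
    (root / "orphan-domain").mkdir()
    (root / "orphan-domain" / "some_agent.yaml").write_text("meta: {}\n")
    (root / "empty-domain").mkdir()
    missing = domains_missing_metadata(root)
```

A server following this rule would refuse to start whenever the returned list is non-empty; empty domain directories are not flagged because they are not populated.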
Accumulator tools (Phase 3a X1-M1; paired with finalize_scan_results; called on every scan, not just adaptive flows):
- `accumulate_findings`: appends finding records to the active session keyed by `session_id`. Phase 3a X1-M1 introduced this as the generic per-page consumer for the lazy-fetch pagination flow.
Adaptive tools (Phase 3b):
- `record_context_required_match`, `detect_coverage_gaps`, `lint_adaptive_script`, `stage_adaptive_script`, `promote_staged_script`, `reject_staged_script`, `execute_adaptive_script`, `verify_trust`.
Slash-command parser:
- `resolve_scope(scope_text)`: Task 8 helper; returns `{agents, summary}`. Used by `/screw:scan` to translate user input into an agent list. Closed allowlist (registry lookup) + no shell evaluation.
Output:
- `finalize_scan_results(session_id, formats=...)`: emit JSON/Markdown/SARIF/CSV reports. Default format list as of T19-M D7: `["json", "markdown", "csv"]`.
- `record_exclusion`, `check_exclusions`: exclusion learning surface (Phase 2).
Retired (T-SCAN-REFACTOR Task 6):
- `scan_full`: replaced by `scan_agents(agents=list_agents().names, ...)` (or by the slash command's `full` keyword).
- `scan_<name>` per-agent tools (sqli/cmdi/ssti/xss): replaced by `scan_agents(agents=[<name>], ...)`.
- `screw-scan.md`: universal scan runner (~559 LOC). Replaces 5 deleted per-vuln + per-domain subagents (screw-sqli, screw-cmdi, screw-ssti, screw-xss, screw-injection; Task 7 of T-SCAN-REFACTOR). Dispatched with `agents: list[str]` from main session.
- `screw-script-reviewer.md`: adaptive script review. Dispatched by main session per `pending_reviews` chain (chain-subagents architecture, Phase 3b-C2).
- `screw-learning-analyst.md`: learning-mode analyst (Phase 3a).
Subagents do NOT dispatch other subagents (Claude Code constraint, sub-agents.md:711). Main session is the sole orchestrator.
/screw:scan <scope-spec> <target> [--adaptive | --no-confirm | --thoroughness <L>] [--format <F>] [--primary-provider <provider> --primary-transport <transport> --primary-execution fixture|cli] [--parallel-providers provider:transport:execution,...] [--challenger <mode> --challenger-execution dry_run|cli]
--adaptive and --no-confirm are mutually exclusive (adaptive mode requires interactive consent).
Scope-spec forms (mutually exclusive):
- Bare-token: single agent name (e.g., `sqli`) or domain name (e.g., `injection-input-handling`). Disambiguated via registry lookup; the `agent name ≠ domain name` invariant guarantees uniqueness.
- `full` keyword: all registered agents (post-relevance-filter).
- Prefix-key: `domains:foo,bar agents:baz,qux` combines multiple domains and agents in one invocation.
Examples:
    /screw:scan sqli src/api/                       # single agent
    /screw:scan injection-input-handling src/       # whole domain
    /screw:scan full .                              # all agents
    /screw:scan agents:sqli,xss src/api/            # subset across domains
    /screw:scan domains:foo agents:baz src/         # mix
    /screw:scan domains:A,B agents:1A,2A,1B src/    # subset of A + subset of B
    /screw:scan sqli src/api/ --challenger claude_primary_codex_challenger --challenger-execution dry_run
    /screw:scan sqli src/api/ --primary-provider codex --primary-transport cli --primary-execution cli
    /screw:scan sqli src/api/ --parallel-providers claude:cli:cli,codex:cli:cli
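The three scope-spec forms can be told apart with a small parser. The sketch below is not the real `resolve_scope`: the `parse_scope_spec` name, its return shape, and the example registry sets are assumptions, and it deliberately stops at classification without deciding how listed domains and agents combine into a final agent list.

```python
def parse_scope_spec(scope_text, agent_names, domain_names):
    """Classify a scope-spec as 'full', 'prefix', or 'bare' using a closed allowlist."""
    tokens = scope_text.split()
    if tokens == ["full"]:
        return {"form": "full"}
    if tokens and all(":" in token for token in tokens):
        parsed = {"form": "prefix", "domains": [], "agents": []}
        for token in tokens:
            key, _, value = token.partition(":")
            if key not in ("domains", "agents"):
                raise ValueError(f"unknown prefix key: {key!r}")
            names = [name for name in value.split(",") if name]
            allowed = domain_names if key == "domains" else agent_names
            unknown = [name for name in names if name not in allowed]
            if unknown:
                raise ValueError(f"unknown {key}: {unknown}")
            parsed[key].extend(names)
        return parsed
    if len(tokens) == 1:
        # Agent and domain names never collide, so lookup order does not matter.
        if tokens[0] in agent_names:
            return {"form": "bare", "agent": tokens[0]}
        if tokens[0] in domain_names:
            return {"form": "bare", "domain": tokens[0]}
    raise ValueError(f"unrecognized scope-spec: {scope_text!r}")


# Hypothetical registry contents for the example.
AGENT_NAMES = {"sqli", "xss", "cmdi", "ssti"}
DOMAIN_NAMES = {"injection-input-handling"}
```

Because every name is checked against the registry sets and nothing is evaluated, unknown or mixed-form input fails with `ValueError` rather than being guessed at.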
    slash command        resolve_scope        scan_agents (init page)
          |                    |                       |
    main session <------------------------- pre-execution summary
          |                                        |
          |                          user consent (or --no-confirm)
          |                                        |
    dispatch screw-scan ------------------> scan_agents (code pages)
          |                                        |
          |                                 accumulate_findings
          |                                        |
    parse return (C2 + enrichment) <------- return structured payload
          |
    optionally chain screw-script-reviewer (per pending_reviews)
          |
    finalize_scan_results
          |
    report (JSON, Markdown, SARIF, CSV per --format)
YAML files carrying vulnerability-specific detection knowledge. Each agent includes:
- `meta`: CWE IDs, CAPEC mappings, OWASP Top 10:2025 overlay, research sources
- `core_prompt`: Distilled detection knowledge (2,000-4,000 tokens)
- `detection_heuristics`: Language-specific patterns at high/medium/context-required severity
- `bypass_techniques`: Real-world evasion patterns grounded in CVEs
- `remediation`: Per-language fix guidance
- `few_shot_examples`: Vulnerable + safe code pairs
- `target_strategy`: Tree-sitter queries for function/class targeting
See docs/PRD.md §4 for the full schema.
Python MCP server exposing agent definitions as tools. Phase 1 builds:
- Agent Registry: YAML loader → Pydantic validation → MCP tool registration
- Target Resolver: tree-sitter AST parsing (10 languages) + glob file discovery + git diff parsing
- Output Formatter: Findings → JSON + SARIF + Markdown
Both stdio (Claude Code) and streamable HTTP (screw.nvim, CI/CD) transports are supported.
CWE-1400-native Python evaluator (ADR-013). Components:
| Module | Responsibility |
|---|---|
| `models.py` | Pydantic types: Finding, BenchmarkCase, MetricSet, Summary |
| `sarif.py` | bentoo-SARIF read/write (SARIF 2.1.0 subset) |
| `cwe.py` | CWE-1400 hierarchy traversal with strict/broad match modes |
| `metrics.py` | Pair-based TPR/FPR/precision/recall/F1 computation |
| `primevul.py` | Tree-sitter AST normalization, SHA-256 dedup, chrono/cross-project splits |
| `report.py` | Markdown report rendering |
| `cli.py` | `python -m benchmarks.runner` entry point |
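The dedup and chronological split handled by `primevul.py` can be sketched in a few lines. This is a simplified illustration, not the module's code: whitespace stripping stands in for Tree-sitter AST normalization, and the `code` and `date` fields, helper names, and 0.8 split fraction are assumptions.

```python
import hashlib


def normalize(code):
    """Crude stand-in for AST normalization: remove all whitespace."""
    return "".join("".join(line.split()) for line in code.splitlines())


def dedup(cases):
    """Keep the first case for each SHA-256 digest of the normalized code."""
    seen, unique = set(), []
    for case in cases:
        digest = hashlib.sha256(normalize(case["code"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(case)
    return unique


def chrono_split(cases, train_fraction=0.8):
    """Oldest cases go to train and newest to test, so no future data leaks back."""
    ordered = sorted(cases, key=lambda case: case["date"])  # ISO dates sort as text
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


cases = [
    {"id": "a", "date": "2021-03-01", "code": "x = 1\n"},
    {"id": "b", "date": "2020-01-15", "code": "x  =  1\n\n"},  # duplicate of "a"
    {"id": "c", "date": "2022-07-09", "code": "y = 2\n"},
    {"id": "d", "date": "2019-11-30", "code": "z = 3\n"},
    {"id": "e", "date": "2023-02-02", "code": "w = 4\n"},
]
unique = dedup(cases)
train, test = chrono_split(unique)
```

Deduplicating before splitting matters: otherwise a near-identical sample could land on both sides of the split and inflate the measured score.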
Reusable IngestBase abstract class with 8 dataset-specific subclasses. Each ingest script:
- Downloads/clones the dataset (`ensure_downloaded()`)
- Parses the native format, filters to `ACTIVE_CWES` (`extract_cases()`)
- Writes bentoo-SARIF truth files + provenance manifest (base class `run()`)
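The division of labor between the base class and its subclasses follows a template-method shape, sketched below. Only the names `IngestBase`, `ensure_downloaded`, `extract_cases`, `run`, and `ACTIVE_CWES` come from the text; the signatures, the `ToyIngest` subclass, and the summary dict returned in place of real SARIF and manifest output are assumptions.

```python
from abc import ABC, abstractmethod

ACTIVE_CWES = frozenset({"CWE-22", "CWE-89"})  # trimmed set for the example


class IngestBase(ABC):
    @abstractmethod
    def ensure_downloaded(self):
        """Fetch or clone the dataset; should be safe to call repeatedly."""

    @abstractmethod
    def extract_cases(self):
        """Parse the native format and return cases filtered to ACTIVE_CWES."""

    def run(self):
        """Shared driver: download, extract, then emit output."""
        self.ensure_downloaded()
        cases = self.extract_cases()
        # The real base class writes bentoo-SARIF truth files and a provenance
        # manifest here; this sketch returns a summary dict instead.
        return {"case_count": len(cases), "cwes": sorted({c["cwe"] for c in cases})}


class ToyIngest(IngestBase):
    """Hypothetical dataset with two in-memory records."""
    RAW = [{"id": 1, "cwe": "CWE-22"}, {"id": 2, "cwe": "CWE-918"}]

    def __init__(self):
        self.downloaded = False

    def ensure_downloaded(self):
        self.downloaded = True

    def extract_cases(self):
        return [case for case in self.RAW if case["cwe"] in ACTIVE_CWES]


ingest = ToyIngest()
summary = ingest.run()
```

A new dataset then only needs the two dataset-specific methods; the output step stays in one place.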
| Dataset | Languages | Ingest Script |
|---|---|---|
| OpenSSF CVE Benchmark | JS/TS | ingest_ossf.py |
| reality-check (C#/Python/Java) | C#, Python, Java | ingest_reality_check_*.py |
| go-sec-code-mutated | Go | ingest_go_sec_code.py |
| skf-labs-mutated | Python | ingest_skf_labs.py |
| CrossVul | PHP, Ruby | ingest_crossvul.py |
| Vul4J | Java | ingest_vul4j.py |
| MoreFixes | All (via Postgres) | morefixes_extract.py |
Source corpora come in two fundamentally different shapes, and the materialization path differs by shape (PR-D3 Task D3.3 PATH D).
| Model | Shape | Materialization path |
|---|---|---|
| Per-CWE | One upstream git repo per CVE: hundreds of separate clones, driven per-CWE. | benchmarks/scripts/_materialization.py::materialize_for_cwe(cwe_id, "ossf"), an idempotent clone driver with tracking file (benchmarks/external/.materialized.json). Per-clone commit pins come from each case's prePatch/postPatch SHAs in the CVE metadata. |
| Monolithic | ONE upstream git repo for the entire corpus (markup files for every project). | IngestBase.ensure_downloaded(), which clones the single upstream repo once at the _PINNED_REF module constant declared in each ingest script. No per-CWE materialization is meaningful (the single repo already covers all CVEs/CWEs). |
OSSF is per-CWE. reality-check (python/java/csharp) is monolithic and all three
ingests share the same upstream repo (flawgarden/reality-check). The
Corpus = Literal["ossf"] type alias in _materialization.py intentionally
excludes the monolithic corpora: forcing them through per-CWE materialization
would iterate a single-repo corpus once per CWE, which is pure waste.
Both paths pin upstream by 40-char hex SHA: per-CWE pins live in the CVE
metadata; monolithic pins live in the _PINNED_REF module constant of each
ingest script (benchmarks/tests/test_upstream_pins.py enforces SHA shape).
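A SHA-shape check of the kind mentioned above can be as small as one regular expression. This is a sketch, not the contents of `test_upstream_pins.py`; the `is_full_sha` helper is hypothetical, and accepting only lowercase hex is an assumption.

```python
import re

# Exactly 40 lowercase hex characters, i.e. a full (non-abbreviated) SHA-1 commit id.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_full_sha(ref):
    """True when ref is a full 40-char hex SHA rather than a branch, tag, or short SHA."""
    return _SHA_RE.fullmatch(ref) is not None
```

Rejecting branch names such as `main` and abbreviated SHAs keeps a pin from silently moving when upstream changes.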
Thin orchestration wrappers calling MCP tools. The current implementation is a
shared assistant plugin directory with Claude Code metadata
(.claude-plugin/plugin.json) and Codex metadata
(.codex-plugin/plugin.json, .agents/plugins/marketplace.json). The
slash-command names, agent roles, host skills, and MCP tool workflows define a
portable assistant command contract. Future Gemini, local assistant, editor, or
web-worker integrations should preserve the same command semantics and map
host-specific UX onto the same backend tools.
- Subagents (`agents/`): `screw-scan.md` (universal scan runner; T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into this one), `screw-script-reviewer.md` (adaptive review chain), `screw-learning-analyst.md` (learning mode). Main session orchestrates dispatch (chain-subagents architecture).
- Claude skills (`skills/`): Claude-native auto-invocation helpers for review and research workflows.
- Codex skills (`codex-skills/`): Codex workflow skills. Codex-only scan/learn/adaptive skills live outside the top-level `skills/` directory so Claude Code does not expose duplicate slash completions for command workflows.
- Slash commands: User-facing entry points (for example `/screw:scan`, `/screw:learn-report`, and `/screw:adaptive-cleanup`; see "Tool & Subagent Inventory" above)
- MCP config: repo-root `.mcp.json` for project-scoped Claude Code development, plus plugin-scoped `plugins/screw/codex-mcp.json` for repo-local Codex marketplace development.
Per-project persistent state in the target repository:
- `findings/`: Scan results
- `learning/exclusions.yaml`: False positive patterns (Phase 2)
- `custom-scripts/`: Adaptive analysis scripts (Phase 3)
CWE-1400 (Comprehensive Categorization) is the structural backbone: 21 categories consolidated to 18 agent domains. Every finding carries a CWE ID as the universal join key.
OWASP Top 10:2025 is the risk communication overlay: used in reports and user-facing output, not as the domain structure.
See docs/PRD.md §9 for the full taxonomy mapping and docs/DECISIONS.md ADR-002/ADR-003 for the rationale.
| Phase | Type | Focus | Status |
|---|---|---|---|
| Phase 0 | Per-vuln | Knowledge Research | Complete (4 agents) |
| Phase 0.5 | One-time | Benchmark Infrastructure | Complete |
| Phase 1 | One-time | MCP Server + Registry + Resolver + Formatter | Complete |
| Phase 2 | Per-vuln | Claude Code Integration (subagents, skills, FP learning) | Complete |
| Phase 3 | One-time | Adaptive Analysis & Learning | Complete (Phase 3a + Phase 3b + Phase 3b-C2) |
| Phase 4 | One-time | Autoresearch & Self-Improvement | Complete |
| Phase 5 | One-time | Multi-LLM Challenger + Provider-Neutral Primary Scanning | Complete: challenger/reconciliation/report surfaces, primary scan contracts, fixture validation, scan input assembly, backend CLI primary scanner runner plumbing, production Claude/Codex output normalization, package CLI, MCP surface, optional provider-scan finalization, backend composed primary-plus-challenger workflow, backend parallel primary reconciliation/finalization, mode-aware provider report filenames/metadata, universal /screw:scan provider-primary command contract, route-equivalent fixture validation, real Claude Code/Codex host-route fixture validation, one live Codex/Claude benchmark round trip, live composed validation in both Claude/Codex directions, and live parallel validation are implemented/recorded; additional provider adapters are accepted deferrals; see docs/PHASE_5_CLOSURE_READINESS.md |
| Phase 5.5 | One-time | Web application integration pilot | Next: handoff in docs/PHASE_5_5_WEB_APP_INTEGRATION.md |
| Phase 6 | Mixed | Small-batch agent expansion | Pending |
| Phase 7 | One-time | screw.nvim Integration | Pending |
See docs/PRD.md §12 for detailed phase descriptions and docs/PROJECT_STATUS.md for current state.
- `docs/PRD.md`: Product Requirements Document (definitive)
- `docs/DECISIONS.md`: Architecture Decision Records
- `docs/PHASE_0_5_PLAN.md`: Phase 0.5 implementation plan (28 tasks)
- `docs/PHASE_0_5_VALIDATION_GATES.md`: Phase 1.7 acceptance criteria
- `docs/PHASE_5_PLAN.md`: Phase 5 challenger provider and transport plan
- `docs/PHASE_5_MANUAL_VALIDATION.md`: Phase 5 manual round-trip validation evidence
- `docs/PROJECT_STATUS.md`: Current state + deferred obligations
- `docs/KNOWLEDGE_SOURCES.md`: Research targets for knowledge sprint
- `docs/AGENT_AUTHORING.md`: Guide for writing new agent YAMLs