Architecture — screw-agents

See docs/PRD.md §3 for the full system architecture diagram and rationale. See docs/DECISIONS.md for Architecture Decision Records (ADRs).

System Overview

screw-agents is a modular, AI-powered secure code review system. It provides dedicated, vulnerability-specific agents that carry deeply researched security knowledge and are invocable from Claude Code, Codex, Gemini, local assistants, Neovim (via screw.nvim), web application workers, or CI/CD pipelines through a shared MCP server backbone.

┌──────────────────────────────────────────────────────────────┐
│  Consumers: Claude Code │ Codex/Gemini/local                 │
│             screw.nvim │ CI/CD                               │
│                         MCP Protocol                         │
│  ┌────────────────────────────────────────────────────────┐  │
│  │              screw-agents-mcp (MCP Server)             │  │
│  │                                                        │  │
│  │   Agent Registry ← YAML definitions (domains/)        │  │
│  │   Target Resolver (tree-sitter, 10 languages)         │  │
│  │   Output Formatter (JSON / SARIF / Markdown)          │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Benchmark Evaluator (benchmarks/runner/)                    │
│  Autoresearch Loop (Phase 4)                                 │
└──────────────────────────────────────────────────────────────┘

Key Design Decisions

Decision	Why
MCP server, not embedded prompts	One source of truth — improve an agent once, every client benefits
CWE-1400 taxonomy backbone	Only classification with completeness + mutual exclusivity + practical granularity (ADR-002)
YAML agent definitions	New vulnerability types via YAML, no Python code changes. Community-extensible
CWE-1400-native benchmark evaluator	Score in CWE-1400 directly, not CWE-1000 with translation (ADR-013)
PrimeVul methodology	Without dedup + chrono splits, LLM scores are inflated: PrimeVul reports a model falling from 68% to 3% F1 under its stricter evaluation
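
The dedup and chronological-split idea behind the PrimeVul row can be sketched as follows. This is a minimal illustration and not the project's primevul.py: whitespace stripping stands in for the tree-sitter AST normalization, and the names Sample, normalized_hash, dedup, and chrono_split are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    code: str
    committed: date  # commit date of the sample (assumed field)


def normalized_hash(code: str) -> str:
    # Stand-in normalization: drop all whitespace before hashing.
    # The real pipeline is described as normalizing via a tree-sitter AST.
    collapsed = "".join(code.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def dedup(samples: list[Sample]) -> list[Sample]:
    # Keep the first sample seen for each normalized hash.
    seen: set[str] = set()
    unique: list[Sample] = []
    for sample in samples:
        digest = normalized_hash(sample.code)
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


def chrono_split(
    samples: list[Sample], train_fraction: float = 0.8
) -> tuple[list[Sample], list[Sample]]:
    # Oldest samples go to train, newest to test, so no future data leaks back.
    ordered = sorted(samples, key=lambda s: s.committed)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

Splitting by date after deduplication means a near-copy of a test sample cannot sit in the training side, which is the leakage the row above warns about.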

Phase Lifecycle: One-Time Infrastructure vs Per-Vulnerability Work

This is the most important architectural concept to understand. The system is designed so that adding a new vulnerability type is a content operation, not an infrastructure operation.

The Two Types of Work

 PER-VULNERABILITY (repeat for each new vuln):
 ├── Phase 0:   Knowledge Research — research, synthesize, write agent YAML
 └── Phase 2-3: Agent authoring — subagent wrappers, skills, testing
                 (one-line _active_cwes.py edit to light up benchmarks)

 ONE-TIME INFRASTRUCTURE (build once, benefits all vulns):
 ├── Phase 0.5: Benchmark infrastructure — evaluator, ingest harness, datasets
 ├── Phase 1:   MCP server — registry, resolver, formatter
 ├── Phase 4:   Autoresearch loop — self-improvement, experiment logging
 ├── Phase 5:   Multi-LLM challenger — provider-agnostic disagreement analysis
 └── Phase 6:   Agent expansion tooling — CI/CD, community workflow

How a New Vulnerability Plugs In

This is the realized workflow, demonstrated by the path_traversal / CWE-22 pilot shipped in Move 0 PR2 (branch move0-pr2-path-traversal-pilot, 2026-05-26).

Step 1 — Research (per-vuln knowledge work): Author the agent YAML in domains/<cwe-1400-category>/<agent>.yaml following the same Tier 1-4 research methodology used for the original four agents. For the pilot, this produced domains/file-handling/path_traversal.yaml plus the new domains/file-handling/_domain.yaml carrying the CWE-1404 / OWASP A01 domain metadata that get_capabilities reports.

Step 2 — Register with benchmarks (one-line edit, deferred for the pilot): Add the agent's primary CWE (e.g., "CWE-22") to the ACTIVE_CWES frozenset in benchmarks/scripts/_active_cwes.py. Re-run the existing ingest scripts — CrossVul, MoreFixes, reality-check, etc. already contain CWE-22 data; it is just filtered out until the CWE is in the active set. For Move 0 PR2 this step is intentionally deferred to Move 1: the agent ships with a not-yet-benchmarked marker (same precedent as ADR-014 Rust deferral), and Move 1 closes the calibration. See docs/DECISIONS.md ADR-PILOT-BENCHMARK-DEFERRED.

Step 3 — MCP registration (automatic): The agent registry (Phase 1) discovers YAML files in domains/ and registers them. Catalog discovery via get_capabilities (Move 0 PR1) surfaces the new agent and (if newly populated) the new domain. No Python code changes — drop the YAML pair, restart the server. Move 0 PR2 proved this end-to-end: get_capabilities now lists path_traversal and the file-handling domain with zero Python changes, and list_agents / list_domains / Finding / findings-JSON shapes are unchanged.

Step 4 — Validation (unified zero-touch CLI, PR-D5 D5.7): Run uv run screw-agents onboard-agent --id <agent> from the repository root. This single command validates the YAML, checks ACTIVE_CWES membership, runs the four registry-driven ingest scripts (which auto-materialize upstream sources for new CWEs per PR-D3), executes the zero-touch smoke suite, and prints a per-language coverage report with meta.benchmark_status suggestions. See AGENT_ONBOARDING_RUNBOOK.md for the full 7-section procedure including coverage thresholds and common errors. The runner, metrics, dedup, splits, and report generator all work without modification — they are CWE-agnostic by design.

The autoresearch planner discovers the new agent automatically. Since Move 1 Task 3.5 (docs/specs/2026-05-26-move1-task3.5-registry-driven-planner-design.md), screw_agents.autoresearch.planner.build_run_plan enumerates (agent, dataset, primary_cwe) tuples from two sources running side-by-side: the legacy hardcoded G5_GATES list in benchmarks/runner/gate_checker.py (Phase-4 retrospective closure criteria for the original 4 agents) AND a new registry-driven path that reads AgentRegistry.agents.values() + ACTIVE_CWES + per-case cwe_ids[] from manifests. Dedupe rule: gate-derived wins on (agent, dataset) overlap (preserves Phase-4 semantics). For agents WITHOUT a G5 gate — path_traversal and every future catalog agent — the registry path supplies the case enumeration. No G5_GATES entry is required for Phase-6+ agents; their YAML's meta.cwes.primary plus the ACTIVE_CWES flip is sufficient. Selection strategies priority-stratified and expanded-stratified consume the registry-derived enumeration via pseudo-gates synthesized in controlled_run._registry_pseudo_gates. gate-order and required-dataset-smoke stay gate-only (Phase-4-pure).
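
The "gate-derived wins on (agent, dataset) overlap" rule can be illustrated with a small merge function. This is a sketch under assumptions, not the body of build_run_plan: PlanEntry, merge_plan, and the dataset label used below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEntry:
    agent: str
    dataset: str
    primary_cwe: str
    source: str  # "gate" or "registry"


def merge_plan(
    gate_entries: list[PlanEntry], registry_entries: list[PlanEntry]
) -> list[PlanEntry]:
    # Index by (agent, dataset). Gate-derived entries are inserted first,
    # so setdefault() leaves them in place when the registry path overlaps.
    merged: dict[tuple[str, str], PlanEntry] = {}
    for entry in [*gate_entries, *registry_entries]:
        merged.setdefault((entry.agent, entry.dataset), entry)
    return list(merged.values())


# "example-dataset" is a placeholder label, not a real manifest name.
gates = [PlanEntry("sqli", "example-dataset", "CWE-89", "gate")]
registry = [
    PlanEntry("sqli", "example-dataset", "CWE-89", "registry"),
    PlanEntry("path_traversal", "example-dataset", "CWE-22", "registry"),
]
plan = merge_plan(gates, registry)
```

In this toy run the sqli entry keeps its gate origin, while path_traversal, which has no gate, is supplied by the registry path.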

The benchmark evaluator's case-to-agent routing is registry-driven too. Since Move 1 Task 3.6 (docs/specs/2026-05-26-move1-task3.6-registry-driven-evaluator-design.md), benchmarks/runner/evaluator.py::map_case_to_agent uses an lru_cache(maxsize=1)-decorated _build_cwe_to_agent_map() two-pass builder reading agent.meta.cwes.primary (Pass 1, authoritative — registry invariants ensure primaries don't collide between agents) and agent.meta.cwes.related (Pass 2, gap-fill via dict.setdefault). The old hardcoded _CWE_TO_AGENT dict that covered only the 4 original agents is deleted; Phase-4 routing behavior (CWE-94 → ssti) is preserved because CWE-94 is in ssti.yaml::meta.cwes.related and no agent declares it as primary — Pass 2 fills the gap. Any future catalog agent auto-routes via its YAML's meta.cwes.primary. Same scale-proof fixture-based test pattern as the planner.
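
The two-pass builder described above can be sketched as follows. This is an illustration rather than the code in evaluator.py: AgentCwes and the REGISTRY tuple are hypothetical stand-ins for the loaded registry, and the agent-to-CWE pairings are assumed from the comments in _active_cwes.py and the CWE-94 note.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class AgentCwes:
    name: str
    primary: str
    related: tuple[str, ...] = ()


# Hypothetical stand-in for AgentRegistry contents.
REGISTRY = (
    AgentCwes("cmdi", "CWE-78"),
    AgentCwes("xss", "CWE-79"),
    AgentCwes("sqli", "CWE-89"),
    AgentCwes("ssti", "CWE-1336", related=("CWE-94",)),
    AgentCwes("path_traversal", "CWE-22"),
)


@lru_cache(maxsize=1)
def build_cwe_to_agent_map() -> dict[str, str]:
    mapping: dict[str, str] = {}
    # Pass 1: primary CWEs are authoritative.
    for agent in REGISTRY:
        mapping[agent.primary] = agent.name
    # Pass 2: related CWEs only fill gaps; setdefault never overwrites Pass 1.
    for agent in REGISTRY:
        for cwe in agent.related:
            mapping.setdefault(cwe, agent.name)
    return mapping


def map_case_to_agent(cwe_id: str) -> Optional[str]:
    # Unknown CWEs map to None rather than raising.
    return build_cwe_to_agent_map().get(cwe_id)
```

Because no agent in the sketch declares CWE-94 as primary, Pass 2 routes it to ssti, which mirrors the preserved Phase-4 behavior.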

Zero-touch onboarding pipeline (Move 1.5 PREP, closed 2026-05-28)

The 8-stage per-agent lifecycle (research → YAML → benchmarks → refinement → status promotion) is documented end-to-end in AGENT_LIFECYCLE.md (conceptual) and AGENT_AUTHORING.md (developer procedure, stage by stage). The diagram and flow below describe the infrastructure that makes Stages 3–4 of that lifecycle zero-touch.

Move 1.5 PREP (docs/PHASE_6_MOVE_1_5_PREP_PLAN.md, PR-D1 through PR-D6) closed every remaining structural blocker so a new agent reaches a passing end-to-end scan via YAML edits + a one-line ACTIVE_CWES append — NO Python source edit, plugin markdown edit, or benchmark script edit required. The realized flow is:

1. YAML drop. Agent author creates domains/<domain-id>/<agent-id>.yaml (and domains/<domain-id>/_domain.yaml if the domain is new). YAML declares meta.cwes.primary, supported languages on each HeuristicEntry, and optional declarative fields (meta.aliases, meta.benchmark.reviewer_flags, meta.benchmark.priority_thresholds, meta.benchmark.needs_related_context, plus the 4 catalog metadata fields supported_target_types / provider_mode_support / source_egress / ui_hints).
2. ACTIVE_CWES append. Author adds the agent's primary CWE to the ACTIVE_CWES frozenset in benchmarks/scripts/_active_cwes.py. This is the single Python edit a new agent author MUST perform (it is the project-wide join point between agent content and benchmark infrastructure).
3. onboard-agent unified runner. Author runs uv run screw-agents onboard-agent --id <agent-id> from the repository root. The CLI subcommand (PR-D5 D5.7) chains six steps: (a) YAML validation, (b) ACTIVE_CWES membership check, (c) the four registry-driven ingest scripts (ingest_ossf, ingest_reality_check_{python,java,csharp}), which auto-materialize upstream sources for new CWEs (PR-D3), (d) the zero-touch smoke suite (tests/test_zero_touch_*_e2e.py), (e) per-language coverage report with meta.benchmark_status suggestions, (f) exit code propagation. See docs/AGENT_ONBOARDING_RUNBOOK.md for the full 7-section procedure.
4. Registry auto-discovery. AgentRegistry._load() alphabetically walks domains/**/*.yaml, skips _*.yaml (reserved for domain metadata), and constructs an Agent per YAML. The new agent is immediately discoverable.
5. Discovery aliases. meta.aliases surfaces in get_capabilities()["agents"][i]["aliases"] (PR-D4). Plugin-layer skills (Claude Code screw-review + Codex mirror) route user phrasing (e.g. "SQLi", "directory traversal", "zip-slip") against this list via mcp__screw-agents__list_agents, with no hardcoded mapping tables.
6. Catalog surface auto-expand. get_capabilities() returns the new agent in its agents[] array; list_agents / list_domains continue to report the new entry (with shape preserved). The catalog is registry-driven end-to-end.
7. Plugin layer auto-expand. Claude Code skills + Codex skills render the updated agent list dynamically from mcp__screw-agents__list_agents. The universal screw-scan subagent handles all registered agents, so no new subagent .md file is required (T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into one universal runner).
8. No other Python edit required. The only Python touch in the entire flow is the one-line ACTIVE_CWES append (step 2). All other behavior derives from YAML declarations + registry-driven helpers.
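
The registry auto-discovery walk in the list above can be sketched with pathlib. This is a minimal illustration and not AgentRegistry._load() itself: discover_agent_files is a hypothetical name, and the sketch only finds files without parsing or validating them.

```python
import tempfile
from pathlib import Path


def discover_agent_files(domains_root: Path) -> list[Path]:
    # Walk every YAML file under domains/ in a stable alphabetical order and
    # skip underscore-prefixed files, which are reserved for domain metadata.
    candidates = sorted(domains_root.glob("**/*.yaml"))
    return [path for path in candidates if not path.name.startswith("_")]


# Usage against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "domains"
    (root / "file-handling").mkdir(parents=True)
    (root / "file-handling" / "_domain.yaml").write_text("id: file-handling\n")
    (root / "file-handling" / "path_traversal.yaml").write_text("meta: {}\n")
    found = [path.name for path in discover_agent_files(root)]
```

Dropping a new YAML file into the tree changes the result of the walk without any change to the function, which is the property the zero-touch flow relies on.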

The acceptance gate (PR-D5 first-time-green; PR-D6 reconfirmed) is the 18/18 smoke-test assertions in tests/test_zero_touch_agent_add_e2e.py and tests/test_zero_touch_domain_add_e2e.py (9 pipeline steps × 2 synthetic fixtures = 18 assertions). The gate runs in CI on every PR; if any step regresses to require manual intervention, CI fails loudly.

The full Move 1.5 PREP cumulative deliverable set is captured in docs/PROJECT_STATUS.md "Phase 6 / Move 1.5 PREP — CLOSED" and the SynApSec re-handover at docs/SYNAPSEC_HANDOVER.md (2026-05-28 PR-D6 entry).

The Join Point: `_active_cwes.py`

The central active-CWE registry (benchmarks/scripts/_active_cwes.py) is the single join point between one-time infrastructure and per-vuln content:

# benchmarks/scripts/_active_cwes.py
ACTIVE_CWES: frozenset[str] = frozenset({
    "CWE-78",    # OS Command Injection
    "CWE-79",    # Cross-Site Scripting
    "CWE-89",    # SQL Injection
    "CWE-94",    # Code Injection
    "CWE-1336",  # SSTI
    # Phase 6+ additions (Move 1 onward):
    "CWE-22",    # Path Traversal — YAML shipped Move 0 PR2; benchmark wired in Move 1 PR-B
    # "CWE-918", # SSRF — Move 2 candidate
    # ...
})

Every ingest script, the dedup pipeline, and the MoreFixes extractor import from this single module. Adding a CWE here unlocks it across the entire benchmark system. The Move 0 PR2 path_traversal pilot intentionally ships ahead of this step: agent YAML lands first (so the catalog and get_capabilities reflect it) and the matching CWE-22 line is added in Move 1 alongside focused calibration.
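
How a single set can gate the whole benchmark system is easiest to see in a filter. The sketch below is an assumption about the general shape, not code from the ingest scripts: filter_active and the case dictionaries are hypothetical, and only the cwe_ids field name is taken from the manifest description earlier in this document.

```python
ACTIVE_CWES: frozenset[str] = frozenset(
    {"CWE-78", "CWE-79", "CWE-89", "CWE-94", "CWE-1336", "CWE-22"}
)


def filter_active(
    cases: list[dict], active: frozenset[str] = ACTIVE_CWES
) -> list[dict]:
    # Keep a case when any of its CWE IDs is in the active set.
    return [case for case in cases if active.intersection(case["cwe_ids"])]


cases = [
    {"id": "case-1", "cwe_ids": ["CWE-22"]},
    {"id": "case-2", "cwe_ids": ["CWE-918"]},  # SSRF is still commented out
    {"id": "case-3", "cwe_ids": ["CWE-502", "CWE-89"]},
]
kept = [case["id"] for case in filter_active(cases)]
```

In this toy data, case-2 is already present in the corpus but stays filtered out until CWE-918 is added to the set.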

Tool & Subagent Inventory (post-T-SCAN-REFACTOR)

MCP tools (post-2026-04-25)

Scan tools:

scan_agents(agents, target, ...) — paginated multi-agent primitive. Cursor binding (target_hash, agents_hash). Returns init-page with agents_excluded_by_relevance + code-pages with per-agent prompts.
scan_domain(domain, target, ...) — convenience wrapper over scan_agents. Resolves all agents in a CWE-1400 domain.
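
One way to bind a pagination cursor to (target_hash, agents_hash) is sketched below. Everything beyond those two field names is an assumption: the SHA-256 digests, the base64-encoded JSON layout, the offset field, the order-insensitive agent hash, and the function names make_cursor and read_cursor are illustrative and may differ from the server's actual cursor format.

```python
import base64
import hashlib
import json


def _digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def make_cursor(target: str, agents: list[str], offset: int) -> str:
    # Bind the cursor to the scan inputs so it cannot be replayed elsewhere.
    payload = {
        "target_hash": _digest(target),
        "agents_hash": _digest(",".join(sorted(agents))),
        "offset": offset,
    }
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def read_cursor(cursor: str, target: str, agents: list[str]) -> int:
    # Reject a cursor whose hashes do not match the current request.
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    expected = (_digest(target), _digest(",".join(sorted(agents))))
    if (payload["target_hash"], payload["agents_hash"]) != expected:
        raise ValueError("cursor does not match this target/agent set")
    return payload["offset"]
```

A cursor issued for one target and agent list is refused when presented with a different target, which keeps later pages consistent with the init page.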

Discovery tools:

list_agents(domain=None) — enumerate registered agents (optionally filtered by domain). Shape frozen (SynApSec compatibility guard).
list_domains() — enumerate domains. Shape frozen (SynApSec compatibility guard).
get_agent_prompt(agent_name) — fetch the per-agent core prompt on demand (lazy fetch from subagents).
get_capabilities() — versioned, machine-readable capability catalog (Move 0, 2026-05-25). No args. The single rich discovery source; additive to the frozen list_agents/list_domains. See "Capability Catalog" below.

Capability Catalog (Move 0): get_capabilities returns a versioned envelope (catalog_schema_version + artifact_schema_version + informational screw_agents_version) wrapping a global_capabilities block (target types, output formats, provider modes with source-egress/billing facts) and enriched domains[] / agents[] catalogs. Per-domain metadata (display name, description, CWE-1400 category, OWASP 2025) is sourced from domains/<domain>/_domain.yaml (the registry skips all _*.yaml from agent loading and, at serve time via create_server, refuses to start if a populated domain lacks _domain.yaml). This is assembled in src/screw_agents/capabilities.py. list_agents/list_domains stay byte-stable; get_capabilities is where richer catalog fields grow additively. New YAML agents/domains appear here with no client code change.
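
The serve-time rule that a populated domain must carry _domain.yaml can be sketched as a directory check. This is a hypothetical helper, not the logic in create_server: domains_missing_metadata is an invented name, and "populated" is assumed to mean "contains at least one YAML file that does not start with an underscore".

```python
import tempfile
from pathlib import Path


def domains_missing_metadata(domains_root: Path) -> list[str]:
    # Report every populated domain directory that lacks _domain.yaml.
    missing: list[str] = []
    for domain_dir in sorted(p for p in domains_root.iterdir() if p.is_dir()):
        has_agent = any(
            not f.name.startswith("_") for f in domain_dir.glob("*.yaml")
        )
        if has_agent and not (domain_dir / "_domain.yaml").is_file():
            missing.append(domain_dir.name)
    return missing


# Usage: one populated domain without metadata, one empty domain.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "file-handling").mkdir()
    (root / "file-handling" / "path_traversal.yaml").write_text("meta: {}\n")
    (root / "empty-domain").mkdir()
    missing = domains_missing_metadata(root)
```

A server built on such a check could refuse to start whenever the returned list is non-empty.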

Accumulator tools (Phase 3a X1-M1 — paired with finalize_scan_results; called on every scan, not just adaptive flows):

accumulate_findings — appends finding records to the active session keyed by session_id. Phase 3a X1-M1 introduced this as the generic per-page consumer for the lazy-fetch pagination flow.
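
The per-page accumulate-then-finalize pattern can be sketched with an in-memory store. The names accumulate_page and drain_session are deliberately different from the real MCP tools, whose signatures and storage are not shown in this document; treat this only as an illustration of keying findings by session_id.

```python
from collections import defaultdict

# Toy in-memory session store; the real server's storage is not specified here.
_sessions: dict[str, list[dict]] = defaultdict(list)


def accumulate_page(session_id: str, findings: list[dict]) -> int:
    # Append this page's findings and return the running total for the session.
    _sessions[session_id].extend(findings)
    return len(_sessions[session_id])


def drain_session(session_id: str) -> list[dict]:
    # Hand back everything gathered for the session and close it.
    return _sessions.pop(session_id, [])
```

Each code page adds to the same session, so the finalizing step sees findings from every page regardless of how many pages the scan needed.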

Adaptive tools (Phase 3b):

record_context_required_match, detect_coverage_gaps, lint_adaptive_script, stage_adaptive_script, promote_staged_script, reject_staged_script, execute_adaptive_script, verify_trust.

Slash-command parser:

resolve_scope(scope_text) — Task 8 helper; returns {agents, summary}. Used by /screw:scan to translate user input into an agent list. Closed allowlist (registry lookup) + no shell evaluation.

Output:

finalize_scan_results(session_id, formats=...) — emit JSON/Markdown/SARIF/CSV reports. Default format list as of T19-M D7: ["json", "markdown", "csv"].
record_exclusion, check_exclusions — exclusion learning surface (Phase 2).

Retired (T-SCAN-REFACTOR Task 6):

scan_full — replaced by scan_agents(agents=list_agents().names, ...) (or by the slash command's full keyword).
scan_<name> per-agent tools (sqli/cmdi/ssti/xss) — replaced by scan_agents(agents=[<name>], ...).

Subagents (post-2026-04-25)

screw-scan.md — universal scan runner (~559 LOC). Replaces 5 deleted per-vuln + per-domain subagents (screw-sqli, screw-cmdi, screw-ssti, screw-xss, screw-injection — Task 7 of T-SCAN-REFACTOR). Dispatched with agents: list[str] from main session.
screw-script-reviewer.md — adaptive script review. Dispatched by main session per pending_reviews chain (chain-subagents architecture, Phase 3b-C2).
screw-learning-analyst.md — learning-mode analyst (Phase 3a).

Subagents do NOT dispatch other subagents (Claude Code constraint, sub-agents.md:711). Main session is the sole orchestrator.

Slash command grammar (post-Task-8)

/screw:scan <scope-spec> <target> [--adaptive | --no-confirm | --thoroughness <L>] [--format <F>] [--primary-provider <provider> --primary-transport <transport> --primary-execution fixture|cli] [--parallel-providers provider:transport:execution,...] [--challenger <mode> --challenger-execution dry_run|cli]

--adaptive and --no-confirm are mutually exclusive (adaptive mode requires interactive consent).
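
The mutual-exclusion rule can be illustrated with argparse. The slash command itself is handled by the assistant host rather than a Python parser, so this is only an analogy; build_parser and the program name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="screw-scan-sketch")
    parser.add_argument("scope")
    parser.add_argument("target")
    # argparse rejects any command line that sets both flags in this group.
    consent = parser.add_mutually_exclusive_group()
    consent.add_argument("--adaptive", action="store_true")
    consent.add_argument("--no-confirm", action="store_true")
    return parser


args = build_parser().parse_args(["sqli", "src/api/", "--adaptive"])
```

Passing both flags to this parser exits with a usage error, matching the rule that adaptive mode cannot skip interactive consent.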

Scope-spec forms (mutually exclusive):

Bare-token: single agent name (e.g., sqli) or domain name (e.g., injection-input-handling). Disambiguated via registry lookup; the agent name ≠ domain name invariant guarantees uniqueness.
full keyword: all registered agents (post-relevance-filter).
Prefix-key: domains:foo,bar agents:baz,qux — combine multiple domains and agents in one invocation.

Examples:

/screw:scan sqli src/api/                    # single agent
/screw:scan injection-input-handling src/    # whole domain
/screw:scan full .                           # all agents
/screw:scan agents:sqli,xss src/api/         # subset across domains
/screw:scan domains:foo agents:baz src/      # mix
/screw:scan domains:A,B agents:1A,2A,1B src/ # subset of A + subset of B
/screw:scan sqli src/api/ --challenger claude_primary_codex_challenger --challenger-execution dry_run
/screw:scan sqli src/api/ --primary-provider codex --primary-transport cli --primary-execution cli
/screw:scan sqli src/api/ --parallel-providers claude:cli:cli,codex:cli:cli
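
The three scope-spec forms can be sketched as a small parser over a toy registry. This is not the resolve_scope implementation: resolve_scope_sketch, the TOY_REGISTRY contents, and the summary wording are hypothetical, and only the {agents, summary} return shape and the closed-allowlist behavior come from the description above.

```python
TOY_REGISTRY = {
    "domain-a": ["agent-a1", "agent-a2"],
    "domain-b": ["agent-b1"],
}


def resolve_scope_sketch(scope_text: str, domains: dict[str, list[str]]) -> dict:
    all_agents = sorted(a for names in domains.values() for a in names)
    tokens = scope_text.split()
    if not tokens:
        raise ValueError("empty scope")
    agents: list[str] = []

    def add(names: list[str]) -> None:
        # Preserve first-seen order and drop duplicates.
        for name in names:
            if name not in agents:
                agents.append(name)

    if tokens == ["full"]:
        add(all_agents)
    elif len(tokens) == 1 and ":" not in tokens[0]:
        # Bare token: a domain name or an agent name, never both.
        token = tokens[0]
        if token in domains:
            add(domains[token])
        elif token in all_agents:
            add([token])
        else:
            raise ValueError(f"unknown agent or domain: {token}")
    else:
        # Prefix-key form: "domains:a,b agents:x,y".
        for token in tokens:
            key, _, values = token.partition(":")
            for value in values.split(","):
                if key == "domains" and value in domains:
                    add(domains[value])
                elif key == "agents" and value in all_agents:
                    add([value])
                else:
                    raise ValueError(f"unknown scope entry: {token}")
    summary = f"{len(agents)} agent(s): {', '.join(agents)}"
    return {"agents": agents, "summary": summary}
```

Every name is checked against the registry and nothing is evaluated, so unknown input fails with an error instead of reaching a shell.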

Scan flow (chain-subagents architecture)

slash command       resolve_scope        scan_agents (init page)
   ↓                    ↓                       ↓
main session ──────────────────────────────────→  pre-execution summary
   ↓                                                    ↓
   ↓                                              user consent (or --no-confirm)
   ↓                                                    ↓
dispatch screw-scan ──────────────────────────────────────────→ scan_agents (code pages)
   ↓                                                                  ↓
   ↓                                                            accumulate_findings
   ↓                                                                  ↓
parse return (C2 + enrichment) ←─────────────────────────────── return structured payload
   ↓
optionally chain screw-script-reviewer (per pending_reviews)
   ↓
finalize_scan_results
   ↓
report (JSON, Markdown, SARIF, CSV per --format)

Component Architecture

Agent Definitions (`domains/`)

YAML files carrying vulnerability-specific detection knowledge. Each agent includes:

meta: CWE IDs, CAPEC mappings, OWASP Top 10:2025 overlay, research sources
core_prompt: Distilled detection knowledge (2,000-4,000 tokens)
detection_heuristics: Language-specific patterns at high/medium/context-required severity
bypass_techniques: Real-world evasion patterns grounded in CVEs
remediation: Per-language fix guidance
few_shot_examples: Vulnerable + safe code pairs
target_strategy: Tree-sitter queries for function/class targeting

See docs/PRD.md §4 for the full schema.

MCP Server (`src/screw_agents/`)

Python MCP server exposing agent definitions as tools. Phase 1 builds:

Agent Registry: YAML loader → Pydantic validation → MCP tool registration
Target Resolver: tree-sitter AST parsing (10 languages) + glob file discovery + git diff parsing
Output Formatter: Findings → JSON + SARIF + Markdown

The server supports both stdio (Claude Code) and streamable HTTP (screw.nvim, CI/CD) transports.

Benchmark Evaluator (`benchmarks/runner/`)

CWE-1400-native Python evaluator (ADR-013). Components:

Module	Responsibility
`models.py`	Pydantic types: Finding, BenchmarkCase, MetricSet, Summary
`sarif.py`	bentoo-SARIF read/write (SARIF 2.1.0 subset)
`cwe.py`	CWE-1400 hierarchy traversal with strict/broad match modes
`metrics.py`	Pair-based TPR/FPR/precision/recall/F1 computation
`primevul.py`	Tree-sitter AST normalization, SHA-256 dedup, chrono/cross-project splits
`report.py`	Markdown report rendering
`cli.py`	`python -m benchmarks.runner` entry point
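
The metrics named in the table follow the standard confusion-matrix definitions, sketched below. This is not the code in metrics.py: Confusion and compute_metrics are hypothetical names, and reading "pair-based" as scoring a vulnerable version against its patched counterpart is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # vulnerable version flagged (assumed pair semantics)
    fp: int  # patched version flagged
    tn: int  # patched version left clean
    fn: int  # vulnerable version missed


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of dividing by zero on empty classes.
    return numerator / denominator if denominator else 0.0


def compute_metrics(c: Confusion) -> dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # recall is the same quantity as TPR
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "tpr": recall,
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = compute_metrics(Confusion(tp=8, fp=2, tn=8, fn=2))
```

Guarding the division keeps a dataset with no positives or no negatives from raising, at the cost of reporting 0.0 for an undefined ratio.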

Ingest System (`benchmarks/scripts/`)

Reusable IngestBase abstract class with 8 dataset-specific subclasses. Each ingest script:

Downloads/clones the dataset (ensure_downloaded())
Parses the native format, filters to ACTIVE_CWES (extract_cases())
Writes bentoo-SARIF truth files + provenance manifest (base class run())

Dataset	Languages	Ingest Script
OpenSSF CVE Benchmark	JS/TS	`ingest_ossf.py`
reality-check (C#/Python/Java)	C#, Python, Java	`ingest_reality_check_*.py`
go-sec-code-mutated	Go	`ingest_go_sec_code.py`
skf-labs-mutated	Python	`ingest_skf_labs.py`
CrossVul	PHP, Ruby	`ingest_crossvul.py`
Vul4J	Java	`ingest_vul4j.py`
MoreFixes	All (via Postgres)	`morefixes_extract.py`
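
The three-step ingest contract reads like a template method, which the sketch below illustrates. It is not the project's IngestBase: IngestBaseSketch, ToyIngest, the trimmed ACTIVE_CWES copy, and the returned dictionary are hypothetical, and the real base class is described as writing bentoo-SARIF files and a provenance manifest to disk rather than returning them.

```python
from abc import ABC, abstractmethod

ACTIVE_CWES = frozenset({"CWE-89", "CWE-22"})  # trimmed copy for the sketch


class IngestBaseSketch(ABC):
    @abstractmethod
    def ensure_downloaded(self) -> None:
        """Fetch or clone the dataset if it is not already on disk."""

    @abstractmethod
    def extract_cases(self) -> list[dict]:
        """Parse the native format and return only ACTIVE_CWES cases."""

    def run(self) -> dict:
        # Template method: download, extract, then build the outputs.
        self.ensure_downloaded()
        cases = self.extract_cases()
        return {"truth": cases, "manifest": {"case_count": len(cases)}}


class ToyIngest(IngestBaseSketch):
    def __init__(self) -> None:
        self.downloaded = False

    def ensure_downloaded(self) -> None:
        self.downloaded = True  # a real subclass would clone or download here

    def extract_cases(self) -> list[dict]:
        raw = [{"id": "a", "cwe": "CWE-89"}, {"id": "b", "cwe": "CWE-611"}]
        return [case for case in raw if case["cwe"] in ACTIVE_CWES]


result = ToyIngest().run()
```

Keeping run() in the base class means a new dataset only has to supply the two dataset-specific methods.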

Corpus Models: Per-CWE vs Monolithic

Source corpora come in two fundamentally different shapes, and the materialization path differs by shape (PR-D3 Task D3.3 PATH D).

Model	Shape	Materialization path
Per-CWE	One upstream git repo per CVE — hundreds of separate clones, driven per-CWE.	`benchmarks/scripts/_materialization.py::materialize_for_cwe(cwe_id, "ossf")` — idempotent clone driver with tracking file (`benchmarks/external/.materialized.json`). Per-clone commit pins come from each case's `prePatch`/`postPatch` SHAs in the CVE metadata.
Monolithic	ONE upstream git repo for the entire corpus (markup files for every project).	`IngestBase.ensure_downloaded()` — clones the single upstream repo once at the `_PINNED_REF` module constant declared in each ingest script. No per-CWE materialization is meaningful (the single repo already covers all CVEs/CWEs).

OSSF is per-CWE. reality-check (python/java/csharp) is monolithic and all three ingests share the same upstream repo (flawgarden/reality-check). The Corpus = Literal["ossf"] type alias in _materialization.py intentionally excludes the monolithic corpora — forcing them through per-CWE materialization would iterate a single-repo corpus once per CWE, which is pure waste.

Both paths pin upstream by 40-char hex SHA: per-CWE pins live in the CVE metadata; monolithic pins live in the _PINNED_REF module constant of each ingest script (benchmarks/tests/test_upstream_pins.py enforces SHA shape).
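
A shape check for a 40-character hex pin can be sketched with a regular expression. This is an illustration and not the contents of test_upstream_pins.py: is_full_sha is a hypothetical name, and accepting lowercase hex only is an assumption.

```python
import re

_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_full_sha(ref: str) -> bool:
    # fullmatch rejects short SHAs, branch names, and tags.
    return _SHA_RE.fullmatch(ref) is not None
```

Pinning to a full SHA rather than a branch or tag means the ingested corpus cannot change silently when upstream moves.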

Assistant Command Integration (`plugins/screw/`)

Thin orchestration wrappers calling MCP tools. The current implementation is a shared assistant plugin directory with Claude Code metadata (.claude-plugin/plugin.json) and Codex metadata (.codex-plugin/plugin.json, .agents/plugins/marketplace.json). The slash-command names, agent roles, host skills, and MCP tool workflows define a portable assistant command contract. Future Gemini, local assistant, editor, or web-worker integrations should preserve the same command semantics and map host-specific UX onto the same backend tools.

Subagents (agents/): screw-scan.md (universal scan runner; T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into this one), screw-script-reviewer.md (adaptive review chain), screw-learning-analyst.md (learning mode). Main session orchestrates dispatch (chain-subagents architecture).
Claude skills (skills/): Claude-native auto-invocation helpers for review and research workflows.
Codex skills (codex-skills/): Codex workflow skills. Codex-only scan/learn/adaptive skills live outside the top-level skills/ directory so Claude Code does not expose duplicate slash completions for command workflows.
Slash commands: User-facing entry points (for example /screw:scan, /screw:learn-report, and /screw:adaptive-cleanup; see "Tool & Subagent Inventory" above)
MCP config: repo-root .mcp.json for project-scoped Claude Code development, plus plugin-scoped plugins/screw/codex-mcp.json for repo-local Codex marketplace development.

Project-Level State (`.screw/`)

Per-project persistent state in the target repository:

findings/: Scan results
learning/exclusions.yaml: False positive patterns (Phase 2)
custom-scripts/: Adaptive analysis scripts (Phase 3)

Taxonomy

CWE-1400 (Comprehensive Categorization) is the structural backbone — 21 categories consolidated to 18 agent domains. Every finding carries a CWE ID as the universal join key.

OWASP Top 10:2025 is the risk communication overlay — used in reports and user-facing output, not as the domain structure.

See docs/PRD.md §9 for the full taxonomy mapping and docs/DECISIONS.md ADR-002/ADR-003 for the rationale.

Phase Plan Summary

Phase	Type	Focus	Status
Phase 0	Per-vuln	Knowledge Research	Complete (4 agents)
Phase 0.5	One-time	Benchmark Infrastructure	Complete
Phase 1	One-time	MCP Server + Registry + Resolver + Formatter	Complete
Phase 2	Per-vuln	Claude Code Integration (subagents, skills, FP learning)	Complete
Phase 3	One-time	Adaptive Analysis & Learning	Complete (Phase 3a + Phase 3b + Phase 3b-C2)
Phase 4	One-time	Autoresearch & Self-Improvement	Complete
Phase 5	One-time	Multi-LLM Challenger + Provider-Neutral Primary Scanning	Complete — challenger/reconciliation/report surfaces, primary scan contracts, fixture validation, scan input assembly, backend CLI primary scanner runner plumbing, production Claude/Codex output normalization, package CLI, MCP surface, optional provider-scan finalization, backend composed primary-plus-challenger workflow, backend parallel primary reconciliation/finalization, mode-aware provider report filenames/metadata, universal `/screw:scan` provider-primary command contract, route-equivalent fixture validation, real Claude Code/Codex host-route fixture validation, one live Codex/Claude benchmark round trip, live composed validation in both Claude/Codex directions, and live parallel validation are implemented/recorded; additional provider adapters are accepted deferrals; see `docs/PHASE_5_CLOSURE_READINESS.md`
Phase 5.5	One-time	Web application integration pilot	Next — handoff in `docs/PHASE_5_5_WEB_APP_INTEGRATION.md`
Phase 6	Mixed	Small-batch agent expansion	Pending
Phase 7	One-time	screw.nvim Integration	Pending

See docs/PRD.md §12 for detailed phase descriptions and docs/PROJECT_STATUS.md for current state.

Key References

docs/PRD.md — Product Requirements Document (definitive)
docs/DECISIONS.md — Architecture Decision Records
docs/PHASE_0_5_PLAN.md — Phase 0.5 implementation plan (28 tasks)
docs/PHASE_0_5_VALIDATION_GATES.md — Phase 1.7 acceptance criteria
docs/PHASE_5_PLAN.md — Phase 5 challenger provider and transport plan
docs/PHASE_5_MANUAL_VALIDATION.md — Phase 5 manual round-trip validation evidence
docs/PROJECT_STATUS.md — Current state + deferred obligations
docs/KNOWLEDGE_SOURCES.md — Research targets for knowledge sprint
docs/AGENT_AUTHORING.md — Guide for writing new agent YAMLs
