Architecture: screw-agents

See docs/PRD.md §3 for the full system architecture diagram and rationale. See docs/DECISIONS.md for Architecture Decision Records (ADRs).

System Overview

screw-agents is a modular, AI-powered secure code review system. It provides dedicated, vulnerability-specific agents that carry deeply researched security knowledge and are invocable from Claude Code, Codex, Gemini, local assistants, Neovim (via screw.nvim), web application workers, or CI/CD pipelines through a shared MCP server backbone.

┌────────────────────────────────────────────────────────────────────┐
│  Consumers: Claude Code │ Codex/Gemini/local │ screw.nvim │ CI/CD  │
│                            MCP Protocol                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                screw-agents-mcp (MCP Server)                 │  │
│  │                                                              │  │
│  │   Agent Registry ← YAML definitions (domains/)               │  │
│  │   Target Resolver (tree-sitter, 10 languages)                │  │
│  │   Output Formatter (JSON / SARIF / Markdown)                 │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
│  Benchmark Evaluator (benchmarks/runner/)                          │
│  Autoresearch Loop (Phase 4)                                       │
└────────────────────────────────────────────────────────────────────┘

Key Design Decisions

| Decision | Why |
| --- | --- |
| MCP server, not embedded prompts | One source of truth: improve an agent once, every client benefits |
| CWE-1400 taxonomy backbone | Only classification with completeness + mutual exclusivity + practical granularity (ADR-002) |
| YAML agent definitions | New vulnerability types via YAML, no Python code changes. Community-extensible |
| CWE-1400-native benchmark evaluator | Score in CWE-1400 directly, not CWE-1000 with translation (ADR-013) |
| PrimeVul methodology | Without dedup + chrono splits, LLM scores are inflated: measured F1 drops from 68% to 3% once they are applied |

Phase Lifecycle: One-Time Infrastructure vs Per-Vulnerability Work

This is the most important architectural concept to understand. The system is designed so that adding a new vulnerability type is a content operation, not an infrastructure operation.

The Two Types of Work

 PER-VULNERABILITY (repeat for each new vuln):
 ├── Phase 0:   Knowledge research - research, synthesize, write agent YAML
 └── Phase 2-3: Agent authoring - subagent wrappers, skills, testing
                 (one-line _active_cwes.py edit to light up benchmarks)

 ONE-TIME INFRASTRUCTURE (build once, benefits all vulns):
 ├── Phase 0.5: Benchmark infrastructure - evaluator, ingest harness, datasets
 ├── Phase 1:   MCP server - registry, resolver, formatter
 ├── Phase 4:   Autoresearch loop - self-improvement, experiment logging
 ├── Phase 5:   Multi-LLM challenger - provider-agnostic disagreement analysis
 └── Phase 6:   Agent expansion tooling - CI/CD, community workflow

How a New Vulnerability Plugs In

This is the realized workflow, demonstrated by the path_traversal / CWE-22 pilot shipped in Move 0 PR2 (branch move0-pr2-path-traversal-pilot, 2026-05-26).

Step 1: Research (per-vuln knowledge work). Author the agent YAML in domains/<cwe-1400-category>/<agent>.yaml following the same Tier 1-4 research methodology used for the original four agents. For the pilot, this produced domains/file-handling/path_traversal.yaml plus the new domains/file-handling/_domain.yaml carrying the CWE-1404 / OWASP A01 domain metadata that get_capabilities reports.

Step 2: Register with benchmarks (one-line edit, deferred for the pilot). Add the agent's primary CWE (e.g., "CWE-22") to the ACTIVE_CWES frozenset in benchmarks/scripts/_active_cwes.py, then re-run the existing ingest scripts. CrossVul, MoreFixes, reality-check, etc. already contain CWE-22 data; it is just filtered out until the CWE is in the active set. For Move 0 PR2 this step is intentionally deferred to Move 1: the agent ships with a not-yet-benchmarked marker (same precedent as ADR-014 Rust deferral), and Move 1 closes the calibration. See docs/DECISIONS.md ADR-PILOT-BENCHMARK-DEFERRED.

Step 3: MCP registration (automatic). The agent registry (Phase 1) discovers YAML files in domains/ and registers them. Catalog discovery via get_capabilities (Move 0 PR1) surfaces the new agent and (if newly populated) the new domain. No Python code changes are needed: drop the YAML pair and restart the server. Move 0 PR2 proved this end-to-end: get_capabilities now lists path_traversal and the file-handling domain with zero Python changes, and list_agents / list_domains / Finding / findings-JSON shapes are unchanged.

Step 4: Validation (unified zero-touch CLI, PR-D5 D5.7). Run uv run screw-agents onboard-agent --id <agent> from the repository root. This single command validates the YAML, checks ACTIVE_CWES membership, runs the four registry-driven ingest scripts (which auto-materialize upstream sources for new CWEs per PR-D3), executes the zero-touch smoke suite, and prints a per-language coverage report with meta.benchmark_status suggestions. See AGENT_ONBOARDING_RUNBOOK.md for the full 7-section procedure including coverage thresholds and common errors. The runner, metrics, dedup, splits, and report generator all work without modification; they are CWE-agnostic by design.

The autoresearch planner discovers the new agent automatically. Since Move 1 Task 3.5 (docs/specs/2026-05-26-move1-task3.5-registry-driven-planner-design.md), screw_agents.autoresearch.planner.build_run_plan enumerates (agent, dataset, primary_cwe) tuples from two sources running side-by-side: the legacy hardcoded G5_GATES list in benchmarks/runner/gate_checker.py (Phase-4 retrospective closure criteria for the original 4 agents) and a new registry-driven path that reads AgentRegistry.agents.values() + ACTIVE_CWES + per-case cwe_ids[] from manifests. Dedupe rule: gate-derived wins on (agent, dataset) overlap (preserves Phase-4 semantics). For agents without a G5 gate (path_traversal and every future catalog agent), the registry path supplies the case enumeration. No G5_GATES entry is required for Phase-6+ agents; their YAML's meta.cwes.primary plus the ACTIVE_CWES flip is sufficient. Selection strategies priority-stratified and expanded-stratified consume the registry-derived enumeration via pseudo-gates synthesized in controlled_run._registry_pseudo_gates. gate-order and required-dataset-smoke stay gate-only (Phase-4-pure).
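
The dedupe rule above can be illustrated with a minimal sketch. It is not the project's build_run_plan: the PlanEntry type, the merge_plan_entries helper, and the dataset name "dataset-a" are hypothetical, chosen only to show how a gate-derived entry can win on an (agent, dataset) overlap.

```python
from typing import NamedTuple


class PlanEntry(NamedTuple):
    """Hypothetical plan row; `source` records which path produced it."""
    agent: str
    dataset: str
    primary_cwe: str
    source: str  # "gate" or "registry"


def merge_plan_entries(gate_entries, registry_entries):
    """Merge both sources; the gate-derived entry wins on (agent, dataset) overlap."""
    merged = {}
    for entry in gate_entries:
        merged[(entry.agent, entry.dataset)] = entry
    for entry in registry_entries:
        # setdefault leaves an existing gate-derived entry untouched.
        merged.setdefault((entry.agent, entry.dataset), entry)
    return list(merged.values())


gates = [PlanEntry("sqli", "dataset-a", "CWE-89", "gate")]
registry = [
    PlanEntry("sqli", "dataset-a", "CWE-89", "registry"),
    PlanEntry("path_traversal", "dataset-a", "CWE-22", "registry"),
]
plan = merge_plan_entries(gates, registry)
```

Here the sqli row keeps its gate origin, while path_traversal, which has no gate, is supplied by the registry path.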

The benchmark evaluator's case-to-agent routing is registry-driven too. Since Move 1 Task 3.6 (docs/specs/2026-05-26-move1-task3.6-registry-driven-evaluator-design.md), benchmarks/runner/evaluator.py::map_case_to_agent uses an lru_cache(maxsize=1)-decorated _build_cwe_to_agent_map() two-pass builder reading agent.meta.cwes.primary (Pass 1, authoritative; registry invariants ensure primaries don't collide between agents) and agent.meta.cwes.related (Pass 2, gap-fill via dict.setdefault). The old hardcoded _CWE_TO_AGENT dict that covered only the 4 original agents is deleted; Phase-4 routing behavior (CWE-94 → ssti) is preserved because CWE-94 is in ssti.yaml::meta.cwes.related and no agent declares it as primary, so Pass 2 fills the gap. Any future catalog agent auto-routes via its YAML's meta.cwes.primary. Same scale-proof fixture-based test pattern as the planner.
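
A rough sketch of the two-pass idea follows, using only the standard library. The AgentStub type, the _AGENTS tuple, and the function names are stand-ins rather than the real evaluator code, and demo_agent is an invented entry that exists only to show that Pass 2 never overrides a primary.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class AgentStub:
    """Hypothetical stand-in for a registry agent's meta.cwes block."""
    name: str
    primary: str
    related: tuple = ()


# Illustrative registry contents; demo_agent is invented for this sketch.
_AGENTS = (
    AgentStub("sqli", "CWE-89"),
    AgentStub("ssti", "CWE-1336", related=("CWE-94",)),
    AgentStub("demo_agent", "CWE-0000", related=("CWE-89",)),
)


@lru_cache(maxsize=1)
def build_cwe_to_agent_map() -> dict:
    mapping = {}
    for agent in _AGENTS:  # Pass 1: primaries are authoritative
        mapping[agent.primary] = agent.name
    for agent in _AGENTS:  # Pass 2: related CWEs only fill gaps
        for cwe in agent.related:
            mapping.setdefault(cwe, agent.name)
    return mapping


def route_case(cwe_id: str) -> Optional[str]:
    """Return the agent name for a CWE, or None when nothing claims it."""
    return build_cwe_to_agent_map().get(cwe_id)
```

With maxsize=1 and a zero-argument builder, the map is computed once per process, so a test that swaps the registry would need to call build_cwe_to_agent_map.cache_clear() first.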

Zero-touch onboarding pipeline (Move 1.5 PREP, closed 2026-05-28)

The 8-stage per-agent lifecycle (research → YAML → benchmarks → refinement → status promotion) is documented end-to-end in AGENT_LIFECYCLE.md (conceptual) and AGENT_AUTHORING.md (developer procedure, stage by stage). The diagram and flow below describe the infrastructure that makes Stages 3–4 of that lifecycle zero-touch.

Move 1.5 PREP (docs/PHASE_6_MOVE_1_5_PREP_PLAN.md, PR-D1 through PR-D6) closed every remaining structural blocker so a new agent reaches a passing end-to-end scan via YAML edits plus a one-line ACTIVE_CWES append, with no other Python source edit, plugin markdown edit, or benchmark script edit required. The realized flow is:

  1. YAML drop. Agent author creates domains/<domain-id>/<agent-id>.yaml (and domains/<domain-id>/_domain.yaml if the domain is new). YAML declares meta.cwes.primary, supported languages on each HeuristicEntry, and optional declarative fields (meta.aliases, meta.benchmark.reviewer_flags, meta.benchmark.priority_thresholds, meta.benchmark.needs_related_context, plus the 4 catalog metadata fields supported_target_types / provider_mode_support / source_egress / ui_hints).
  2. ACTIVE_CWES append. Author adds the agent's primary CWE to the ACTIVE_CWES frozenset in benchmarks/scripts/_active_cwes.py. This is the single Python edit a new agent author MUST perform (it is the project-wide join point between agent content and benchmark infrastructure).
  3. onboard-agent unified runner. Author runs uv run screw-agents onboard-agent --id <agent-id> from the repository root. The CLI subcommand (PR-D5 D5.7) chains six steps: (a) YAML validation, (b) ACTIVE_CWES membership check, (c) the four registry-driven ingest scripts (ingest_ossf, ingest_reality_check_{python,java,csharp}), which auto-materialize upstream sources for new CWEs (PR-D3), (d) the zero-touch smoke suite (tests/test_zero_touch_*_e2e.py), (e) per-language coverage report with meta.benchmark_status suggestions, (f) exit code propagation. See docs/AGENT_ONBOARDING_RUNBOOK.md for the full 7-section procedure.
  4. Registry auto-discovery. AgentRegistry._load() alphabetically walks domains/**/*.yaml, skips _*.yaml (reserved for domain metadata), and constructs an Agent per YAML. The new agent is immediately discoverable.
  5. Discovery aliases. meta.aliases surfaces in get_capabilities()["agents"][i]["aliases"] (PR-D4). Plugin-layer skills (Claude Code screw-review + Codex mirror) route user phrasing (e.g. "SQLi", "directory traversal", "zip-slip") against this list via mcp__screw-agents__list_agents, with no hardcoded mapping tables.
  6. Catalog surface auto-expand. get_capabilities() returns the new agent in its agents[] array; list_agents / list_domains continue to report the new entry (with shape preserved). The catalog is registry-driven end-to-end.
  7. Plugin layer auto-expand. Claude Code skills + Codex skills render the updated agent list dynamically from mcp__screw-agents__list_agents. The universal screw-scan subagent handles all registered agents, so no new subagent .md file is required (T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into one universal runner).
  8. No further Python edits. The only Python touch in the entire flow is the one-line ACTIVE_CWES append (step 2). All other behavior derives from YAML declarations + registry-driven helpers.
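
The discovery rule in step 4 (alphabetical walk of domains/**/*.yaml, skipping _*.yaml) can be sketched with pathlib. This is an illustration under stated assumptions, not AgentRegistry._load() itself: discover_agent_yaml is a hypothetical helper, and the sketch stops at collecting paths rather than constructing Agent objects.

```python
import tempfile
from pathlib import Path


def discover_agent_yaml(domains_root: Path) -> list:
    """Return agent YAML paths in sorted order, skipping reserved _*.yaml files."""
    return sorted(
        path
        for path in domains_root.rglob("*.yaml")
        if not path.name.startswith("_")
    )


# Build a throwaway tree that mirrors the pilot's layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "domains"
    (root / "file-handling").mkdir(parents=True)
    (root / "file-handling" / "_domain.yaml").write_text("id: file-handling\n")
    (root / "file-handling" / "path_traversal.yaml").write_text("meta: {}\n")
    discovered = [p.relative_to(root).as_posix() for p in discover_agent_yaml(root)]
```

Only path_traversal.yaml is picked up; _domain.yaml stays reserved for domain metadata.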

The acceptance gate (PR-D5 first-time-green; PR-D6 reconfirmed) is the 18/18 smoke-test assertions in tests/test_zero_touch_agent_add_e2e.py and tests/test_zero_touch_domain_add_e2e.py (9 pipeline steps × 2 synthetic fixtures = 18 assertions). The gate runs in CI on every PR; if any step regresses to require manual intervention, CI fails loudly.

The full Move 1.5 PREP cumulative deliverable set is captured in docs/PROJECT_STATUS.md "Phase 6 / Move 1.5 PREP - CLOSED" and the SynApSec re-handover at docs/SYNAPSEC_HANDOVER.md (2026-05-28 PR-D6 entry).

The Join Point: _active_cwes.py

The central active-CWE registry (benchmarks/scripts/_active_cwes.py) is the single join point between one-time infrastructure and per-vuln content:

# benchmarks/scripts/_active_cwes.py
ACTIVE_CWES: frozenset[str] = frozenset({
    "CWE-78",    # OS Command Injection
    "CWE-79",    # Cross-Site Scripting
    "CWE-89",    # SQL Injection
    "CWE-94",    # Code Injection
    "CWE-1336",  # SSTI
    # Phase 6+ additions (Move 1 onward):
    "CWE-22",    # Path Traversal; YAML shipped Move 0 PR2, benchmark wired in Move 1 PR-B
    # "CWE-918", # SSRF (Move 2 candidate)
    # ...
})

Every ingest script, the dedup pipeline, and the MoreFixes extractor import from this single module. Adding a CWE here unlocks it across the entire benchmark system. The Move 0 PR2 path_traversal pilot intentionally ships ahead of this step: agent YAML lands first (so the catalog and get_capabilities reflect it) and the matching CWE-22 line is added in Move 1 alongside focused calibration.
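
As a minimal sketch of what "filtered out until the CWE is in the active set" could look like, the snippet below filters case records by their cwe_ids. The ACTIVE_CWES_SKETCH constant, the filter_active helper, and the case records are hypothetical; the real ingest scripts import ACTIVE_CWES from the module shown above.

```python
# Local copy for illustration; the real set lives in benchmarks/scripts/_active_cwes.py.
ACTIVE_CWES_SKETCH = frozenset(
    {"CWE-78", "CWE-79", "CWE-89", "CWE-94", "CWE-1336", "CWE-22"}
)


def filter_active(cases, active=ACTIVE_CWES_SKETCH):
    """Keep cases that carry at least one CWE from the active set."""
    return [case for case in cases if active.intersection(case["cwe_ids"])]


# Hypothetical case records; only the cwe_ids field matters here.
cases = [
    {"id": "case-1", "cwe_ids": ["CWE-22"]},
    {"id": "case-2", "cwe_ids": ["CWE-918"]},
    {"id": "case-3", "cwe_ids": ["CWE-20", "CWE-89"]},
]
kept = [case["id"] for case in filter_active(cases)]
```

Uncommenting a CWE in the frozenset is then enough for matching cases to pass the filter on the next ingest run.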


Tool & Subagent Inventory (post-T-SCAN-REFACTOR)

MCP tools (post-2026-04-25)

Scan tools:

  • scan_agents(agents, target, ...): paginated multi-agent primitive. Cursor binding (target_hash, agents_hash). Returns init-page with agents_excluded_by_relevance + code-pages with per-agent prompts.
  • scan_domain(domain, target, ...): convenience wrapper over scan_agents. Resolves all agents in a CWE-1400 domain.
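
One plausible reading of the cursor binding is that a pagination cursor carries digests of the target and the agent list, so a cursor issued for one scan cannot be replayed against another. The sketch below illustrates that reading only; the cursor layout, the offset field, and the make_cursor and cursor_matches helpers are assumptions, not the server's actual format.

```python
import hashlib
import json


def _digest(value) -> str:
    """SHA-256 over a canonical JSON encoding of the value."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_cursor(target: str, agents: list, offset: int) -> dict:
    """Bind a page offset to the (target, agents) pair it was issued for."""
    return {
        "target_hash": _digest(target),
        "agents_hash": _digest(sorted(agents)),  # order-insensitive
        "offset": offset,
    }


def cursor_matches(cursor: dict, target: str, agents: list) -> bool:
    """True only when the cursor was issued for this exact target and agent set."""
    return (
        cursor["target_hash"] == _digest(target)
        and cursor["agents_hash"] == _digest(sorted(agents))
    )


cursor = make_cursor("src/api/", ["xss", "sqli"], offset=0)
```

Sorting the agent names before hashing is a design choice in this sketch: it makes the binding depend on the set of agents rather than on the order the caller listed them.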

Discovery tools:

  • list_agents(domain=None): enumerate registered agents (optionally filtered by domain). Shape frozen (SynApSec compatibility guard).
  • list_domains(): enumerate domains. Shape frozen (SynApSec compatibility guard).
  • get_agent_prompt(agent_name): fetch the per-agent core prompt on demand (lazy fetch from subagents).
  • get_capabilities(): versioned, machine-readable capability catalog (Move 0, 2026-05-25). No args. The single rich discovery source; additive to the frozen list_agents/list_domains. See "Capability Catalog" below.

Capability Catalog (Move 0): get_capabilities returns a versioned envelope (catalog_schema_version + artifact_schema_version + informational screw_agents_version) wrapping a global_capabilities block (target types, output formats, provider modes with source-egress/billing facts) and enriched domains[] / agents[] catalogs. Per-domain metadata (display name, description, CWE-1400 category, OWASP 2025) is sourced from domains/<domain>/_domain.yaml (the registry skips all _*.yaml from agent loading and, at serve time via create_server, refuses to start if a populated domain lacks _domain.yaml). This is assembled in src/screw_agents/capabilities.py. list_agents/list_domains stay byte-stable; get_capabilities is where richer catalog fields grow additively. New YAML agents/domains appear here with no client code change.

Accumulator tools (Phase 3a X1-M1; paired with finalize_scan_results and called on every scan, not just adaptive flows):

  • accumulate_findings: appends finding records to the active session keyed by session_id. Phase 3a X1-M1 introduced this as the generic per-page consumer for the lazy-fetch pagination flow.

Adaptive tools (Phase 3b):

  • record_context_required_match, detect_coverage_gaps, lint_adaptive_script, stage_adaptive_script, promote_staged_script, reject_staged_script, execute_adaptive_script, verify_trust.

Slash-command parser:

  • resolve_scope(scope_text): Task 8 helper; returns {agents, summary}. Used by /screw:scan to translate user input into an agent list. Closed allowlist (registry lookup) + no shell evaluation.

Output:

  • finalize_scan_results(session_id, formats=...): emit JSON/Markdown/SARIF/CSV reports. Default format list as of T19-M D7: ["json", "markdown", "csv"].
  • record_exclusion, check_exclusions: exclusion learning surface (Phase 2).

Retired (T-SCAN-REFACTOR Task 6):

  • scan_full: replaced by scan_agents(agents=list_agents().names, ...) (or by the slash command's full keyword).
  • scan_<name> per-agent tools (sqli/cmdi/ssti/xss): replaced by scan_agents(agents=[<name>], ...).

Subagents (post-2026-04-25)

  • screw-scan.md: universal scan runner (~559 LOC). Replaces 5 deleted per-vuln + per-domain subagents (screw-sqli, screw-cmdi, screw-ssti, screw-xss, screw-injection; Task 7 of T-SCAN-REFACTOR). Dispatched with agents: list[str] from main session.
  • screw-script-reviewer.md: adaptive script review. Dispatched by main session per pending_reviews chain (chain-subagents architecture, Phase 3b-C2).
  • screw-learning-analyst.md: learning-mode analyst (Phase 3a).

Subagents do NOT dispatch other subagents (Claude Code constraint, sub-agents.md:711). Main session is the sole orchestrator.

Slash command grammar (post-Task-8)

/screw:scan <scope-spec> <target> [--adaptive | --no-confirm | --thoroughness <L>] [--format <F>] [--primary-provider <provider> --primary-transport <transport> --primary-execution fixture|cli] [--parallel-providers provider:transport:execution,...] [--challenger <mode> --challenger-execution dry_run|cli]

--adaptive and --no-confirm are mutually exclusive (adaptive mode requires interactive consent).

Scope-spec forms (mutually exclusive):

  • Bare-token: single agent name (e.g., sqli) or domain name (e.g., injection-input-handling). Disambiguated via registry lookup; the agent name ≠ domain name invariant guarantees uniqueness.
  • full keyword: all registered agents (post-relevance-filter).
  • Prefix-key: domains:foo,bar agents:baz,qux combines multiple domains and agents in one invocation.

Examples:

/screw:scan sqli src/api/                    # single agent
/screw:scan injection-input-handling src/    # whole domain
/screw:scan full .                           # all agents
/screw:scan agents:sqli,xss src/api/         # subset across domains
/screw:scan domains:foo agents:baz src/      # mix
/screw:scan domains:A,B agents:1A,2A,1B src/ # subset of A + subset of B
/screw:scan sqli src/api/ --challenger claude_primary_codex_challenger --challenger-execution dry_run
/screw:scan sqli src/api/ --primary-provider codex --primary-transport cli --primary-execution cli
/screw:scan sqli src/api/ --parallel-providers claude:cli:cli,codex:cli:cli
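
The three scope-spec forms can be sketched as a small allowlist parser. This is not the real resolve_scope (which returns {agents, summary}): the parse_scope_spec name, its return shape, and the AGENTS and DOMAINS sets are hypothetical, and the function only sees the scope-spec text, not the target or flags.

```python
def parse_scope_spec(scope_text, known_agents, known_domains):
    """Parse a scope-spec; reject any name that is not in the registry allowlist."""
    tokens = scope_text.split()
    if not tokens:
        raise ValueError("empty scope-spec")
    result = {"agents": [], "domains": [], "full": False}

    if tokens == ["full"]:  # full keyword form
        result["full"] = True
        return result

    if any(":" in token for token in tokens):  # prefix-key form
        for token in tokens:
            key, _, values = token.partition(":")
            if key not in ("agents", "domains"):
                raise ValueError(f"unknown prefix key: {key!r}")
            allowed = known_agents if key == "agents" else known_domains
            for name in filter(None, values.split(",")):
                if name not in allowed:
                    raise ValueError(f"unknown name under {key}: {name!r}")
                result[key].append(name)
        return result

    if len(tokens) != 1:  # bare-token form takes exactly one name
        raise ValueError("bare-token form takes exactly one name")
    name = tokens[0]
    if name in known_agents:
        result["agents"].append(name)
    elif name in known_domains:
        result["domains"].append(name)
    else:
        raise ValueError(f"unknown agent or domain: {name!r}")
    return result


AGENTS = {"sqli", "xss"}
DOMAINS = {"injection-input-handling"}
single = parse_scope_spec("sqli", AGENTS, DOMAINS)
mixed = parse_scope_spec("domains:injection-input-handling agents:sqli,xss", AGENTS, DOMAINS)
```

Because every name is checked against a closed set and nothing is passed to a shell, unknown input fails with an error instead of being interpreted.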

Scan flow (chain-subagents architecture)

slash command       resolve_scope        scan_agents (init page)
   ↓                    ↓                       ↓
main session ──────────────────────────────────→  pre-execution summary
   ↓                                                    ↓
   ↓                                              user consent (or --no-confirm)
   ↓                                                    ↓
dispatch screw-scan ──────────────────────────────────────────→ scan_agents (code pages)
   ↓                                                                  ↓
   ↓                                                            accumulate_findings
   ↓                                                                  ↓
parse return (C2 + enrichment) ←─────────────────────────────── return structured payload
   ↓
optionally chain screw-script-reviewer (per pending_reviews)
   ↓
finalize_scan_results
   ↓
report (JSON, Markdown, SARIF, CSV per --format)

Component Architecture

Agent Definitions (domains/)

YAML files carrying vulnerability-specific detection knowledge. Each agent includes:

  • meta: CWE IDs, CAPEC mappings, OWASP Top 10:2025 overlay, research sources
  • core_prompt: Distilled detection knowledge (2,000-4,000 tokens)
  • detection_heuristics: Language-specific patterns at high/medium/context-required severity
  • bypass_techniques: Real-world evasion patterns grounded in CVEs
  • remediation: Per-language fix guidance
  • few_shot_examples: Vulnerable + safe code pairs
  • target_strategy: Tree-sitter queries for function/class targeting

See docs/PRD.md §4 for the full schema.

MCP Server (src/screw_agents/)

Python MCP server exposing agent definitions as tools. Phase 1 builds:

  • Agent Registry: YAML loader → Pydantic validation → MCP tool registration
  • Target Resolver: tree-sitter AST parsing (10 languages) + glob file discovery + git diff parsing
  • Output Formatter: Findings → JSON + SARIF + Markdown

Both stdio (Claude Code) and streamable HTTP (screw.nvim, CI/CD) transports are supported.

Benchmark Evaluator (benchmarks/runner/)

CWE-1400-native Python evaluator (ADR-013). Components:

| Module | Responsibility |
| --- | --- |
| models.py | Pydantic types: Finding, BenchmarkCase, MetricSet, Summary |
| sarif.py | bentoo-SARIF read/write (SARIF 2.1.0 subset) |
| cwe.py | CWE-1400 hierarchy traversal with strict/broad match modes |
| metrics.py | Pair-based TPR/FPR/precision/recall/F1 computation |
| primevul.py | Tree-sitter AST normalization, SHA-256 dedup, chrono/cross-project splits |
| report.py | Markdown report rendering |
| cli.py | python -m benchmarks.runner entry point |
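
To make the metric names concrete, here is a small sketch of pair-based scoring. It assumes that "pair-based" means each benchmark case is a (vulnerable, patched) pair, with a finding on the vulnerable side counted as a true positive and a finding on the patched side as a false positive. That reading, the PairResult type, and compute_metrics are assumptions, not the contents of metrics.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Hypothetical outcome for one (vulnerable, patched) benchmark pair."""
    flagged_vulnerable: bool  # scanner reported the vulnerable version
    flagged_patched: bool     # scanner reported the patched version


def _ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator if denominator else 0.0


def compute_metrics(pairs) -> dict:
    tp = sum(p.flagged_vulnerable for p in pairs)
    fn = sum(not p.flagged_vulnerable for p in pairs)
    fp = sum(p.flagged_patched for p in pairs)
    tn = sum(not p.flagged_patched for p in pairs)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # identical to TPR
    return {
        "tpr": recall,
        "fpr": _ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


pairs = [
    PairResult(True, False),
    PairResult(True, True),
    PairResult(False, False),
    PairResult(True, False),
]
scores = compute_metrics(pairs)
```

For these four pairs the counts are TP=3, FN=1, FP=1, TN=3, giving TPR 0.75, FPR 0.25, and precision, recall, and F1 all equal to 0.75.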

Ingest System (benchmarks/scripts/)

Reusable IngestBase abstract class with 8 dataset-specific subclasses. Each ingest script:

  1. Downloads/clones the dataset (ensure_downloaded())
  2. Parses the native format, filters to ACTIVE_CWES (extract_cases())
  3. Writes bentoo-SARIF truth files + provenance manifest (base class run())

| Dataset | Languages | Ingest Script |
| --- | --- | --- |
| OpenSSF CVE Benchmark | JS/TS | ingest_ossf.py |
| reality-check (C#/Python/Java) | C#, Python, Java | ingest_reality_check_*.py |
| go-sec-code-mutated | Go | ingest_go_sec_code.py |
| skf-labs-mutated | Python | ingest_skf_labs.py |
| CrossVul | PHP, Ruby | ingest_crossvul.py |
| Vul4J | Java | ingest_vul4j.py |
| MoreFixes | All (via Postgres) | morefixes_extract.py |
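
The three ingest steps follow a template-method shape, sketched below. Only the names ensure_downloaded, extract_cases, and run come from the description above; the IngestBaseSketch and InMemoryIngest classes, the constructor argument, and the returned manifest dict are hypothetical, and this run() returns data instead of writing bentoo-SARIF files.

```python
from abc import ABC, abstractmethod


class IngestBaseSketch(ABC):
    """Subclasses supply steps 1 and 2; run() drives the sequence."""

    def __init__(self, active_cwes):
        self.active_cwes = frozenset(active_cwes)

    @abstractmethod
    def ensure_downloaded(self) -> None:
        """Step 1: fetch or clone the dataset."""

    @abstractmethod
    def extract_cases(self) -> list:
        """Step 2: parse the native format and filter to the active CWEs."""

    def run(self) -> dict:
        """Step 3 (simplified): return a manifest rather than writing truth files."""
        self.ensure_downloaded()
        cases = self.extract_cases()
        return {"dataset": type(self).__name__, "case_count": len(cases), "cases": cases}


class InMemoryIngest(IngestBaseSketch):
    """Hypothetical subclass backed by a hard-coded record list."""

    RECORDS = [
        {"id": "demo-1", "cwe_ids": ["CWE-89"]},
        {"id": "demo-2", "cwe_ids": ["CWE-918"]},
    ]

    def ensure_downloaded(self) -> None:
        self.downloaded = True  # nothing to fetch in this sketch

    def extract_cases(self) -> list:
        return [r for r in self.RECORDS if self.active_cwes.intersection(r["cwe_ids"])]


manifest = InMemoryIngest({"CWE-89"}).run()
```

Keeping the write step in the base class is what lets a new dataset be added by implementing two methods only.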

Corpus Models: Per-CWE vs Monolithic

Source corpora come in two fundamentally different shapes, and the materialization path differs by shape (PR-D3 Task D3.3 PATH D).

| Model | Shape | Materialization path |
| --- | --- | --- |
| Per-CWE | One upstream git repo per CVE: hundreds of separate clones, driven per-CWE. | benchmarks/scripts/_materialization.py::materialize_for_cwe(cwe_id, "ossf"), an idempotent clone driver with tracking file (benchmarks/external/.materialized.json). Per-clone commit pins come from each case's prePatch/postPatch SHAs in the CVE metadata. |
| Monolithic | ONE upstream git repo for the entire corpus (markup files for every project). | IngestBase.ensure_downloaded(), which clones the single upstream repo once at the _PINNED_REF module constant declared in each ingest script. No per-CWE materialization is meaningful (the single repo already covers all CVEs/CWEs). |

OSSF is per-CWE. reality-check (python/java/csharp) is monolithic and all three ingests share the same upstream repo (flawgarden/reality-check). The Corpus = Literal["ossf"] type alias in _materialization.py intentionally excludes the monolithic corpora: forcing them through per-CWE materialization would iterate a single-repo corpus once per CWE, which is pure waste.

Both paths pin upstream by 40-char hex SHA: per-CWE pins live in the CVE metadata; monolithic pins live in the _PINNED_REF module constant of each ingest script (benchmarks/tests/test_upstream_pins.py enforces SHA shape).
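
A shape check for such pins can be as small as one regular expression. The sketch below is not the contents of test_upstream_pins.py: is_pinned_sha is a hypothetical helper, and restricting the match to lowercase hex is an assumption based on how git prints object names.

```python
import re

# Exactly 40 lowercase hexadecimal characters (assumed convention).
_SHA40 = re.compile(r"[0-9a-f]{40}")


def is_pinned_sha(ref: str) -> bool:
    """True when ref looks like a full commit SHA rather than a branch or tag."""
    return _SHA40.fullmatch(ref) is not None


sample_pin = "0123456789abcdef0123456789abcdef01234567"  # made-up value
```

Using fullmatch rather than search matters here: a longer string that merely contains 40 hex characters is rejected, as are branch names such as main.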

Assistant Command Integration (plugins/screw/)

Thin orchestration wrappers calling MCP tools. The current implementation is a shared assistant plugin directory with Claude Code metadata (.claude-plugin/plugin.json) and Codex metadata (.codex-plugin/plugin.json, .agents/plugins/marketplace.json). The slash-command names, agent roles, host skills, and MCP tool workflows define a portable assistant command contract. Future Gemini, local assistant, editor, or web-worker integrations should preserve the same command semantics and map host-specific UX onto the same backend tools.

  • Subagents (agents/): screw-scan.md (universal scan runner; T-SCAN-REFACTOR collapsed 5 per-vuln + per-domain subagents into this one), screw-script-reviewer.md (adaptive review chain), screw-learning-analyst.md (learning mode). Main session orchestrates dispatch (chain-subagents architecture).
  • Claude skills (skills/): Claude-native auto-invocation helpers for review and research workflows.
  • Codex skills (codex-skills/): Codex workflow skills. Codex-only scan/learn/adaptive skills live outside the top-level skills/ directory so Claude Code does not expose duplicate slash completions for command workflows.
  • Slash commands: User-facing entry points (for example /screw:scan, /screw:learn-report, and /screw:adaptive-cleanup; see "Tool & Subagent Inventory" above)
  • MCP config: repo-root .mcp.json for project-scoped Claude Code development, plus plugin-scoped plugins/screw/codex-mcp.json for repo-local Codex marketplace development.

Project-Level State (.screw/)

Per-project persistent state in the target repository:

  • findings/: Scan results
  • learning/exclusions.yaml: False positive patterns (Phase 2)
  • custom-scripts/: Adaptive analysis scripts (Phase 3)

Taxonomy

CWE-1400 (Comprehensive Categorization) is the structural backbone: 21 categories consolidated to 18 agent domains. Every finding carries a CWE ID as the universal join key.

OWASP Top 10:2025 is the risk communication overlay, used in reports and user-facing output, not as the domain structure.

See docs/PRD.md §9 for the full taxonomy mapping and docs/DECISIONS.md ADR-002/ADR-003 for the rationale.


Phase Plan Summary

| Phase | Type | Focus | Status |
| --- | --- | --- | --- |
| Phase 0 | Per-vuln | Knowledge Research | Complete (4 agents) |
| Phase 0.5 | One-time | Benchmark Infrastructure | Complete |
| Phase 1 | One-time | MCP Server + Registry + Resolver + Formatter | Complete |
| Phase 2 | Per-vuln | Claude Code Integration (subagents, skills, FP learning) | Complete |
| Phase 3 | One-time | Adaptive Analysis & Learning | Complete (Phase 3a + Phase 3b + Phase 3b-C2) |
| Phase 4 | One-time | Autoresearch & Self-Improvement | Complete |
| Phase 5 | One-time | Multi-LLM Challenger + Provider-Neutral Primary Scanning | Complete: challenger/reconciliation/report surfaces, primary scan contracts, fixture validation, scan input assembly, backend CLI primary scanner runner plumbing, production Claude/Codex output normalization, package CLI, MCP surface, optional provider-scan finalization, backend composed primary-plus-challenger workflow, backend parallel primary reconciliation/finalization, mode-aware provider report filenames/metadata, universal /screw:scan provider-primary command contract, route-equivalent fixture validation, real Claude Code/Codex host-route fixture validation, one live Codex/Claude benchmark round trip, live composed validation in both Claude/Codex directions, and live parallel validation are implemented/recorded; additional provider adapters are accepted deferrals; see docs/PHASE_5_CLOSURE_READINESS.md |
| Phase 5.5 | One-time | Web application integration pilot | Next: handoff in docs/PHASE_5_5_WEB_APP_INTEGRATION.md |
| Phase 6 | Mixed | Small-batch agent expansion | Pending |
| Phase 7 | One-time | screw.nvim Integration | Pending |

See docs/PRD.md §12 for detailed phase descriptions and docs/PROJECT_STATUS.md for current state.


Key References

  • docs/PRD.md: Product Requirements Document (definitive)
  • docs/DECISIONS.md: Architecture Decision Records
  • docs/PHASE_0_5_PLAN.md: Phase 0.5 implementation plan (28 tasks)
  • docs/PHASE_0_5_VALIDATION_GATES.md: Phase 1.7 acceptance criteria
  • docs/PHASE_5_PLAN.md: Phase 5 challenger provider and transport plan
  • docs/PHASE_5_MANUAL_VALIDATION.md: Phase 5 manual round-trip validation evidence
  • docs/PROJECT_STATUS.md: Current state + deferred obligations
  • docs/KNOWLEDGE_SOURCES.md: Research targets for knowledge sprint
  • docs/AGENT_AUTHORING.md: Guide for writing new agent YAMLs