Skip to content

Latest commit

 

History

History
676 lines (543 loc) · 27.4 KB

File metadata and controls

676 lines (543 loc) · 27.4 KB

Phase 5 Manual Validation

Status: in progress. Fixture-mode provider-neutral primary scan validation is recorded. Live Codex and Claude CLI primary scan validation passed on one vulnerable/patched benchmark round trip; the Claude structured-output adapter behavior discovered during that run is now implemented in the production runner. Backend composed primary-plus-challenger workflow has fixture coverage and live Claude/Codex validation in both directions; backend parallel primary reconciliation has fixture coverage for agreed, unique, and severity-disputed findings plus live Claude/Codex validation on the MLflow SSTI vulnerable/patched pair. The universal /screw:scan provider-primary command contract is implemented. Real Claude Code and Codex host-route fixture validation passed for provider-primary, primary-plus-challenger, and parallel-provider paths. Last updated: 2026-05-06.

Scope

This document records manual round-trip validation for Phase 5 public provider surfaces. These checks simulate an end user running screw-agents from a separate project directory, not from repository internals.

Host-Route Fixture Validation

This strict host-route validation used the current plugin/skill/MCP surfaces from assistant sessions, not only backend helper scripts.

Environment:

  • Repository worktree: .worktrees/phase5-host-route-validation
  • Temporary end-user project: /tmp/screw-agents-phase5-host-routes
  • Target: /tmp/screw-agents-phase5-host-routes/src/app.py
  • Project config: /tmp/screw-agents-phase5-host-routes/.screw/config.yaml
  • Configured providers: claude and codex
  • Configured transports: fixture only
  • Live provider invocation: none
  • API keys required: none

The target came from the SSTI fixture benchmarks/fixtures/ssti/vulnerable/python_jinja2_from_string.py. Fixture execution intentionally returns empty provider findings; these checks validate host command routing, MCP payload shape, finalization, report naming, and coverage-gap propagation without sending source to live providers.

Claude Code Host Surface

Claude was launched from the temporary project with:

cd /tmp/screw-agents-phase5-host-routes
env -u ANTHROPIC_API_KEY claude \
  --plugin-dir /home/marco/Programming/AI/screw-agents/.worktrees/phase5-host-route-validation/plugins/screw \
  --mcp-config /tmp/screw-agents-phase5-host-routes/.mcp.json

Plugin/MCP discovery:

  • /mcp showed one connected screw-agents server with 28 tools.
  • /plugins showed:
    • commands: learn-report, scan, adaptive-cleanup;
    • agents: screw-scan, screw-script-reviewer, screw-learning-analyst;
    • skills: screw-review, screw-research.
  • /scr autocomplete showed the expected command/skill entries: /screw:scan, /screw:learn-report, /screw:adaptive-cleanup, /screw-review, and /screw-research.
  • The earlier duplicate entries /screw-scan, /screw-learn-report, and /screw-adaptive-cleanup were removed by moving Codex workflow skills under plugins/screw/codex-skills/.
  • The earlier failed duplicate plugin:screw:screw-agents MCP server was removed by moving the Codex plugin MCP descriptor to plugins/screw/codex-mcp.json.

Validated Claude commands:

/screw:scan ssti src/app.py --primary-provider codex --primary-transport fixture --primary-execution fixture --format json
  • Routed to run_provider_scan with finalize=true.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-codex-primary-2026-05-06T08-47-42.json
  • Summary: zero findings, clean trust status, one expected unresolved-sink coverage gap at src/app.py:20.
/screw:scan ssti src/app.py --primary-provider codex --primary-transport fixture --primary-execution fixture --challenger codex_primary_claude_challenger --challenger-execution dry_run --format json
  • Routed to run_composed_provider_scan.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-codex-primary-claude-challenger-2026-05-06T08-50-51.json
  • Summary: zero findings; challenger results empty because fixture primary produced no findings.
/screw:scan ssti src/app.py --parallel-providers claude:fixture:fixture,codex:fixture:fixture --format json
  • Routed to run_parallel_provider_scan with finalize=true.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-parallel-claude-codex-2026-05-06T08-52-08.json
  • Summary: zero findings from both fixture providers; reconciliation empty.

Codex Host Surface

Codex used the same temporary project and current worktree:

codex mcp add screw-agents -- \
  uv run --directory /home/marco/Programming/AI/screw-agents/.worktrees/phase5-host-route-validation \
  screw-agents serve --transport stdio

codex plugin marketplace add \
  /home/marco/Programming/AI/screw-agents/.worktrees/phase5-host-route-validation

codex -C /tmp/screw-agents-phase5-host-routes \
  --sandbox workspace-write \
  --ask-for-approval on-request

Codex MCP registration was run from a screw-agents checkout/worktree; running the same codex mcp add command from an unrelated directory produced a Codex configuration load error during validation.

Plugin/MCP discovery:

  • /plugins showed the installed screw-agents plugin with Codex skills: screw:screw-adaptive-cleanup, screw:screw-learn-report, screw:screw-research, screw:screw-review, and screw:screw-scan.
  • /mcp showed screw-agents connected to the worktree server.

Validated Codex skill prompts:

Use the screw:screw-scan skill to run: screw:scan ssti src/app.py --primary-provider codex --primary-transport fixture --primary-execution fixture --format json
  • Routed to run_provider_scan with finalize=true.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-codex-primary-2026-05-06T09-34-35.json
  • Summary: zero findings, clean trust status, one expected unresolved-sink coverage gap at src/app.py:20.
  • Regression found and fixed: the backend now normalizes relative provider scan targets such as src/app.py against project_root; after restart, Codex passed target: {"type":"file","path":"src/app.py"} and no absolute path retry was needed.
Use the screw:screw-scan skill to run: screw:scan ssti src/app.py --primary-provider codex --primary-transport fixture --primary-execution fixture --challenger codex_primary_claude_challenger --challenger-execution dry_run --format json
  • Routed to run_composed_provider_scan.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-codex-primary-claude-challenger-2026-05-06T09-35-51.json
  • Summary: zero findings; challenger results empty because fixture primary produced no findings.
Use the screw:screw-scan skill to run: screw:scan ssti src/app.py --parallel-providers claude:fixture:fixture,codex:fixture:fixture --format json
  • Routed to run_parallel_provider_scan with finalize=true.
  • Wrote JSON: /tmp/screw-agents-phase5-host-routes/.screw/findings/ssti-parallel-claude-codex-2026-05-06T09-36-50.json
  • Summary: zero findings from both fixture providers; reconciliation empty.

Conclusion: passed. Claude and Codex host routes both reached the provider primary, primary-plus-challenger, and parallel provider MCP workflows and wrote mode-aware reports from an end-user project without API keys or live provider execution.

Environment

  • Repository worktree: .worktrees/phase5-provider-scan-validation
  • Temporary end-user project: /tmp/screw-agents-phase5-provider-scan-fixture
  • Vulnerable target: /tmp/screw-agents-phase5-provider-scan-fixture/src/app.py
  • Project config: /tmp/screw-agents-phase5-provider-scan-fixture/.screw/config.yaml
  • Live provider invocation: none
  • API keys required: none

The temporary project contained a configured codex provider with:

  • enabled fixture transport;
  • enabled api transport used only to verify rejection behavior;
  • api_billing_allowed: false;
  • no configured CLI transport for this fixture-only validation.

/screw:scan Provider Route Fixture Validation

The universal /screw:scan provider-primary flags route to MCP provider scan tools after scope resolution. This validation exercised the same resolved scope, target, provider, transport, execution, session, finalization, and reconciliation arguments that the command contract now maps to those tools, without invoking live providers.

Environment:

  • Repository worktree: .worktrees/phase5-scan-ux-validation
  • Temporary end-user project: /tmp/screw-agents-phase5-scan-ux-fixture
  • Target: /tmp/screw-agents-phase5-scan-ux-fixture/src/app.py
  • Project config: /tmp/screw-agents-phase5-scan-ux-fixture/.screw/config.yaml
  • Configured providers: claude and codex
  • Configured transports: fixture only
  • Live provider invocation: none
  • API keys required: none

The project config enabled these fixture modes:

  • codex_primary_claude_challenger
  • claude_primary_codex_challenger

Command:

uv run python /tmp/screw-agents-phase5-scan-ux-fixture/validate_scan_routes.py

Result:

{
  "composed": {
    "active": 1,
    "challenger_count": 1,
    "mode_type": "primary_challenger",
    "primary_provider": "codex"
  },
  "parallel": {
    "mode_type": "parallel",
    "provider_count": 2,
    "reconciliation_statuses": ["disputed"]
  },
  "scope_agents": ["sqli"],
  "single_provider": {
    "active": 1,
    "provider": "codex"
  }
}

Validated route mappings:

  • /screw:scan sqli ... --primary-provider codex --primary-transport fixture --primary-execution fixture maps to run_provider_scan with finalize=true; result finalized one active finding and wrote JSON/Markdown reports.
  • /screw:scan sqli ... --primary-provider codex --primary-transport fixture --primary-execution fixture --challenger codex_primary_claude_challenger --challenger-execution dry_run maps to run_composed_provider_scan; result produced one primary finding, one challenger result, and one active finalized finding.
  • /screw:scan sqli ... --parallel-providers claude:fixture:fixture,codex:fixture:fixture maps to run_parallel_provider_scan; result ran two independent fixture primary scans and returned one severity-disputed reconciliation.

Conclusion: passed for route-equivalent fixture validation. This proves the new command contract can reach all three provider-primary MCP workflows without provider/API execution.

Live Benchmark Round Trip

The live provider-neutral primary scan validation used a separate temporary end-user project:

  • Temporary end-user project: /tmp/screw-agents-phase5-live-mlflow
  • Benchmark case: benchmarks/external/morefixes/morefixes-CVE-2023-6709-https_____github.com__mlflow__mlflow
  • Vulnerable target: /tmp/screw-agents-phase5-live-mlflow/vulnerable/__init__.py
  • Patched target: /tmp/screw-agents-phase5-live-mlflow/patched/__init__.py
  • Truth evidence: /tmp/screw-agents-phase5-live-mlflow/truth.sarif
  • Agent: ssti
  • CWE: CWE-1336

The benchmark truth identifies the vulnerable CardTab.to_html path where a caller-controlled Jinja2 template string is rendered with a non-sandboxed Environment(...).from_string(...); the patched target uses SandboxedEnvironment.

The temporary project configured:

  • codex CLI transport using codex exec --skip-git-repo-check --sandbox read-only --output-schema ... -;
  • claude CLI transport through a temporary wrapper that invokes claude -p --output-format json --json-schema ... and normalizes structured_output.findings into the provider-scan contract;
  • api_billing_allowed: false;
  • no API transport invocation.

The Codex primary runner strips OPENAI_API_KEY; the Claude primary runner and temporary wrapper strip ANTHROPIC_API_KEY. Both live paths therefore exercised subscription-backed CLI transports rather than an explicit API-key transport. Claude Code's JSON envelope reported usage/cost accounting, which is provider CLI metadata and should be documented separately from screw-agents API billing consent.

Codex CLI Primary Scan - Vulnerable Target

Command shape:

uv run screw-agents provider-scan \
  --project-root /tmp/screw-agents-phase5-live-mlflow \
  --provider codex \
  --transport cli \
  --execution cli \
  --agents ssti \
  --target-json '{"type":"file","path":"/tmp/screw-agents-phase5-live-mlflow/vulnerable/__init__.py"}' \
  --run-id codex-mlflow-live-004 \
  --session-id codex-mlflow-live-session-004 \
  --thoroughness standard \
  --timeout-seconds 300 \
  --finalize \
  --format json \
  --format markdown

Result:

  • Exit code: 0
  • Returned provider/transport: codex / cli
  • Returned finding count: 1
  • Finalized active finding count: 1
  • Severity: high
  • CWE: CWE-1336
  • Finding location: CardTab.to_html in the vulnerable MLflow file.

Conclusion: passed.

Codex CLI Primary Scan - Patched Target

Command shape matched the vulnerable run, with target /tmp/screw-agents-phase5-live-mlflow/patched/__init__.py and run id codex-mlflow-patched-live-001.

Result:

  • Exit code: 0
  • Returned provider/transport: codex / cli
  • Returned finding count: 0
  • Finalized active finding count: 0

Conclusion: passed.

Claude CLI Primary Scan - Vulnerable Target

The first direct Claude command returned a Claude JSON envelope whose result field was prose rather than the raw {"findings": [...]} object expected by the generic primary runner. A temporary adapter was used during this validation to extract structured_output.findings; the production Claude CLI primary runner now implements that same output-normalization behavior.

Command shape:

uv run screw-agents provider-scan \
  --project-root /tmp/screw-agents-phase5-live-mlflow \
  --provider claude \
  --transport cli \
  --execution cli \
  --agents ssti \
  --target-json '{"type":"file","path":"/tmp/screw-agents-phase5-live-mlflow/vulnerable/__init__.py"}' \
  --run-id claude-mlflow-live-004 \
  --session-id claude-mlflow-live-session-004 \
  --thoroughness standard \
  --timeout-seconds 360 \
  --finalize \
  --format json \
  --format markdown

Result:

  • Exit code: 0
  • Returned provider/transport: claude / cli
  • Returned finding count: 1
  • Finalized active finding count: 1
  • Severity: high
  • CWE: CWE-1336
  • Finding location: CardTab.to_html in the vulnerable MLflow file.

Conclusion: passed; this temporary validation adapter was later promoted into the production Claude CLI primary runner.

Claude CLI Primary Scan - Patched Target

Command shape matched the vulnerable run, with target /tmp/screw-agents-phase5-live-mlflow/patched/__init__.py and run id claude-mlflow-patched-live-001.

Result:

  • Exit code: 0
  • Returned provider/transport: claude / cli
  • Returned finding count: 0
  • Finalized active finding count: 0

Conclusion: passed; this temporary validation adapter was later promoted into the production Claude CLI primary runner.

Live Validation Lessons

  • Codex CLI can satisfy the primary scan contract through structured output when configured with a strict schema accepted by codex exec.
  • Claude CLI can produce the required structured finding payload, but provider adapters must read structured_output.findings from the Claude JSON envelope rather than expecting the top-level result field to be raw JSON. The production Claude CLI primary runner now does this.
  • provider-scan --finalize correctly accumulates and writes normal .screw/findings/ JSON/Markdown reports for live provider output.
  • The benchmark vulnerable/patched pair gives a useful acceptance shape: vulnerable finding count 1, patched finding count 0, same CWE/location signal as truth evidence.
  • The package CLI and MCP/backend surface are validated for primary provider execution. /screw:scan now exposes provider-neutral primary selection as the universal scan command contract that should be exposed consistently by Claude Code, Codex, Gemini, local assistants, or future plugin hosts. The new Codex skill route is validated for normal YAML/MCP scanning and provider-mode fixture routes.

Codex Plugin Skill Round Trip

The Codex plugin validation used a separate temporary project:

  • Temporary end-user project: /tmp/screw-agents-phase5-live-modes
  • Plugin marketplace: .worktrees/phase5-codex-command-discovery
  • Installed Codex plugin cache version: 0.1.4
  • MCP server: uv run --directory .worktrees/phase5-codex-command-discovery screw-agents serve --transport stdio
  • Agent: ssti

Codex v0.128.0 did not expose plugin commands/ files as literal /screw:* slash-completion entries during validation. It did load packaged skills and MCP tools. OpenAI Codex docs mark custom prompts as deprecated in favor of skills, so this validation exercised the Codex-supported skill path. Codex workflow skills are packaged under plugins/screw/codex-skills/ so Claude Code does not expose duplicate slash completions for scan, learning-report, or adaptive-cleanup workflows. Claude's historical screw-review and screw-research skills remain under plugins/screw/skills/.

Setup checks:

  • /plugins showed the installed screw-agents plugin with skills: screw:screw-adaptive-cleanup, screw:screw-learn-report, screw:screw-research, screw:screw-review, and screw:screw-scan.
  • /mcp showed screw-agents connected to the worktree server.
  • A no-scan registry check called only list_domains and list_agents, returning domain injection-input-handling and agents cmdi, sqli, ssti, and xss.

Dry explanation prompt:

Use the screw:screw-scan skill to explain how it would handle this request,
but do not call any MCP tools and do not run a scan: screw:scan ssti
/tmp/screw-agents-phase5-live-modes --format json

Result: passed. Codex read the screw:screw-scan skill and described the resolve_scope -> scan_agents -> accumulate_findings -> finalize_scan_results route without calling MCP tools.

Live skill prompt:

Use screw:screw-scan to run: screw:scan ssti
/tmp/screw-agents-phase5-live-modes --format json

Result:

  • Scope resolved to ["ssti"].
  • scan_agents paginated the target and completed.
  • Codex accumulated one high-confidence CWE-1336 finding for vulnerable/__init__.py:125.
  • finalize_scan_results wrote JSON: /tmp/screw-agents-phase5-live-modes/.screw/findings/ssti-2026-05-05T13-50-59.json
  • Final summary: 1 active finding, severity high, no suppressions, no exclusions, clean trust status, no coverage gaps.

Validation note: Codex attempted an unnecessary local uv run python -c "from screw_agents.models import Finding; ..." schema inspection from the temporary project and received ModuleNotFoundError: No module named 'screw_agents'. This did not affect MCP scan/finalization. The Codex scan skill now explicitly instructs Codex not to run shell/Python introspection for screw-agents schemas from the scanned project; MCP tool contracts are the authoritative interface.

Live Parallel Provider Validation

The live parallel validation used a separate temporary end-user project:

  • Temporary end-user project: /tmp/screw-agents-phase5-live-parallel
  • Benchmark case: benchmarks/external/morefixes/morefixes-CVE-2023-6709-https_____github.com__mlflow__mlflow
  • Vulnerable target: /tmp/screw-agents-phase5-live-parallel/vulnerable/__init__.py
  • Patched target: /tmp/screw-agents-phase5-live-parallel/patched/__init__.py
  • Configured providers: claude and codex
  • Configured transport: subscription-backed CLI for both providers
  • API keys required: none
  • ANTHROPIC_API_KEY: explicitly unset for the run

Command:

env -u ANTHROPIC_API_KEY \
  uv run python /tmp/screw-agents-phase5-live-parallel/run_parallel.py vulnerable

Result:

  • Claude primary scan returned one high-confidence CWE-1336 SSTI finding at vulnerable/__init__.py:109-115.
  • Codex primary scan returned one high-confidence CWE-1336 SSTI finding at /tmp/screw-agents-phase5-live-parallel/vulnerable/__init__.py:125-126.
  • Parallel reconciliation returned one agreed reconciliation containing both provider finding IDs: ssti-cardtab-from-string-001 and ssti-vulnerable-init-cardtab-template-125.
  • The live run validated that provider anchor differences are expected: Claude anchored on the class/source flow, while Codex anchored on the concrete Environment.from_string(self.template) sink. The backend now reconciles near-line findings for the same file/CWE as the same finding cluster.

Patched command:

env -u ANTHROPIC_API_KEY \
  uv run python /tmp/screw-agents-phase5-live-parallel/run_parallel.py patched

Patched result:

  • Claude primary scan returned zero findings.
  • Codex primary scan returned zero findings.
  • provider_findings was empty for both providers.
  • reconciliations was empty.

Conclusion: passed. Live parallel independent scans with reconciliation are validated for the MLflow MoreFixes SSTI vulnerable/patched pair.

Fixture Provider-Scan CLI Round Trip

Command:

uv run screw-agents provider-scan \
  --project-root /tmp/screw-agents-phase5-provider-scan-fixture \
  --provider codex \
  --transport fixture \
  --execution fixture \
  --agents sqli \
  --target-json '{"type":"file","path":"/tmp/screw-agents-phase5-provider-scan-fixture/src/app.py"}' \
  --run-id fixture-cli-001 \
  --session-id fixture-cli-session \
  --fixture-findings-json '[{"id":"sqli-manual-001","agent":"sqli","domain":"injection-input-handling","timestamp":"2026-05-04T12:00:00Z","location":{"file":"/tmp/screw-agents-phase5-provider-scan-fixture/src/app.py","line_start":5},"classification":{"cwe":"CWE-89","cwe_name":"SQL Injection","severity":"high","confidence":"high"},"analysis":{"description":"String interpolation is used to construct a SQL query."},"remediation":{"recommendation":"Use parameterized SQL queries."}}]'

Result:

  • Exit code: 0
  • Returned PrimaryScanResult.run_id: fixture-cli-001
  • Returned provider/transport: codex / fixture
  • Returned transport_kind: fixture
  • Returned finding id: sqli-manual-001
  • Finding was normalized through the shared Finding model, including default triage and optional fields.
  • Guardrails: {"fixture_runner": true}

Conclusion: passed.

Fixture MCP run_provider_scan Round Trip

Command:

uv run python -c 'import json; from screw_agents.engine import ScanEngine; from screw_agents.server import _dispatch_tool; engine=ScanEngine.from_defaults(); result=_dispatch_tool(engine,"run_provider_scan",{"project_root":"/tmp/screw-agents-phase5-provider-scan-fixture","provider":"codex","transport":"fixture","execution":"fixture","run_id":"fixture-mcp-001","session_id":"fixture-mcp-session","agents":["sqli"],"target":{"type":"file","path":"/tmp/screw-agents-phase5-provider-scan-fixture/src/app.py"},"fixture_findings":[{"id":"sqli-mcp-001","agent":"sqli","domain":"injection-input-handling","timestamp":"2026-05-04T12:05:00Z","location":{"file":"/tmp/screw-agents-phase5-provider-scan-fixture/src/app.py","line_start":5},"classification":{"cwe":"CWE-89","cwe_name":"SQL Injection","severity":"high","confidence":"high"},"analysis":{"description":"String interpolation is used to construct a SQL query."},"remediation":{"recommendation":"Use parameterized SQL queries."}}]}); print(json.dumps({"run_id":result["run_id"],"provider":result["provider"],"transport_kind":result["transport_kind"],"finding_id":result["findings"][0]["id"],"guardrails":result["guardrails"]}, sort_keys=True))'

Result:

{"finding_id": "sqli-mcp-001", "guardrails": {"fixture_runner": true}, "provider": "codex", "run_id": "fixture-mcp-001", "transport_kind": "fixture"}

Conclusion: passed.

API/Local Rejection Guardrail

Command:

uv run screw-agents provider-scan \
  --project-root /tmp/screw-agents-phase5-provider-scan-fixture \
  --provider codex \
  --transport api \
  --execution cli \
  --agents sqli \
  --target-json '{"type":"file","path":"/tmp/screw-agents-phase5-provider-scan-fixture/src/app.py"}'

Result:

screw-agents provider-scan: execution 'cli' requires a 'cli' transport; 'codex'/'api' is 'api'

Conclusion: passed. The public surface rejected the API transport before any provider invocation.

Current Validation Matrix

Scenario Status Notes
Package CLI provider-scan fixture execution Passed Validated from /tmp project
MCP run_provider_scan fixture execution Passed Validated through dispatcher
API transport rejection Passed No provider invocation
Local transport rejection Pending Requires local transport config fixture
Codex CLI primary scan live run Passed MLflow MoreFixes vulnerable/patched SSTI case
Claude CLI primary scan live run Passed MLflow MoreFixes vulnerable/patched SSTI case; production runner now extracts the validated structured_output.findings shape
Provider scan result accumulation/finalization Passed Fixture, Codex live, and Claude live outputs wrote .screw/findings/ reports
Mode-aware provider report naming/metadata Passed Provider-primary, primary-plus-challenger, and parallel reports distinguish scan mode/provider in filenames and JSON/Markdown/SARIF metadata
Primary plus challenger public round trip Passed Fixture route passed; live Codex-primary/Claude-challenger and Claude-primary/Codex-challenger validation passed on the MLflow MoreFixes SSTI vulnerable/patched pair
Parallel independent primary scans Passed Fixture route passed; live Claude/Codex parallel validation reconciled the vulnerable SSTI as agreed and returned zero findings on patched
Codex plugin YAML/MCP scan skill Passed screw:screw-scan routed command-shaped input through MCP scan/finalize tools and wrote JSON
Claude /screw:scan provider-neutral primary UX Passed Manual Claude Code host route reached provider-primary, composed, and parallel MCP workflows and wrote mode-aware reports
Codex screw:screw-scan provider-neutral skill UX Passed Manual Codex host route reached provider-primary, composed, and parallel MCP workflows and wrote mode-aware reports

Decision

Fixture-mode provider-neutral primary scan execution is validated for the new public package CLI and MCP surfaces. Live Codex and Claude CLI primary scanning is validated on one real benchmark vulnerable/patched pair, including report finalization. Finalized provider-primary, primary-plus-challenger, and parallel reports now carry mode/provider labels in filenames and JSON/Markdown/SARIF metadata. Backend composed primary plus challenger flow is covered for both Codex-primary/Claude-challenger and Claude-primary/Codex-challenger fixture directions and live CLI directions. In the live vulnerable runs, the primary provider reported one high-confidence SSTI finding and the configured challenger agreed; in the patched runs, both primary providers returned zero findings and no challenger review was invoked. Backend parallel independent scan reconciliation is covered for agreed, unique, and severity-disputed fixture findings, and live parallel validation passed on the MLflow MoreFixes SSTI vulnerable/patched pair. The universal /screw:scan provider-primary command contract is implemented, and route-equivalent fixture validation passed for single provider-primary, primary-plus-challenger, and parallel-provider paths. Real Claude Code and Codex host-route fixture validation also passed for single provider-primary, primary-plus-challenger, and parallel-provider paths. Codex plugin skill validation passed for the normal YAML/MCP scan route and provider-mode routes. Additional provider adapters remain deferred closure decisions; see docs/PHASE_5_CLOSURE_READINESS.md.