usestrix · ms6rb · Mar 8, 2026
diff --git a/README.md b/README.md
@@ -175,6 +175,16 @@ strix --target api.your-app.com --instruction "Focus on business logic flaws and
 strix --target api.your-app.com --instruction-file ./instruction.md
 ```
 
+### MCP Server (AI Agent Integration)
+
+Use Strix as an MCP server to integrate with AI coding agents like Claude Code, Cursor, and Windsurf:
+
+```bash
+pip install strix-mcp
+```
+
+See [`strix-mcp/README.md`](strix-mcp/README.md) for setup instructions and the full tool coverage map.
+
 ### Headless Mode
 
 Run Strix programmatically without interactive UI using the `-n/--non-interactive` flag—perfect for servers and automated jobs. The CLI prints real-time vulnerability findings, and the final report before exiting. Exits with non-zero code when vulnerabilities are found.

diff --git a/docs/superpowers/plans/2026-03-17-recon-phase.md b/docs/superpowers/plans/2026-03-17-recon-phase.md
diff --git a/docs/superpowers/specs/2026-03-17-recon-phase-design.md b/docs/superpowers/specs/2026-03-17-recon-phase-design.md
diff --git a/strix-mcp/.gitignore b/strix-mcp/.gitignore
@@ -0,0 +1,2 @@
+.mcp.json
+docs/
diff --git a/strix-mcp/E2E_CHECKLIST.md b/strix-mcp/E2E_CHECKLIST.md
@@ -0,0 +1,89 @@
+# MCP E2E Verification Checklist
+
+Manual verification steps for testing strix-mcp across MCP clients.
+
+## Prerequisites
+
+- [ ] Docker running
+- [ ] Sandbox image pulled: `docker pull ghcr.io/usestrix/strix-sandbox:0.1.12`
+- [ ] strix-mcp installed: `cd strix-mcp && pip install -e .`
+
+## Claude Code
+
+Config in `.mcp.json` or `~/.claude/mcp_servers.json`:
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp"
+    }
+  }
+}
+```
+
+- [ ] Server starts without errors
+- [ ] `start_scan` with web target launches sandbox
+- [ ] `terminal_execute` runs commands (e.g. `whoami` returns `pentester`)
+- [ ] `browser_action` with `launch` then `goto` returns screenshots
+- [ ] `send_request` sends HTTP through proxy and returns response
+- [ ] `list_requests` shows captured proxy traffic
+- [ ] `str_replace_editor` with `create` creates files in sandbox
+- [ ] `str_replace_editor` with `view` reads files from sandbox
+- [ ] `str_replace_editor` with `str_replace` edits files in sandbox
+- [ ] `create_note` creates a note and returns note_id
+- [ ] `list_notes` shows created notes with category filtering
+- [ ] `update_note` modifies note content
+- [ ] `delete_note` removes a note
+- [ ] `create_vulnerability_report` stores finding and returns report_id
+- [ ] `list_vulnerability_reports` shows filed reports
+- [ ] `get_finding` returns full markdown detail from disk
+- [ ] `dispatch_agent` returns agent_id + ready-to-use prompt
+- [ ] `suggest_chains` returns chain opportunities (after 2+ findings)
+- [ ] `get_scan_status` shows elapsed time, agents, and severity counts
+- [ ] `get_module` loads a security knowledge module (e.g. "sql_injection")
+- [ ] `list_modules` returns module catalog with categories
+- [ ] `end_scan` returns summary with OWASP grouping and severity counts
+- [ ] `strix_runs/` directory created with `vulnerabilities/*.md`, `vulnerabilities.csv`, and `summary.md`
+
+## Cursor
+
+Config in `.cursor/mcp.json`:
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp"
+    }
+  }
+}
+```
+
+- [ ] Server starts without errors
+- [ ] `start_scan` launches sandbox
+- [ ] Basic tool execution works (terminal, HTTP, files)
+- [ ] `create_vulnerability_report` and `list_vulnerability_reports` work
+- [ ] `end_scan` completes cleanly
+
+## Windsurf
+
+Config in `~/.codeium/windsurf/mcp_config.json`:
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp"
+    }
+  }
+}
+```
+
+- [ ] Server starts without errors
+- [ ] `start_scan` launches sandbox
+- [ ] Basic tool execution works (terminal, HTTP, files)
+- [ ] `end_scan` completes cleanly
+
+## Post-Verification
+
+- [ ] Run `docker ps` — no orphaned strix containers remain after `end_scan`
+- [ ] Second scan starts cleanly after first ends
+- [ ] `strix_runs/` contains expected files from the scan
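+
+The post-verification checks can be scripted. A minimal sketch, assuming the sandbox containers are started from the `ghcr.io/usestrix/strix-sandbox:0.1.12` image (so `docker ps --filter ancestor=...` finds them) and that you point it at one run directory under `strix_runs/`. The expected file names come from the Claude Code checklist; the helper names are hypothetical:
+
+```python
+import subprocess
+from pathlib import Path
+
+SANDBOX_IMAGE = "ghcr.io/usestrix/strix-sandbox:0.1.12"
+# Files the Claude Code checklist expects in a finished run directory.
+EXPECTED_FILES = ("vulnerabilities.csv", "summary.md")
+
+
+def docker_ps_command(image: str = SANDBOX_IMAGE) -> list[str]:
+    """Build a `docker ps` call that prints the IDs of running containers from the image."""
+    return ["docker", "ps", "--filter", f"ancestor={image}", "--format", "{{.ID}}"]
+
+
+def parse_container_ids(output: str) -> list[str]:
+    """Split `docker ps --format {{.ID}}` output into container IDs, ignoring blank lines."""
+    return [line.strip() for line in output.splitlines() if line.strip()]
+
+
+def orphaned_containers(image: str = SANDBOX_IMAGE) -> list[str]:
+    """Return IDs of sandbox containers that are still running (expect [] after end_scan)."""
+    result = subprocess.run(docker_ps_command(image), capture_output=True, text=True, check=True)
+    return parse_container_ids(result.stdout)
+
+
+def missing_run_files(run_dir: Path) -> list[str]:
+    """List the expected outputs that are absent from one run directory."""
+    missing = [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]
+    vuln_dir = run_dir / "vulnerabilities"
+    if not (vuln_dir.is_dir() and any(vuln_dir.glob("*.md"))):
+        missing.append("vulnerabilities/*.md")
+    return missing
+```
+
+After `end_scan`, `orphaned_containers()` should return `[]`, and `missing_run_files(Path("strix_runs/<run>"))` should return `[]` for a completed scan.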
diff --git a/strix-mcp/README.md b/strix-mcp/README.md
@@ -0,0 +1,179 @@
+# Strix MCP Server
+
+MCP (Model Context Protocol) server that exposes Strix's Docker security sandbox to AI coding agents. Works with any MCP-compatible client — Claude Code, Cursor, Windsurf, Cline, and others.
+
+## Prerequisites
+
+- Docker (running)
+- Python 3.12+
+
+## Installation
+
+```bash
+pip install strix-mcp
+```
+
+Pull the Docker image before your first scan:
+
+```bash
+docker pull ghcr.io/usestrix/strix-sandbox:0.1.12
+```
+
+## Client Configuration
+
+### Claude Code
+
+Add to your project's `.mcp.json` or `~/.claude/mcp_servers.json`:
+
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp",
+      "args": []
+    }
+  }
+}
+```
+
+### Cursor
+
+Add to `.cursor/mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp",
+      "args": []
+    }
+  }
+}
+```
+
+### Windsurf
+
+Add to `~/.codeium/windsurf/mcp_config.json`:
+
+```json
+{
+  "mcpServers": {
+    "strix": {
+      "command": "strix-mcp",
+      "args": []
+    }
+  }
+}
+```
+
+### Other MCP Clients
+
+Any client that supports MCP stdio transport can use strix-mcp. Point it at the `strix-mcp` command with no arguments.
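+
+Under the hood, the stdio transport carries newline-delimited JSON-RPC 2.0 messages on the server's stdin and stdout. A minimal sketch of the handshake a custom client would perform, assuming protocol revision `2024-11-05`. The client name and helper functions are illustrative, and an official MCP SDK is the better choice in practice:
+
+```python
+import json
+import subprocess
+
+
+def frame(message: dict) -> bytes:
+    """Encode one JSON-RPC message as a single newline-terminated line."""
+    # json.dumps without indent never emits raw newlines, which stdio framing forbids.
+    return (json.dumps(message) + "\n").encode("utf-8")
+
+
+def handshake_messages() -> list[dict]:
+    """initialize request, initialized notification, then a tools/list request."""
+    return [
+        {
+            "jsonrpc": "2.0",
+            "id": 1,
+            "method": "initialize",
+            "params": {
+                "protocolVersion": "2024-11-05",
+                "capabilities": {},
+                "clientInfo": {"name": "example-client", "version": "0.1.0"},
+            },
+        },
+        {"jsonrpc": "2.0", "method": "notifications/initialized"},
+        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
+    ]
+
+
+def read_response(stdout, request_id: int) -> dict:
+    """Read lines until the response to request_id arrives, skipping notifications."""
+    for line in stdout:
+        if not line.strip():
+            continue
+        message = json.loads(line)
+        if message.get("id") == request_id and "method" not in message:
+            if "error" in message:
+                raise RuntimeError(message["error"])
+            return message
+    raise EOFError(f"server closed stdout before answering request {request_id}")
+
+
+def list_strix_tools(command: str = "strix-mcp") -> list[str]:
+    """Start the server, perform the handshake, and return the advertised tool names."""
+    init, initialized, tools_list = handshake_messages()
+    with subprocess.Popen([command], stdin=subprocess.PIPE, stdout=subprocess.PIPE) as proc:
+        try:
+            proc.stdin.write(frame(init))
+            proc.stdin.flush()
+            read_response(proc.stdout, 1)  # must arrive before the initialized notification
+            proc.stdin.write(frame(initialized) + frame(tools_list))
+            proc.stdin.flush()
+            reply = read_response(proc.stdout, 2)
+        finally:
+            proc.terminate()  # stop the server once we have the tool list
+    return [tool["name"] for tool in reply["result"]["tools"]]
+```
+
+The server may answer `initialize` with a different protocol version than the one requested; a real client should check it before continuing.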
+
+## Quick Start
+
+Ask your AI agent:
+
+> "Start a security scan on ./my-app and test for OWASP Top 10 vulnerabilities"
+
+The agent will boot a Kali Linux sandbox, copy your code, and begin testing.
+
+## Workflow
+
+1. `start_scan` — boot sandbox, detect tech stack, get recommended scan plan
+2. `dispatch_agent` — for each testing area, register a subagent and get a ready-to-use prompt
+3. Pass each prompt to your AI agent's sub-agent/tool system — agents test in parallel with isolated sessions
+4. Agents file findings with `create_vulnerability_report` (auto-dedup, auto-chain detection)
+5. `suggest_chains` — review chaining opportunities, dispatch follow-up agents
+6. `end_scan` — tear down sandbox, get deduplicated OWASP-categorized summary
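+
+From the orchestrating agent's side, the workflow is an ordered series of MCP `tools/call` requests. A sketch of that ordering, in which every argument name and value (`targets`, `name`, `task`) is a hypothetical placeholder rather than strix-mcp's actual tool schema; check the schemas returned by `tools/list` before relying on them:
+
+```python
+import itertools
+import json
+
+
+def tool_call(request_id: int, name: str, arguments: dict) -> dict:
+    """Build an MCP tools/call JSON-RPC request."""
+    return {
+        "jsonrpc": "2.0",
+        "id": request_id,
+        "method": "tools/call",
+        "params": {"name": name, "arguments": arguments},
+    }
+
+
+def scan_workflow(target: str, areas: list[str]) -> list[dict]:
+    """Order the orchestrator's tool calls for one scan. All arguments are hypothetical."""
+    ids = itertools.count(1)
+    calls = [tool_call(next(ids), "start_scan", {"targets": [target]})]
+    # Step 2: one dispatch_agent per testing area; each returned prompt goes to a sub-agent.
+    for area in areas:
+        calls.append(tool_call(next(ids), "dispatch_agent", {"name": area, "task": f"Test {target} for {area}"}))
+    # Steps 3-4 run inside the sub-agents, which call create_vulnerability_report themselves.
+    calls.append(tool_call(next(ids), "suggest_chains", {}))
+    calls.append(tool_call(next(ids), "end_scan", {}))
+    return calls
+
+
+for call in scan_workflow("./my-app", ["sql_injection", "authentication"]):
+    print(json.dumps(call["params"]))
+```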
+
+## Strix Feature Coverage
+
+This MCP server exposes Strix's sandbox tools to external AI agents. Below is the coverage map against the full Strix tool suite.
+
+### Proxied Tools
+
+These tools are forwarded directly to the Strix sandbox container — same behavior as native Strix.
+
+| Tool | Description | Parity |
+|------|-------------|--------|
+| `terminal_execute` | Execute commands in persistent Kali Linux terminal | Full |
+| `send_request` | Send HTTP requests through Caido proxy | Full |
+| `repeat_request` | Replay captured requests with modifications | Full |
+| `list_requests` | Filter proxy traffic with HTTPQL | Full |
+| `view_request` | Inspect request/response details | Full |
+| `browser_action` | Control Playwright browser (returns screenshots) | Full |
+| `python_action` | Run Python in persistent interpreter sessions | Full |
+| `list_files` | List sandbox workspace files | Full |
+| `search_files` | Search file contents by pattern | Full |
+| `str_replace_editor` | Edit files in sandbox | Full |
+| `scope_rules` | Manage proxy scope filtering | Full |
+| `list_sitemap` | View discovered attack surface | Full |
+| `view_sitemap_entry` | Inspect sitemap entry details | Full |
+
+### MCP Orchestration Layer
+
+Tools implemented by the MCP server for AI agent coordination — not proxied from the Strix sandbox.
+
+| Tool | Description |
+|------|-------------|
+| `start_scan` | Boot sandbox, detect tech stack, generate scan plan |
+| `end_scan` | Tear down sandbox, deduplicate findings, OWASP summary |
+| `create_vulnerability_report` | File findings with auto-dedup, chain detection, and disk persistence (simplified interface compared with the native tool) |
+| `dispatch_agent` | Register subagent and compose ready-to-use prompt |
+| `get_scan_status` | Monitor scan progress and pending chains |
+| `list_vulnerability_reports` | List filed reports (summaries, deduplication check) |
+| `get_finding` | Read full finding details from disk |
+| `get_module` | Load security knowledge module |
+| `list_modules` | List available knowledge modules |
+| `suggest_chains` | Review vulnerability chaining opportunities |
+| `create_note` | Create structured notes during scans |
+| `list_notes` | List and filter scan notes |
+| `update_note` | Update existing notes |
+| `delete_note` | Delete notes |
+
+### Not Yet Supported
+
+These Strix tools are not yet available through the MCP server.
+
+| Tool | Category | Notes |
+|------|----------|-------|
+| `create_todo` / `list_todos` / `update_todo` / `mark_todo_done` / `mark_todo_pending` / `delete_todo` | Todos | Task tracking within scans |
+| `finish_scan` | Completion | Native scan finalization with executive summary, methodology, and recommendations |
+| `create_vulnerability_report` (native) | Reporting | Full CVSS XML breakdown, CWE/CVE, code locations, PoC scripts (MCP uses simplified interface) |
+| `view_agent_graph` / `create_agent` / `send_message_to_agent` / `agent_finish` / `wait_for_message` | Agent Graph | Native multi-agent orchestration (MCP uses `dispatch_agent` instead) |
+
+> **Note:** `think` and `web_search` are intentionally not proxied; agents should use their native reasoning and web search capabilities instead. See the `strix://methodology` resource for details.
+
+### Resources
+
+| URI | Description |
+|-----|-------------|
+| `strix://methodology` | Penetration testing playbook and orchestration guide |
+| `strix://modules` | List of available security knowledge modules |
+| `strix://modules/{name}` | Specific module content (e.g. `strix://modules/sql_injection`) |
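+
+The `strix://modules/{name}` entry is a URI template. A sketch of how the three URIs could be told apart, assuming module names are lowercase identifiers such as `sql_injection`; the function name and the returned labels are hypothetical:
+
+```python
+import re
+from urllib.parse import urlparse
+
+# Hypothetical: module names are assumed to be lowercase identifiers like "sql_injection".
+_MODULE_NAME = re.compile(r"[a-z0-9_]+")
+
+
+def resolve_resource(uri: str) -> tuple[str, str | None]:
+    """Map a strix:// resource URI to (kind, module_name); reject anything else."""
+    parsed = urlparse(uri)
+    if parsed.scheme != "strix":
+        raise ValueError(f"not a strix resource: {uri}")
+    if parsed.netloc == "methodology" and parsed.path in ("", "/"):
+        return ("methodology", None)
+    if parsed.netloc == "modules":
+        name = parsed.path.strip("/")
+        if not name:
+            return ("module_list", None)
+        if _MODULE_NAME.fullmatch(name):  # also blocks "../" style names
+            return ("module", name)
+    raise ValueError(f"unknown strix resource: {uri}")
+```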
+
+## Architecture
+
+The MCP server acts as a bridge between AI agents and a Strix Docker sandbox:
+
+```
+AI Agent (Claude Code, Cursor, etc.)
+    ↕ MCP (stdio)
+strix-mcp server
+    ↕ HTTP
+Strix Docker Container (Kali Linux)
+    ├── Caido proxy
+    ├── Playwright browser
+    ├── Terminal sessions
+    ├── Python interpreter
+    └── Security tools (nuclei, sqlmap, ffuf, etc.)
+```
+
+All agents share one container but get isolated sessions (terminal, browser, Python) via `agent_id`.
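+
+Isolation by `agent_id` amounts to keeping one session per agent and per session type inside the shared container. A minimal sketch of that bookkeeping, where `SessionRegistry` and the placeholder `Session` objects are hypothetical stand-ins for real terminal, browser, and Python sessions:
+
+```python
+from dataclasses import dataclass, field
+from typing import Literal
+
+SessionKind = Literal["terminal", "browser", "python"]
+
+
+@dataclass
+class Session:
+    """Placeholder for a terminal, browser, or Python interpreter session."""
+
+    agent_id: str
+    kind: SessionKind
+    history: list[str] = field(default_factory=list)
+
+
+class SessionRegistry:
+    """One container, many agents: sessions are keyed by (agent_id, kind)."""
+
+    def __init__(self) -> None:
+        self._sessions: dict[tuple[str, SessionKind], Session] = {}
+
+    def get(self, agent_id: str, kind: SessionKind) -> Session:
+        # Create the session lazily the first time an agent uses this tool type.
+        key = (agent_id, kind)
+        if key not in self._sessions:
+            self._sessions[key] = Session(agent_id, kind)
+        return self._sessions[key]
+
+    def close_agent(self, agent_id: str) -> int:
+        """Drop every session owned by one agent; return how many were closed."""
+        keys = [key for key in self._sessions if key[0] == agent_id]
+        for key in keys:
+            del self._sessions[key]
+        return len(keys)
+```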
+
+## Known Limitations
+
+- One scan at a time per MCP server instance
+- Requires pulling the Docker image before the first scan (see Installation)
+- Agent graph tools not supported — MCP uses its own orchestration via `dispatch_agent`
diff --git a/strix-mcp/docs/plans/2026-03-14-telemetry-integration-design.md b/strix-mcp/docs/plans/2026-03-14-telemetry-integration-design.md
@@ -0,0 +1,85 @@
+# Telemetry Integration Design
+
+> Integrate upstream `strix.telemetry.tracer.Tracer` into the MCP server as the single source of truth for findings, agent lifecycle, and tool execution events.
+
+## Decision: Use Upstream Tracer Directly
+
+The upstream strix project uses a global singleton pattern:
+- Entrypoint creates `Tracer(run_name)` and calls `set_global_tracer()`
+- All code accesses it via `get_global_tracer()`
+- The Tracer stores findings, writes per-vuln markdown/CSV, emits JSONL events, and manages OTEL spans
+
+The MCP server will follow this pattern exactly: its `start_scan` tool plays the role of the CLI/TUI entrypoint.
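+
+The singleton pattern reduces to a module-level variable with a setter and a getter. The sketch below is a self-contained stand-in for illustration only; the real `Tracer`, `set_global_tracer()`, and `get_global_tracer()` live in `strix.telemetry.tracer` and also handle file output, JSONL events, and OTEL spans. The run name is hypothetical:
+
+```python
+class Tracer:
+    """Stand-in for strix.telemetry.tracer.Tracer: keeps findings in memory only."""
+
+    def __init__(self, run_name: str) -> None:
+        self.run_name = run_name
+        self.vulnerability_reports: list[dict] = []
+
+
+_global_tracer: Tracer | None = None
+
+
+def set_global_tracer(tracer: Tracer | None) -> None:
+    """Install the process-wide tracer, or clear it by passing None."""
+    global _global_tracer
+    _global_tracer = tracer
+
+
+def get_global_tracer() -> Tracer | None:
+    """Return the process-wide tracer, or None when no scan is active."""
+    return _global_tracer
+
+
+# Entrypoint (the CLI/TUI upstream; start_scan in the MCP server):
+set_global_tracer(Tracer(run_name="scan-1234"))
+# Any other module reaches the same instance without it being passed around:
+print(get_global_tracer().run_name)  # prints scan-1234
+```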
+
+## Tracer Lifecycle
+
+**`start_scan`:**
+- Create `Tracer(run_name=scan_id)`, call `set_global_tracer(tracer)`
+- Call `tracer.set_scan_config({"targets": targets, ...})`
+- Guard with try/except — if Tracer init fails, continue without telemetry
+
+**`end_scan`:**
+- Call `tracer.save_run_data(mark_complete=True)` — writes all output files
+- Call `set_global_tracer(None)` to clear for next scan
+- Clear `fired_chains` and `notes_storage` (MCP-only state)
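+
+The lifecycle, including the rule that a failed Tracer init must not stop the scan, could look like the sketch below. `FakeTracer`, `ScanState`, and the `tracer_factory` parameter are hypothetical stand-ins so the example runs without strix installed; only `set_scan_config()`, `save_run_data(mark_complete=True)`, and `set_global_tracer()` come from this design:
+
+```python
+import logging
+from collections.abc import Callable
+from dataclasses import dataclass, field
+
+logger = logging.getLogger("strix_mcp.sketch")
+
+
+class FakeTracer:
+    """Stand-in exposing only the Tracer methods this lifecycle calls."""
+
+    def __init__(self, run_name: str) -> None:
+        self.run_name = run_name
+        self.scan_config: dict = {}
+        self.completed = False
+
+    def set_scan_config(self, config: dict) -> None:
+        self.scan_config = config
+
+    def save_run_data(self, mark_complete: bool = False) -> None:
+        self.completed = mark_complete  # the real method writes all output files
+
+
+_global_tracer: FakeTracer | None = None
+
+
+def set_global_tracer(tracer: FakeTracer | None) -> None:
+    global _global_tracer
+    _global_tracer = tracer
+
+
+@dataclass
+class ScanState:
+    """MCP-only state that stays outside the Tracer."""
+
+    fired_chains: set[str] = field(default_factory=set)
+    notes_storage: dict[str, dict] = field(default_factory=dict)
+
+
+def start_scan(
+    scan_id: str, targets: list[str], tracer_factory: Callable[..., FakeTracer] = FakeTracer
+) -> FakeTracer | None:
+    try:
+        tracer = tracer_factory(run_name=scan_id)
+        set_global_tracer(tracer)
+        tracer.set_scan_config({"targets": targets})
+        return tracer
+    except Exception:
+        # Telemetry is optional: warn and let the scan continue without it.
+        logger.warning("Tracer init failed; continuing without telemetry", exc_info=True)
+        set_global_tracer(None)
+        return None
+
+
+def end_scan(state: ScanState) -> None:
+    tracer = _global_tracer
+    if tracer:
+        try:
+            tracer.save_run_data(mark_complete=True)
+        except Exception:
+            logger.warning("save_run_data failed", exc_info=True)
+    set_global_tracer(None)  # clear for the next scan
+    state.fired_chains.clear()
+    state.notes_storage.clear()
+```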
+
+## Vulnerability Reports Migration
+
+Replace MCP's in-memory `vulnerability_reports` list with `tracer.vulnerability_reports`.
+
+**`create_vulnerability_report`:**
+- The MCP server keeps its title-normalization dedup as a pre-check, run against `tracer.get_existing_vulnerabilities()`
+- New findings are stored via `tracer.add_vulnerability_report()`; the Tracer handles Markdown output, JSONL events, and PostHog
+- Merge logic (upgrade severity, append evidence) mutates `tracer.vulnerability_reports` entries directly
+- Chain detection reads from `tracer.get_existing_vulnerabilities()`
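+
+The pre-check and merge path can be sketched as follows. The normalization rule, the severity scale, the `vuln-NNNN` id format, and the dict fields (`id`, `title`, `severity`, `evidence`) are assumptions for illustration; the real `_normalize_title()` and `_SEVERITY_ORDER` in `tools.py` may differ, and `reports` stands in for `tracer.vulnerability_reports`:
+
+```python
+import re
+
+# Assumed severity scale, lowest to highest.
+_SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]
+
+
+def _normalize_title(title: str) -> str:
+    """Lowercase, drop punctuation, and collapse whitespace so near-identical titles match."""
+    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())
+
+
+def _normalize_severity(severity: str) -> str:
+    s = severity.strip().lower()
+    return s if s in _SEVERITY_ORDER else "info"  # assumption: unknown values fall back to info
+
+
+def file_report(reports: list[dict], title: str, severity: str, evidence: str) -> tuple[str, dict]:
+    """Merge into an existing report with the same normalized title, or append a new one."""
+    key = _normalize_title(title)
+    severity = _normalize_severity(severity)
+    for existing in reports:
+        if _normalize_title(existing["title"]) == key:
+            # Merge: keep the higher severity and append the new evidence.
+            if _SEVERITY_ORDER.index(severity) > _SEVERITY_ORDER.index(existing["severity"]):
+                existing["severity"] = severity
+            existing["evidence"].append(evidence)
+            return "merged", existing
+    report = {"id": f"vuln-{len(reports) + 1:04d}", "title": title, "severity": severity, "evidence": [evidence]}
+    reports.append(report)  # the design routes this through tracer.add_vulnerability_report()
+    return "created", report
+```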
+
+**`list_vulnerability_reports`:** reads from `tracer.get_existing_vulnerabilities()`.
+
+**`get_finding`:** reads from `tracer.get_run_dir() / "vulnerabilities" / f"{id}.md"`.
+
+## Agent & Tool Event Logging
+
+**`dispatch_agent`:** after `sandbox.register_agent()`, call `tracer.log_agent_creation(agent_id, name, task, parent_id)`.
+
+**Proxy tool logging:** add tracer calls inside `SandboxManager.proxy_tool()`; this single integration point covers every proxied tool:
+- Before: `tracer.log_tool_execution_start(agent_id, tool_name, args)` → returns `execution_id`
+- After: `tracer.update_tool_execution(execution_id, status, result)`
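+
+Because every proxied call passes through one method, the logging can wrap it there. A sketch with stand-in classes, assuming `log_tool_execution_start()` returns an execution id and `update_tool_execution()` accepts `(execution_id, status, result)` as listed above. The `"completed"`/`"error"` status strings, the synchronous `proxy_tool()`, and the `_forward()` transport are hypothetical. Logging failures are swallowed so they never block the tool call:
+
+```python
+import logging
+from typing import Any
+
+logger = logging.getLogger("strix_mcp.sketch")
+
+
+class RecordingTracer:
+    """Stand-in tracer that records the calls made to it."""
+
+    def __init__(self) -> None:
+        self.events: list[tuple] = []
+
+    def log_tool_execution_start(self, agent_id: str, tool_name: str, args: dict) -> int:
+        self.events.append(("start", agent_id, tool_name))
+        return len(self.events)
+
+    def update_tool_execution(self, execution_id: int, status: str, result: Any) -> None:
+        self.events.append(("update", execution_id, status))
+
+
+class SandboxManager:
+    def __init__(self, tracer: RecordingTracer | None) -> None:
+        self.tracer = tracer
+
+    def _forward(self, tool_name: str, args: dict) -> Any:
+        """Placeholder for the HTTP call into the sandbox container."""
+        if tool_name == "explode":
+            raise RuntimeError("sandbox error")
+        return {"tool": tool_name, "ok": True}
+
+    def proxy_tool(self, agent_id: str, tool_name: str, args: dict) -> Any:
+        execution_id = None
+        if self.tracer:
+            try:
+                execution_id = self.tracer.log_tool_execution_start(agent_id, tool_name, args)
+            except Exception:
+                logger.debug("tool start logging failed", exc_info=True)
+        try:
+            result = self._forward(tool_name, args)
+        except Exception as exc:
+            self._finish(execution_id, "error", str(exc))
+            raise  # the tool error still reaches the caller
+        self._finish(execution_id, "completed", result)
+        return result
+
+    def _finish(self, execution_id: int | None, status: str, result: Any) -> None:
+        if self.tracer and execution_id is not None:
+            try:
+                self.tracer.update_tool_execution(execution_id, status, result)
+            except Exception:
+                logger.debug("tool result logging failed", exc_info=True)
+```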
+
+**`get_scan_status`:** enrich with `tracer.agents` and `tracer.get_real_tool_count()`.
+
+## What Gets Removed
+
+**Functions deleted from `tools.py`:**
+- `_write_finding_md()` — Tracer's `save_run_data()` writes per-vuln markdown
+- `_write_vuln_csv()` — Tracer writes `vulnerabilities.csv`
+- `_write_summary_md()` — Tracer writes `penetration_test_report.md`
+- `_get_run_dir()` — use `tracer.get_run_dir()` instead
+
+**Closure variables removed:**
+- `vulnerability_reports: list` → `tracer.vulnerability_reports`
+- `scan_dir: Path | None` → `tracer.get_run_dir()`
+
+**Closure variables kept:**
+- `fired_chains: set[str]` — MCP-only
+- `notes_storage: dict` — MCP-only
+
+**Kept but modified:**
+- `_normalize_title()`, `_find_duplicate()`, `_deduplicate_reports()` — MCP's title-based dedup
+- `_categorize_owasp()`, `_OWASP_KEYWORDS` — used in `end_scan` summary
+- `_normalize_severity()`, `_SEVERITY_ORDER` — dedup merge logic
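+
+`_categorize_owasp()` groups findings for the `end_scan` summary by keyword. A sketch of that idea using OWASP Top 10 (2021) category names; the keyword lists, their order, and the `Uncategorized` bucket are assumptions, not the contents of `_OWASP_KEYWORDS`:
+
+```python
+from collections import defaultdict
+
+# Assumed keyword map; the first matching category wins, so order matters.
+_OWASP_KEYWORDS: dict[str, tuple[str, ...]] = {
+    "A10:2021 Server-Side Request Forgery": ("ssrf", "server-side request forgery"),
+    "A03:2021 Injection": ("sql injection", "xss", "command injection", "ssti"),
+    "A01:2021 Broken Access Control": ("idor", "access control", "path traversal"),
+    "A07:2021 Identification and Authentication Failures": ("authentication", "session", "jwt"),
+    "A02:2021 Cryptographic Failures": ("weak cipher", "plaintext", "cleartext"),
+    "A05:2021 Security Misconfiguration": ("misconfiguration", "cors", "debug"),
+}
+
+
+def _categorize_owasp(title: str) -> str:
+    """Return the first OWASP category whose keywords appear in the title."""
+    lowered = title.lower()
+    for category, keywords in _OWASP_KEYWORDS.items():
+        if any(keyword in lowered for keyword in keywords):
+            return category
+    return "Uncategorized"
+
+
+def group_by_owasp(titles: list[str]) -> dict[str, list[str]]:
+    """Bucket finding titles by category, preserving filing order within each bucket."""
+    groups: dict[str, list[str]] = defaultdict(list)
+    for title in titles:
+        groups[_categorize_owasp(title)].append(title)
+    return dict(groups)
+```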
+
+## Error Handling
+
+- Every tracer call is guarded by an `if tracer:` check plus try/except
+- A Tracer init failure in `start_scan` logs a warning; the scan continues without telemetry
+- Proxy tool logging failures don't block tool execution
+- Setting upstream `STRIX_TELEMETRY=0` disables JSONL/OTEL output, but the Tracer still works as an in-memory store
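+
+The guard rule can be centralised in one helper so call sites stay short. A sketch, where `safe_tracer_call` is a hypothetical name and `get_tracer` stands in for `get_global_tracer()`:
+
+```python
+import logging
+from collections.abc import Callable
+from typing import Any
+
+logger = logging.getLogger("strix_mcp.sketch")
+
+
+def safe_tracer_call(get_tracer: Callable[[], Any], method: str, *args: Any, **kwargs: Any) -> Any:
+    """Call tracer.<method>(*args, **kwargs) if a tracer exists; never raise."""
+    tracer = get_tracer()
+    if not tracer:
+        return None  # telemetry disabled or init failed: behave as a no-op
+    try:
+        return getattr(tracer, method)(*args, **kwargs)
+    except Exception:
+        logger.warning("tracer.%s failed; continuing without it", method, exc_info=True)
+        return None
+```
+
+A call site then reads, for example, `safe_tracer_call(get_global_tracer, "log_agent_creation", agent_id, name, task, parent_id)`.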
+
+## No New Dependencies
+
+`opentelemetry` and `scrubadub` are already available transitively via the `strix-agent` dependency.
+
+## Testing
+
+- Existing unit tests: mock `get_global_tracer()` to return `None`; behavior is unchanged
+- New tests: verify tracer integration (agent logging, tool logging, finding storage, file output)
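+
+Both kinds of test can patch the tracer lookup with `unittest.mock`. A self-contained sketch; the `telemetry` namespace and the `list_vulnerability_reports()` body are hypothetical stand-ins for the real module and tool:
+
+```python
+import types
+import unittest
+from unittest import mock
+
+# Stand-in for the strix.telemetry.tracer module; tests patch its attribute.
+telemetry = types.SimpleNamespace(get_global_tracer=lambda: None)
+
+
+def list_vulnerability_reports() -> list[dict]:
+    """Hypothetical tool body: read findings from the tracer when one exists."""
+    tracer = telemetry.get_global_tracer()
+    if not tracer:
+        return []
+    return list(tracer.get_existing_vulnerabilities())
+
+
+class TracerIntegrationTests(unittest.TestCase):
+    def test_without_tracer_returns_empty(self) -> None:
+        with mock.patch.object(telemetry, "get_global_tracer", return_value=None):
+            self.assertEqual(list_vulnerability_reports(), [])
+
+    def test_reads_from_tracer(self) -> None:
+        fake = mock.Mock()
+        fake.get_existing_vulnerabilities.return_value = [{"id": "vuln-0001"}]
+        with mock.patch.object(telemetry, "get_global_tracer", return_value=fake):
+            self.assertEqual(list_vulnerability_reports(), [{"id": "vuln-0001"}])
+        fake.get_existing_vulnerabilities.assert_called_once_with()
+```
+
+Save it as a test module and run `python -m unittest`. With the real package, patch the name where `tools.py` looks it up rather than where it is defined, per the usual `unittest.mock` rule.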