Lightweight Node.js capture proxy for analyzing LLM API request payloads. Understand what's eating your context window.
Fork integration note:
`llm_inspector/` is a tracked copy of jleechanorg/llm_inspector (public). It is used by this AO fork to measure and reduce context overhead. Do not edit `llm_inspector/` directly; submit changes upstream at the source repo, then pull updates here.
| Feature | Status | Savings |
|---|---|---|
| Capture mode (`start`) | ✅ Stable | Visibility |
| `--tool-mode lean` | ✅ Stable | ~20K tokens/turn |
| `--tool-mode on-demand` (stub + re-issue) | ✅ Stable | ~84.9% upfront on heavy tools |
See full evidence: docs/evidence/on-demand-stub-schema-2026-04-11/ (N=10 run, mean 84.9% Agent schema reduction, PASS).
Baseline from a real Claude Code session (claude --print "What is 2+2?", haiku model):
| Component | Bytes | ~Tokens | % |
|---|---|---|---|
| Built-in tool definitions | 91,932 | ~26,266 | 50% |
| System prompt | 28,113 | ~8,032 | 15% |
| CLAUDE.md stack (3 levels) | 30,010 | ~8,574 | 16% |
| MCP tool definitions | 27,694 | ~7,913 | 15% |
| Skills list | 7,164 | ~2,047 | 4% |
| Total overhead | 184,913 | ~52,832 | 100% |
At ~53K tokens/turn, a 200K context window fills in ~3 turns without compaction.
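The arithmetic behind the table can be reproduced with a short script. This is a minimal sketch, not part of llm-inspector: the 3.5 bytes-per-token ratio is an assumption inferred from the table itself (184,913 bytes against ~52,832 tokens), and the function names are illustrative.

```javascript
// Assumption: ~3.5 bytes per token, inferred from the baseline table
// (184,913 bytes / ~52,832 tokens). It is not a documented constant.
const BYTES_PER_TOKEN = 3.5;

// Byte counts copied from the baseline table.
const overheadBytes = {
  "Built-in tool definitions": 91932,
  "System prompt": 28113,
  "CLAUDE.md stack (3 levels)": 30010,
  "MCP tool definitions": 27694,
  "Skills list": 7164,
};

// Rough token estimate for a byte count.
function estimateTokens(bytes) {
  return Math.round(bytes / BYTES_PER_TOKEN);
}

// Per-component tokens and share of the total, plus the totals.
function summarize(components) {
  const totalBytes = Object.values(components).reduce((a, b) => a + b, 0);
  const rows = Object.entries(components).map(([name, bytes]) => ({
    name,
    bytes,
    tokens: estimateTokens(bytes),
    percent: (100 * bytes) / totalBytes,
  }));
  return { totalBytes, totalTokens: estimateTokens(totalBytes), rows };
}

// Whole multiples of the per-turn overhead that fit in a context window.
function turnsThatFit(contextWindowTokens, tokensPerTurn) {
  return Math.floor(contextWindowTokens / tokensPerTurn);
}

const summary = summarize(overheadBytes);
for (const row of summary.rows) {
  console.log(`${row.name}: ~${row.tokens} tokens (${row.percent.toFixed(1)}%)`);
}
console.log(`Total: ${summary.totalBytes} bytes, ~${summary.totalTokens} tokens`);
console.log(`Turns in a 200K window: ${turnsThatFit(200000, summary.totalTokens)}`);
```

Swapping in byte counts from your own captures shows how the turn budget changes as individual components shrink.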

    Claude Code
        │  ANTHROPIC_BASE_URL=http://localhost:9000
        │  ANTHROPIC_API_KEY=oauth-proxy
        ▼
    llm-inspector :9000   ← captures full JSON request payloads to disk
        │  forwards to http://127.0.0.1:8000/claude
        ▼
    ccproxy :8000         ← handles OAuth token refresh → Anthropic API
        ▼
    Anthropic API

llm-inspector start starts ccproxy automatically if it isn't already running.
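To make the capture-and-forward step concrete, here is a minimal sketch of a capture proxy built on Node's `http` module. It is not llm-inspector's actual implementation: `createCaptureProxy`, `upstreamUrl`, the option names, and the capture file naming are all hypothetical, and it assumes a plain-HTTP upstream such as the ccproxy address shown above.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: joins the upstream base URL and the incoming request path.
function upstreamUrl(upstream, requestPath) {
  return new URL(upstream.replace(/\/+$/, "") + requestPath);
}

// Hypothetical factory; the option names are illustrative only.
function createCaptureProxy({ upstream, captureDir }) {
  let seq = 0;
  return http.createServer((req, res) => {
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      const body = Buffer.concat(chunks);

      // 1. Capture: write the raw request body to disk before forwarding.
      if (body.length > 0) {
        fs.mkdirSync(captureDir, { recursive: true });
        seq += 1;
        fs.writeFileSync(path.join(captureDir, `${Date.now()}-${seq}.json`), body);
      }

      // 2. Forward: replay the request against the upstream, body unchanged.
      const target = upstreamUrl(upstream, req.url);
      const headers = { ...req.headers, host: target.host };
      const proxyReq = http.request(target, { method: req.method, headers }, (proxyRes) => {
        res.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(res); // stream the upstream response back to the client
      });
      proxyReq.on("error", (err) => {
        if (!res.headersSent) res.writeHead(502, { "content-type": "text/plain" });
        res.end(`upstream error: ${err.message}`);
      });
      proxyReq.end(body);
    });
  });
}

// Example wiring (not started here):
// createCaptureProxy({
//   upstream: "http://127.0.0.1:8000/claude",
//   captureDir: "./captures",
// }).listen(9000);
```

Buffering the request before forwarding keeps the sketch simple and guarantees the captured file is complete; the response is piped through so streamed replies are not delayed.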
Requirements: Node.js 18+, Python 3.9+
Install script:

    curl -fsSL https://raw.githubusercontent.com/jleechanorg/llm_inspector/main/install.sh | bash

Manual install:

    # 1. Install ccproxy-api (Python OAuth proxy)
    uv tool install ccproxy-api  # or: pip install ccproxy-api
    # 2. Authenticate ccproxy with Claude OAuth
    ccproxy auth refresh claude-api
    # 3. Install llm-inspector
    npm install -g llm-inspector

Quick start:

    # Start capture chain (starts ccproxy + capture proxy)
    llm-inspector start
    # Route Claude Code through it
    export ANTHROPIC_BASE_URL=http://localhost:9000
    export ANTHROPIC_API_KEY=oauth-proxy
    # Make a request
    claude --print "What is 2+2?"
    # See what was captured
    llm-inspector analyze

Capture mode (`start`) captures all requests and responses for analysis. No modifications.
Lean mode strips 17 heavy built-in tool schemas from every request at the proxy layer:

    llm-inspector start --tool-mode lean
    # or
    LLM_INSPECTOR_TOOL_MODE=lean llm-inspector start

Stripped tools: Agent, TeamCreate, TeamDelete, TaskCreate, TaskUpdate, TaskGet, TaskList, TaskOutput, TaskStop, SendMessage, CronCreate, CronDelete, CronList, EnterWorktree, ExitWorktree, Skill, RemoteTrigger (~20K tokens/turn savings for lean sessions).
Kept: Bash, Read, Write, Edit, MultiEdit, Glob, Grep, WebFetch, WebSearch, AskUserQuestion, EnterPlanMode, ExitPlanMode, NotebookEdit + all MCP tools.
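The lean-mode filtering described above amounts to dropping entries from the request's `tools` array by name. The sketch below assumes the captured body follows the Claude Messages API shape (a top-level `tools` array of objects with a `name` field); `applyLeanMode` is a hypothetical name, not a function exported by llm-inspector.

```javascript
// Tool names copied from the "Stripped tools" list above.
const HEAVY_TOOLS = new Set([
  "Agent", "TeamCreate", "TeamDelete", "TaskCreate", "TaskUpdate", "TaskGet",
  "TaskList", "TaskOutput", "TaskStop", "SendMessage", "CronCreate",
  "CronDelete", "CronList", "EnterWorktree", "ExitWorktree", "Skill",
  "RemoteTrigger",
]);

// Hypothetical helper: returns a copy of a Messages API request body
// with the heavy tool definitions removed. The input is not mutated.
function applyLeanMode(payload) {
  if (!Array.isArray(payload.tools)) return payload;
  return {
    ...payload,
    tools: payload.tools.filter((tool) => !HEAVY_TOOLS.has(tool.name)),
  };
}

// Example: built-in and MCP tools survive, heavy tools are dropped.
const example = applyLeanMode({
  model: "example-model", // placeholder value for illustration
  tools: [
    { name: "Bash" },
    { name: "Agent" },
    { name: "mcp__github__search" },
    { name: "Skill" },
  ],
});
console.log(example.tools.map((tool) => tool.name));
```

Filtering by an explicit deny-list, rather than an allow-list, means MCP tools and any unlisted built-in tools pass through untouched, which matches the "Kept" list.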
On-demand mode replaces heavy tool schemas with ~206-byte stubs before forwarding. When the model first uses a heavy tool, the proxy re-issues the request with the real schema. Heavy tools still work, but their cost is deferred to first use.

    llm-inspector start --tool-mode on-demand

Stub example (Agent tool):
- Original: 1,368 bytes → Stub: 206 bytes (84.9% reduction)
- Stub format: Claude Messages API `input_schema` at top level (must have at least 1 property)
- Re-issue latency: ~200–400ms on first heavy-tool call
Evidence: docs/evidence/on-demand-stub-schema-2026-04-11/ — 10-run real integration test, mean 84.9% reduction, PASS.
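The stub-and-re-issue flow can be sketched as three small functions. This is an illustration under assumptions, not the shipped logic: the stub's description text and its single `request` property are invented placeholders, and the function names are hypothetical. The only details taken from the text above are the top-level `input_schema` with at least one property and the Agent tool byte counts.

```javascript
// Hypothetical stub builder: keeps the tool name and replaces the schema
// with a minimal object schema that has exactly one property.
function makeStub(tool) {
  return {
    name: tool.name,
    description: `Stub for ${tool.name}; full schema is sent on first use.`,
    input_schema: {
      type: "object",
      properties: { request: { type: "string" } }, // placeholder property
    },
  };
}

// Replaces heavy tools with stubs and records which names were stubbed.
function stubHeavyTools(payload, heavyNames) {
  const stubbed = [];
  const tools = (payload.tools || []).map((tool) => {
    if (!heavyNames.has(tool.name)) return tool;
    stubbed.push(tool.name);
    return makeStub(tool);
  });
  return { payload: { ...payload, tools }, stubbed };
}

// True when a Messages API response contains a tool_use block for a stubbed
// tool, which is the signal to re-issue the request with the real schema.
function needsReissue(response, stubbedNames) {
  return (response.content || []).some(
    (block) => block.type === "tool_use" && stubbedNames.includes(block.name)
  );
}

// Size reduction in percent, e.g. for the Agent figures quoted above.
function reductionPercent(originalBytes, stubBytes) {
  return 100 * (1 - stubBytes / originalBytes);
}

console.log(reductionPercent(1368, 206).toFixed(1)); // 84.9
```

On a positive `needsReissue` result, a proxy following this approach would discard the stubbed response and send the original request again with the full tool definitions, which is where the first-use latency comes from.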
| Command | Description |
|---|---|
| `llm-inspector start` | Start capture chain on port 9000 |
| `llm-inspector start --port 9199` | Custom port |
| `llm-inspector start --upstream <url>` | Forward directly to a URL (skip ccproxy) |
| `llm-inspector start --foreground` | Run in foreground (no daemon) |
| `llm-inspector start --tool-mode lean\|on-demand` | Set tool mode |
| `llm-inspector stop` | Stop capture proxy |
| `llm-inspector status` | Check if running, show capture count |
| `llm-inspector analyze` | Show token breakdown for all captures |
| `llm-inspector analyze --last 5` | Analyze last 5 captures |
| `llm-inspector analyze --sort tokens` | Sort by estimated token count |
| `llm-inspector analyze --json` | Output as JSON |
| `llm-inspector clean` | Remove all captured request files |
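For a rough idea of how a token breakdown like the one `analyze` reports could be derived from a single captured payload, the sketch below sizes the main parts of a Messages API request body. It is not the tool's analyzer: `breakdown` and `jsonBytes` are hypothetical names, and it assumes MCP tools can be recognized by the `mcp__` name prefix that Claude Code uses. It does not attempt to separate the CLAUDE.md stack or the skills list.

```javascript
// Size of a value once serialized as JSON, in UTF-8 bytes.
function jsonBytes(value) {
  return value === undefined ? 0 : Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Hypothetical breakdown of one captured request body (already parsed).
function breakdown(payload) {
  const tools = payload.tools || [];
  const isMcp = (tool) => tool.name.startsWith("mcp__"); // assumed naming convention
  const sum = (list) => list.reduce((total, tool) => total + jsonBytes(tool), 0);
  return {
    systemPrompt: jsonBytes(payload.system),
    builtInTools: sum(tools.filter((tool) => !isMcp(tool))),
    mcpTools: sum(tools.filter(isMcp)),
    messages: jsonBytes(payload.messages),
  };
}

// Example with a tiny hand-written payload.
const sizes = breakdown({
  system: "abc",
  tools: [{ name: "Bash" }, { name: "mcp__x__y" }],
  messages: [],
});
console.log(sizes);
```

Byte counts from a sketch like this are approximate because separators between array elements are not counted; they are meant for comparing components, not for exact billing.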
ccproxy handles OAuth with the Anthropic API. After installing:

    # Authenticate (opens browser for OAuth flow)
    ccproxy auth login
    # Or refresh an existing token
    ccproxy auth refresh claude-api

Config lives at `~/.ccproxy/config.yaml`. The default model entry should have `api_key: claude-api` to use OAuth.
- Source: github.com/jleechanorg/llm_inspector (public)
- Evidence bundle: llm_inspector/docs/evidence/ in this repo
- Design doc: roadmap/on-demand-tool-profiles.md
Changes to llm_inspector/ in this fork should be submitted as PRs to the source repo first.