Context
rtk discover scans Claude Code sessions locally. This issue tracks adding the ability for users to share their discover results back to the RTK project, so we can prioritize filter development based on real-world usage data rather than guessing.
Scope: four Rust phases (1-4) plus the Cloudflare Worker backend (Phase 5), shipped together; Phase 6 layers an agent-assisted filter creation pipeline on top.
Phase 1: Anonymization structs + logic
File: src/discover/report.rs
Add new structs for sharing (no args, no paths, no examples):
```rust
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct AnonymizedEntry {
    pub command: String, // "curl", "terraform", "helm"
    pub count: usize,
}

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct AnonymizedMissedEntry {
    pub command: String, // "git status", "cargo test"
    pub count: usize,
    pub rtk_equivalent: String, // "rtk git", "rtk cargo"
}

#[derive(Debug, Serialize, Deserialize)]
pub struct AnonymizedReport {
    pub sessions_scanned: usize,
    pub total_commands: usize,
    pub already_rtk: usize,
    pub unhandled: Vec<AnonymizedEntry>,
    pub missed: Vec<AnonymizedMissedEntry>,
}
```

Add `anonymize_report(report: &DiscoverReport) -> AnonymizedReport`:

- `unsupported` → `unhandled`: keep `base_command` (first word only), drop `example`
- `supported` → `missed`: keep the first 1-2 words of `command`, drop token estimates
- Reuse the existing `truncate_command()` from `mod.rs` (make it `pub(crate)`)
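The trimming at the heart of `anonymize_report` can be sketched with two small helpers. This is a minimal sketch; the names `base_command` and `first_two_words` are illustrative, and in the real code `truncate_command()` from `mod.rs` plays the first-word role:

```rust
/// Keep only the first word: "curl -s https://example.com" -> "curl".
/// Illustrative stand-in for the pub(crate) truncate_command() in mod.rs.
fn base_command(cmd: &str) -> String {
    cmd.split_whitespace().next().unwrap_or("").to_string()
}

/// Keep at most the first two words: "git log --oneline -20" -> "git log".
fn first_two_words(cmd: &str) -> String {
    cmd.split_whitespace().take(2).collect::<Vec<_>>().join(" ")
}
```

`anonymize_report` would then map each unsupported entry through `base_command` and each supported entry through `first_two_words`, carrying the counts over unchanged.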
Add format_github_issue(report: &AnonymizedReport, version, os, arch) -> String.
Phase 2: rtk discover --issue
Files: src/main.rs, src/discover/mod.rs
Add --issue flag to Commands::Discover. When set: anonymize → render markdown → print to stdout → exit.
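The table-rendering step inside `format_github_issue` can be sketched as follows. This is a hedged sketch: `render_unhandled_table` is a hypothetical helper, and the real function also takes version/os/arch and emits the missed-savings table:

```rust
/// Hypothetical helper: render the unhandled-commands table for the
/// --issue markdown output. The real format_github_issue also emits
/// the header line and the missed-savings table.
fn render_unhandled_table(entries: &[(&str, usize)]) -> String {
    let mut out = String::from("| Command | Count |\n|---------|-------|\n");
    for (cmd, count) in entries {
        out.push_str(&format!("| {} | {} |\n", cmd, count));
    }
    out
}
```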
Output format:

```markdown
## RTK Discover Report

**Version**: 0.28.0 | **OS**: macos/aarch64 | **Sessions**: 23 | **Commands**: 412

### Top Unhandled Commands (no RTK filter yet)

| Command | Count |
|---------|-------|
| curl | 45 |
| terraform | 32 |

### Missed Savings (RTK already supports these)

| Command | Count | RTK Equivalent |
|---------|-------|----------------|
| git status | 67 | rtk git |
| cargo test | 23 | rtk cargo |

---
*Generated by `rtk discover --issue` v0.28.0*
```

Users can then:

```
rtk discover --issue | pbcopy   # paste manually
rtk discover --issue | gh issue create --title "Discover report" --body-file -
```

Phase 3: rtk discover --share
New file: src/discover/share.rs
Files: src/main.rs, src/discover/mod.rs, src/telemetry.rs
Add --share flag. Flow:
- Run discover pipeline
- Anonymize report
- Show preview of exactly what will be sent
- Ask `[y/N]` confirmation
- POST to `RTK_SHARE_URL` (compile-time env, same pattern as telemetry)
Preview shows:

```
RTK Discover -- Sharing Report
========================================
This will send anonymized usage data to help prioritize RTK filters.

DATA TO BE SENT:
  Device:   a1b2c3d4e5f6  [pseudonymous hash]
  Version:  0.28.0
  OS/Arch:  macos/aarch64
  Sessions: 23
  Commands: 412

  Unhandled (top 8):
    curl        45
    terraform   32

  Missed savings (top 5):
    git status  67  → rtk git
    cargo test  23  → rtk cargo

WHAT IS NOT SENT:
  - No file paths or directory names
  - No command arguments or flags
  - No example commands or output
```
Payload JSON:

```jsonc
{
  "device_hash": "a1b2c3...",
  "version": "0.28.0",
  "os": "macos",
  "arch": "aarch64",
  "report": { /* AnonymizedReport */ }
}
```

Also: make `generate_device_hash()` in `telemetry.rs` `pub` (reuse for the share payload).
Phase 4: Telemetry aggregate
File: src/telemetry.rs
Enrich the existing daily ping with lightweight discover aggregate:
```json
{
  "discover": {
    "unhandled_top10": ["curl", "terraform", "helm"],
    "missed_top10": ["git status", "cargo test"],
    "sessions_scanned": 23
  }
}
```

Implementation:

- `get_discover_summary()` runs a classify-only scan (last 7 days, current project, no output_len analysis)
- Returns `serde_json::Value::Null` on any error, so it never breaks the ping
- Caches the result in a `.discover_cache` marker file (skipped if less than 23h old)
- Runs in the existing background thread
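The freshness check on the marker file can be sketched as below. This is a minimal sketch; the marker's actual location and the exact threshold live in `telemetry.rs`:

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// True if the cache marker exists and was modified less than 23 hours ago.
/// A missing file or any I/O error counts as stale, forcing a fresh scan.
fn cache_is_fresh(marker: &Path) -> bool {
    fs::metadata(marker)
        .and_then(|m| m.modified())
        .map(|mtime| {
            SystemTime::now()
                .duration_since(mtime)
                .unwrap_or_default()
                < Duration::from_secs(23 * 60 * 60)
        })
        .unwrap_or(false)
}
```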
Phase 5: Backend — Cloudflare Worker + D1
New repo/subdirectory: rtk-share-worker/
D1 Schema
```sql
CREATE TABLE discover_reports (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  device_hash TEXT NOT NULL,
  version TEXT NOT NULL,
  os TEXT NOT NULL,
  arch TEXT NOT NULL,
  sessions_scanned INTEGER,
  total_commands INTEGER,
  already_rtk INTEGER,
  report_json TEXT NOT NULL,
  created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX idx_device_hash ON discover_reports(device_hash);
CREATE INDEX idx_created_at ON discover_reports(created_at);

CREATE TABLE command_rankings (
  command TEXT PRIMARY KEY,
  total_count INTEGER DEFAULT 0,
  unique_devices INTEGER DEFAULT 0,
  score REAL DEFAULT 0, -- total_count * log(unique_devices + 1)
  category TEXT DEFAULT 'unhandled',
  last_seen TEXT
);
```

Endpoints

- `POST /api/v1/discover-share`: upsert report, update rankings, dedup by device_hash per day
- `GET /api/v1/community-stats`: top 30 unhandled + top 30 missed, public, cached 1h
Scoring formula
score = total_count × log(unique_devices + 1)
Balances "one user runs curl 500x" vs "50 users each run terraform 5x" — the latter scores higher.
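As a sanity check, the formula can be evaluated directly. A small sketch, assuming natural log (the schema comment does not pin the base, but the ordering holds for log10 too); the real computation happens in the Worker when rankings are updated:

```rust
/// score = total_count * ln(unique_devices + 1)
fn score(total_count: u64, unique_devices: u64) -> f64 {
    total_count as f64 * ((unique_devices + 1) as f64).ln()
}
// One user running curl 500x:       score(500, 1)  ≈ 346.6
// Fifty users running terraform 5x: score(250, 50) ≈ 983.0 → scores higher
```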
Phase 6: Agent-Assisted Filter Creation Pipeline
Closes the loop between "we know what's missing" (discover) and "someone writes the filter". Three components that let a contributor go from raw command output to a merged PR with minimal friction.
6.1 — rtk analyze <cmd>
Purpose: Analyze command output and recommend a filter implementation approach (TOML stages vs Rust module), without blindly executing anything.
Input modes (safety-first):
- stdin: `helm list | rtk analyze helm`
- Fixture file: `rtk analyze helm --fixture tests/fixtures/helm_list_raw.txt`
- Explicit opt-in: `rtk analyze helm --run` (executes the command and captures output)
Default is stdin/fixture — --run is an explicit opt-in, never automatic.
What it does:
- Detects output format (JSON, NDJSON, tabular, free-form text, mixed)
- Measures repetition ratio (high repetition → strong TOML candidate)
- Checks for structured fields vs unstructured prose
- Recommends TOML or Rust with a justification sentence
- If TOML: suggests which stages apply (`strip_lines`, `keep_sections`, `truncate`, etc.)
- If Rust: notes which existing module is the closest template
Decision tree (TOML vs Rust):

```
Output format?
├── JSON or NDJSON → Rust (structured parsing, serde)
├── Tabular (columns, headers) → TOML likely enough
│   ├── Static columns → TOML (strip_lines + keep_columns)
│   └── Dynamic/nested → Rust
├── Free-form text with sections → TOML (keep_sections + strip_lines)
└── Mixed (text + JSON blobs) → Rust

Repetition ratio?
├── >60% repeated lines → TOML (dedup stage)
└── <60% → Rust (logic needed)

Token savings estimate?
├── TOML stages alone achieve ≥60% → recommend TOML
└── Below 60% threshold → recommend Rust
```
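The repetition ratio used above can be defined in several ways; one plausible definition (illustrative, not the shipped implementation) is the fraction of lines that duplicate a line already seen:

```rust
use std::collections::HashMap;

/// Fraction of lines that are repeats of a line seen earlier in the output.
fn repetition_ratio(output: &str) -> f64 {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    let mut total: u32 = 0;
    for line in output.lines() {
        *counts.entry(line).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    // Each distinct line contributes (occurrences - 1) repeats.
    let repeats: u32 = counts.values().map(|&c| c - 1).sum();
    f64::from(repeats) / f64::from(total)
}
```

Under this definition, `kubectl logs`-style output with many identical lines scores high and lands in the TOML dedup branch.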
Flags:

- `--run`: execute the command and capture output (opt-in)
- `--fixture <path>`: analyze output from a fixture file
- `--save-fixture <path>`: save captured output to a fixture file
- `--json`: machine-readable output (for agent consumption)
- `--verbose`: show line-by-line analysis details
Example output:

```
rtk analyze helm list

Input: 847 tokens (stdin)
Format: tabular (7 columns detected)
Repetition: 12% (low)
Estimated savings with TOML: 71%

Recommendation: TOML filter
Rationale: Tabular output with static columns. TOML stages sufficient.

Suggested stages:
  - strip_lines: [regex for empty/separator lines]
  - keep_columns: [NAME, NAMESPACE, STATUS, CHART]
  - truncate: max_lines=50

Next step: rtk analyze helm list --save-fixture tests/fixtures/helm_list_raw.txt
```
6.2 — /create-filter slash command (Claude Code)
File: .claude/commands/create-filter.md
A Claude Code slash command contributors run in their local clone of RTK. Takes a command name, orchestrates the full filter creation loop.
Flow:
- Runs `rtk analyze <cmd>` (or prompts for a fixture if no stdin)
- Scaffolds the filter: TOML file in `.rtk/filters/` or Rust module in `src/<cmd>_cmd.rs`
- Creates a test fixture if not already present
- Writes unit tests (snapshot + token savings assertion)
- Runs `cargo fmt && cargo clippy && cargo test`, with up to 3 retry loops on failures
- On success: commits, detects the contributor's fork remote, pushes, opens a PR toward `rtk-ai/rtk:develop`
Fork-aware PR creation:
- Detects `origin` vs `upstream` remotes (standard fork setup)
- Pushes to `origin` (contributor's fork), opens PR toward `upstream` (rtk-ai/rtk)
- If only one remote: assumes it's the fork, warns if it looks like the main repo
- PR title follows RTK convention: `feat: add <cmd> filter (X% token savings)`
Retry loop (max 3 iterations):

```
cargo test fails
  → Claude reads error output
  → Fixes the filter or test
  → Re-runs cargo test
  → If still failing after 3 attempts: stops, reports what's stuck
```
Contributor requirements: local Rust toolchain + Claude Code + their own API credits. No special RTK infrastructure needed.
6.3 — rtk discover --suggest
Purpose: Enrich the unhandled commands table with a "Recommendation" column, reusing heuristics from analyze on already-captured output — no re-execution of commands.
Output (new column in the existing unhandled table):
| Command | Sessions | Count | Recommendation |
|---|---|---|---|
| helm | 8 | 234 | TOML (tabular, 71% est.) |
| terraform | 5 | 156 | Rust (JSON output) |
| kubectl logs | 3 | 89 | TOML (repetitive lines, 80% est.) |
Implementation:
- Reuses `output_content` already stored by discover (no re-execution)
- Applies the TOML vs Rust decision tree from the `analyze` heuristics
- Falls back to "Unknown" if no output was captured for that command
- Adds a savings estimate when an output sample is available
Files added/modified (Phase 6)
| File | Change |
|---|---|
| `src/analyze_cmd.rs` | New: `rtk analyze` implementation |
| `src/main.rs` | Add `Commands::Analyze` variant |
| `src/discover/mod.rs` | Add `--suggest` recommendation column |
| `.claude/commands/create-filter.md` | New: slash command for contributors |
Files modified (Phases 1-5)
| File | Change |
|---|---|
| `src/discover/report.rs` | `AnonymizedReport` structs + `anonymize_report()` + `format_github_issue()` |
| `src/discover/share.rs` | New: HTTP share logic, preview, confirmation |
| `src/discover/mod.rs` | `mod share`, route `--issue`/`--share`, `truncate_command` → `pub(crate)` |
| `src/main.rs` | `--issue` and `--share` flags on `Commands::Discover` |
| `src/telemetry.rs` | `generate_device_hash()` → `pub`, `discover` field in ping payload |
| `rtk-share-worker/` | New: Cloudflare Worker + D1 backend |
Reused functions
| Function | File | Reused for |
|---|---|---|
| `generate_device_hash()` | `src/telemetry.rs:82` | Device ID in share payload |
| `truncate_command()` | `src/discover/mod.rs:230` | Base command extraction |
| `classify_command()` | `src/discover/registry.rs` | Telemetry aggregate |
| `split_command_chain()` | `src/discover/registry.rs` | Telemetry aggregate |
| `ClaudeProvider` | `src/discover/provider.rs` | Session scanning |
Privacy controls
| Control | Effect |
|---|---|
| `telemetry.enabled = false` in config.toml | Disables Phase 4 telemetry aggregate |
| `RTK_TELEMETRY_DISABLED=1` | Disables Phase 4 |
| `--share` always requires interactive y/N | Phase 3 always explicit |
| Preview shows exact payload before sending | Full transparency |
| No args/paths/examples ever sent | All phases |
| `rtk analyze` never executes commands without `--run` | Phase 6 safety |
Unit tests to write
```rust
// report.rs
test_anonymize_strips_examples()      // UnsupportedEntry.example dropped
test_anonymize_strips_args()          // "git log --oneline -20" → "git log"
test_anonymize_strips_paths()         // "/usr/bin/grep -r foo" → "grep"
test_anonymize_preserves_counts()     // counts unchanged
test_format_github_issue_valid_md()   // output is valid markdown table
test_format_github_issue_no_paths()   // no "/" in output except markdown syntax

// share.rs
test_share_payload_serializes()       // SharePayload → valid JSON
test_share_no_url_returns_ok()        // graceful when RTK_SHARE_URL not set

// telemetry.rs
test_discover_summary_null_on_error() // returns null, doesn't panic

// analyze_cmd.rs
test_analyze_detects_json_format()    // JSON input → Rust recommendation
test_analyze_detects_tabular_format() // tabular input → TOML recommendation
test_analyze_high_repetition_toml()   // >60% repetition → TOML recommendation
test_analyze_json_output_flag()       // --json produces valid JSON
test_analyze_stdin_no_run()           // no --run = no command execution
```

Verification checklist
- [ ] `cargo fmt --all && cargo clippy --all-targets && cargo test`
- [ ] `rtk discover --issue` → valid markdown, no paths/args in output
- [ ] `rtk discover --share` → preview shown, y/N works, POST succeeds
- [ ] `rtk discover --suggest` → Recommendation column appears, no command re-execution
- [ ] `helm list | rtk analyze helm` → recommendation + suggested TOML stages
- [ ] `rtk analyze helm --run` → executes helm, captures output, analyzes
- [ ] `/create-filter helm` in Claude Code → scaffolds filter, tests pass, PR opened
- [ ] `hyperfine 'rtk discover'` before/after (no regression)
- [ ] `wrangler dev` → POST payload → GET community-stats → rankings correct