
feat: rtk discover --share / --issue — Community filter prioritization #481

@FlorianBruniaux

Description

Context

rtk discover scans Claude Code sessions locally. This issue tracks adding the ability for users to share their discover results back to the RTK project, so we can prioritize filter development based on real-world usage data rather than guessing.

Scope: four Rust phases (Phases 1-4), a Cloudflare Worker backend (Phase 5), and an agent-assisted filter pipeline (Phase 6), shipped together.


Phase 1: Anonymization structs + logic

File: src/discover/report.rs

Add new structs for sharing (no args, no paths, no examples):

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct AnonymizedEntry {
    pub command: String,  // "curl", "terraform", "helm"
    pub count: usize,
}

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct AnonymizedMissedEntry {
    pub command: String,         // "git status", "cargo test"
    pub count: usize,
    pub rtk_equivalent: String,  // "rtk git", "rtk cargo"
}

#[derive(Debug, Serialize, Deserialize)]
pub struct AnonymizedReport {
    pub sessions_scanned: usize,
    pub total_commands: usize,
    pub already_rtk: usize,
    pub unhandled: Vec<AnonymizedEntry>,
    pub missed: Vec<AnonymizedMissedEntry>,
}

Add anonymize_report(report: &DiscoverReport) -> AnonymizedReport:

  • unsupported → unhandled: keep base_command (first word only), drop example
  • supported → missed: keep first 1-2 words of command, drop token estimates
  • Reuse existing truncate_command() from mod.rs (make it pub(crate))
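The two truncation rules above can be sketched as pure helpers. These are illustrative, not the shipped `truncate_command()`; the real structs also carry serde derives:

```rust
// Keep only the first word, stripping any leading path component:
// "/usr/bin/grep -r foo" → "grep"
fn base_command(cmd: &str) -> String {
    cmd.split_whitespace()
        .next()
        .map(|w| w.rsplit('/').next().unwrap_or(w))
        .unwrap_or("")
        .to_string()
}

// Keep the first 1-2 words: "git log --oneline -20" → "git log".
// A second word survives only if it looks like a subcommand (no leading '-').
fn short_command(cmd: &str) -> String {
    let mut words = cmd.split_whitespace();
    let first = words.next().unwrap_or("").to_string();
    match words.next() {
        Some(w) if !w.starts_with('-') => format!("{first} {w}"),
        _ => first,
    }
}
```

Arguments, flags, and paths never survive either helper, which is what makes the report safe to share.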

Add format_github_issue(report: &AnonymizedReport, version, os, arch) -> String.


Phase 2: rtk discover --issue

Files: src/main.rs, src/discover/mod.rs

Add --issue flag to Commands::Discover. When set: anonymize → render markdown → print to stdout → exit.

Output format:

## RTK Discover Report

**Version**: 0.28.0 | **OS**: macos/aarch64 | **Sessions**: 23 | **Commands**: 412

### Top Unhandled Commands (no RTK filter yet)
| Command | Count |
|---------|-------|
| curl | 45 |
| terraform | 32 |

### Missed Savings (RTK already supports these)
| Command | Count | RTK Equivalent |
|---------|-------|----------------|
| git status | 67 | rtk git |
| cargo test | 23 | rtk cargo |

---
*Generated by `rtk discover --issue` v0.28.0*

Users can then:

rtk discover --issue | pbcopy                    # paste manually
rtk discover --issue | gh issue create --title "Discover report" --body-file -
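The rendering itself is plain string building; a minimal sketch, assuming a hypothetical `Entry` type rather than the final `AnonymizedReport` fields:

```rust
struct Entry {
    command: String,
    count: usize,
}

// Render the issue body in the format shown above.
fn render_issue(
    version: &str,
    os: &str,
    arch: &str,
    sessions: usize,
    commands: usize,
    unhandled: &[Entry],
) -> String {
    let mut out = String::from("## RTK Discover Report\n\n");
    out.push_str(&format!(
        "**Version**: {version} | **OS**: {os}/{arch} | **Sessions**: {sessions} | **Commands**: {commands}\n\n"
    ));
    out.push_str("### Top Unhandled Commands (no RTK filter yet)\n");
    out.push_str("| Command | Count |\n|---------|-------|\n");
    for e in unhandled {
        out.push_str(&format!("| {} | {} |\n", e.command, e.count));
    }
    out.push_str(&format!(
        "\n---\n*Generated by `rtk discover --issue` v{version}*\n"
    ));
    out
}
```

The missed-savings table would follow the same pattern with a third column.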

Phase 3: rtk discover --share

New file: src/discover/share.rs
Files: src/main.rs, src/discover/mod.rs, src/telemetry.rs

Add --share flag. Flow:

  1. Run discover pipeline
  2. Anonymize report
  3. Show preview of exactly what will be sent
  4. Ask [y/N] confirmation
  5. POST to RTK_SHARE_URL (compile-time env, same pattern as telemetry)

Preview shows:

RTK Discover -- Sharing Report
========================================
This will send anonymized usage data to help prioritize RTK filters.

DATA TO BE SENT:
  Device:     a1b2c3d4e5f6 [pseudonymous hash]
  Version:    0.28.0
  OS/Arch:    macos/aarch64
  Sessions:   23
  Commands:   412

  Unhandled (top 8):
    curl                     45
    terraform                32

  Missed savings (top 5):
    git status               67  → rtk git
    cargo test               23  → rtk cargo

WHAT IS NOT SENT:
  - No file paths or directory names
  - No command arguments or flags
  - No example commands or output

Payload JSON:

{
  "device_hash": "a1b2c3...",
  "version": "0.28.0",
  "os": "macos",
  "arch": "aarch64",
  "report": { /* AnonymizedReport */ }
}

Also: make generate_device_hash() in telemetry.rs pub (reuse for share payload).
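The [y/N] gate in step 4 defaults to "No"; a sketch as a pure function so it can be tested without a TTY (the real code would read a line from stdin and pass it here):

```rust
// Only an explicit yes sends the report; anything else (including
// an empty line, i.e. just pressing Enter) aborts.
fn confirmed(answer: &str) -> bool {
    matches!(answer.trim().to_ascii_lowercase().as_str(), "y" | "yes")
}
```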


Phase 4: Telemetry aggregate

File: src/telemetry.rs

Enrich the existing daily ping with lightweight discover aggregate:

{
  "discover": {
    "unhandled_top10": ["curl", "terraform", "helm"],
    "missed_top10": ["git status", "cargo test"],
    "sessions_scanned": 23
  }
}

Implementation:

  • get_discover_summary() runs classify-only scan (last 7 days, current project, no output_len analysis)
  • Returns serde_json::Value::Null on any error — never breaks the ping
  • Caches result in .discover_cache marker file (skip if <23h old)
  • Runs in the existing background thread
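The 23h cache check reduces to comparing the marker file's mtime against now; a sketch with the comparison isolated from filesystem access (names are illustrative):

```rust
use std::time::{Duration, SystemTime};

const MAX_AGE: Duration = Duration::from_secs(23 * 60 * 60);

// The summary is recomputed only when the .discover_cache marker
// is older than 23 hours.
fn cache_is_fresh(mtime: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(mtime) {
        Ok(age) => age < MAX_AGE,
        // mtime in the future (clock skew): treat as fresh, don't rescan
        Err(_) => true,
    }
}
```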

Phase 5: Backend — Cloudflare Worker + D1

New repo/subdirectory: rtk-share-worker/

D1 Schema

CREATE TABLE discover_reports (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  device_hash TEXT NOT NULL,
  version TEXT NOT NULL,
  os TEXT NOT NULL,
  arch TEXT NOT NULL,
  sessions_scanned INTEGER,
  total_commands INTEGER,
  already_rtk INTEGER,
  report_json TEXT NOT NULL,
  created_at TEXT DEFAULT (datetime('now'))
);

CREATE INDEX idx_device_hash ON discover_reports(device_hash);
CREATE INDEX idx_created_at ON discover_reports(created_at);

CREATE TABLE command_rankings (
  command TEXT PRIMARY KEY,
  total_count INTEGER DEFAULT 0,
  unique_devices INTEGER DEFAULT 0,
  score REAL DEFAULT 0,   -- total_count * log(unique_devices + 1)
  category TEXT DEFAULT 'unhandled',
  last_seen TEXT
);

Endpoints

  • POST /api/v1/discover-share — upsert report, update rankings, dedup by device_hash per day
  • GET /api/v1/community-stats — top 30 unhandled + top 30 missed, public, cached 1h

Scoring formula

score = total_count × log(unique_devices + 1)

Balances "one user runs curl 500x" vs "50 users each run terraform 5x" — the latter scores higher.
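In code (assuming natural log; the base only rescales all scores by a constant factor, so rankings are unchanged):

```rust
// score = total_count × log(unique_devices + 1)
// Wide adoption beats one heavy user: the log dampens raw counts
// from a single device, while device breadth multiplies them.
fn score(total_count: u64, unique_devices: u64) -> f64 {
    total_count as f64 * ((unique_devices + 1) as f64).ln()
}
```

For the example above: one device running curl 500x scores roughly 500 × ln 2 ≈ 347, while 50 devices running terraform 5x each scores roughly 250 × ln 51 ≈ 983.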


Phase 6: Agent-Assisted Filter Creation Pipeline

Closes the loop between "we know what's missing" (discover) and "someone writes the filter". Three components let a contributor go from raw command output to a merged PR with minimal friction.

6.1 — rtk analyze <cmd>

Purpose: Analyze command output and recommend a filter implementation approach (TOML stages vs Rust module), without blindly executing anything.

Input modes (safety-first):

  • stdin: helm list | rtk analyze helm
  • Fixture file: rtk analyze helm --fixture tests/fixtures/helm_list_raw.txt
  • Explicit opt-in: rtk analyze helm --run (executes the command and captures output)

Default is stdin/fixture — --run is an explicit opt-in, never automatic.

What it does:

  1. Detects output format (JSON, NDJSON, tabular, free-form text, mixed)
  2. Measures repetition ratio (high repetition → strong TOML candidate)
  3. Checks for structured fields vs unstructured prose
  4. Recommends TOML or Rust with a justification sentence
  5. If TOML: suggests which stages apply (strip_lines, keep_sections, truncate, etc.)
  6. If Rust: notes which existing module is the closest template

Decision tree (TOML vs Rust):

Output format?
├── JSON or NDJSON → Rust (structured parsing, serde)
├── Tabular (columns, headers) → TOML likely enough
│   ├── Static columns → TOML (strip_lines + keep_columns)
│   └── Dynamic/nested → Rust
├── Free-form text with sections → TOML (keep_sections + strip_lines)
└── Mixed (text + JSON blobs) → Rust

Repetition ratio?
├── >60% repeated lines → TOML (dedup stage)
└── <60% → Rust (logic needed)

Token savings estimate?
├── TOML stages alone achieve ≥60% → recommend TOML
└── Below 60% threshold → recommend Rust
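A rough cut of the tree's first checks; thresholds and helper names are illustrative, not the shipped heuristics:

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Rec {
    Toml,
    Rust,
}

// JSON/NDJSON detection: does the output start with a JSON value?
fn detect_json(output: &str) -> bool {
    let t = output.trim_start();
    t.starts_with('{') || t.starts_with('[')
}

// Fraction of lines that are exact repeats of an earlier line.
fn repetition_ratio(output: &str) -> f64 {
    let lines: Vec<&str> = output.lines().collect();
    if lines.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = lines.iter().copied().collect();
    1.0 - unique.len() as f64 / lines.len() as f64
}

// Tabular: every non-empty line splits into the same number (≥2) of columns.
fn looks_tabular(output: &str) -> bool {
    let counts: Vec<usize> = output
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split_whitespace().count())
        .collect();
    match counts.first() {
        Some(&n) if n >= 2 => counts.iter().all(|&c| c == n),
        _ => false,
    }
}

fn recommend(output: &str) -> Rec {
    if detect_json(output) {
        return Rec::Rust; // structured parsing via serde
    }
    if repetition_ratio(output) > 0.6 {
        return Rec::Toml; // dedup stage handles it
    }
    if looks_tabular(output) {
        return Rec::Toml; // strip_lines + keep_columns
    }
    Rec::Rust // mixed or free-form: logic needed
}
```

The real `rtk analyze` would layer the savings estimate on top of this before committing to a recommendation.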

Flags:

  • --run — execute the command and capture output (opt-in)
  • --fixture <path> — analyze output from a fixture file
  • --save-fixture <path> — save captured output to a fixture file
  • --json — machine-readable output (for agent consumption)
  • --verbose — show line-by-line analysis details

Example output:

rtk analyze helm list

Input: 847 tokens (stdin)
Format: tabular (7 columns detected)
Repetition: 12% (low)
Estimated savings with TOML: 71%

Recommendation: TOML filter
Rationale: Tabular output with static columns. TOML stages sufficient.

Suggested stages:
  - strip_lines: [regex for empty/separator lines]
  - keep_columns: [NAME, NAMESPACE, STATUS, CHART]
  - truncate: max_lines=50

Next step: rtk analyze helm list --save-fixture tests/fixtures/helm_list_raw.txt

6.2 — /create-filter slash command (Claude Code)

File: .claude/commands/create-filter.md

A Claude Code slash command contributors run in their local clone of RTK. Takes a command name, orchestrates the full filter creation loop.

Flow:

  1. Runs rtk analyze <cmd> (or prompts for a fixture if no stdin)
  2. Scaffolds the filter: TOML file in .rtk/filters/ or Rust module in src/<cmd>_cmd.rs
  3. Creates a test fixture if not already present
  4. Writes unit tests (snapshot + token savings assertion)
  5. Runs cargo fmt && cargo clippy && cargo test — up to 3 retry loops if failures
  6. On success: commits, detects the contributor's fork remote, pushes, opens a PR toward rtk-ai/rtk:develop

Fork-aware PR creation:

  • Detects origin vs upstream remotes (standard fork setup)
  • Pushes to origin (contributor's fork), opens PR toward upstream (rtk-ai/rtk)
  • If only one remote: assumes it's the fork, warns if it looks like the main repo
  • PR title follows RTK convention: feat: add <cmd> filter (X% token savings)

Retry loop (max 3 iterations):

cargo test fails
  → Claude reads error output
  → Fixes the filter or test
  → Re-runs cargo test
  → If still failing after 3 attempts: stops, reports what's stuck

Contributor requirements: local Rust toolchain + Claude Code + their own API credits. No special RTK infrastructure needed.


6.3 — rtk discover --suggest

Purpose: Enrich the unhandled commands table with a "Recommendation" column, reusing heuristics from analyze on already-captured output — no re-execution of commands.

Output (new column in the existing unhandled table):

| Command | Sessions | Count | Recommendation |
|---------|----------|-------|----------------|
| helm | 8 | 234 | TOML (tabular, 71% est.) |
| terraform | 5 | 156 | Rust (JSON output) |
| kubectl logs | 3 | 89 | TOML (repetitive lines, 80% est.) |

Implementation:

  • Reuses output_content already stored by discover (no re-execution)
  • Applies the TOML vs Rust decision tree from analyze heuristics
  • Falls back to "Unknown" if no output was captured for that command
  • Adds savings estimate when output sample is available

Files added/modified (Phase 6)

| File | Change |
|------|--------|
| src/analyze_cmd.rs | New — rtk analyze implementation |
| src/main.rs | Add Commands::Analyze variant |
| src/discover/mod.rs | Add --suggest recommendation column |
| .claude/commands/create-filter.md | New — slash command for contributors |

Files modified (Phases 1-5)

| File | Change |
|------|--------|
| src/discover/report.rs | AnonymizedReport structs + anonymize_report() + format_github_issue() |
| src/discover/share.rs | New — HTTP share logic, preview, confirmation |
| src/discover/mod.rs | mod share, route --issue/--share, truncate_command → pub(crate) |
| src/main.rs | --issue and --share flags on Commands::Discover |
| src/telemetry.rs | generate_device_hash() → pub, discover field in ping payload |
| rtk-share-worker/ | New — Cloudflare Worker + D1 backend |

Reused functions

| Function | File | Reused for |
|----------|------|------------|
| generate_device_hash() | src/telemetry.rs:82 | Device ID in share payload |
| truncate_command() | src/discover/mod.rs:230 | Base command extraction |
| classify_command() | src/discover/registry.rs | Telemetry aggregate |
| split_command_chain() | src/discover/registry.rs | Telemetry aggregate |
| ClaudeProvider | src/discover/provider.rs | Session scanning |

Privacy controls

| Control | Effect |
|---------|--------|
| telemetry.enabled = false in config.toml | Disables Phase 4 telemetry aggregate |
| RTK_TELEMETRY_DISABLED=1 | Disables Phase 4 |
| --share always requires interactive y/N | Phase 3 always explicit |
| Preview shows exact payload before sending | Full transparency |
| No args/paths/examples ever sent | All phases |
| rtk analyze never executes commands without --run | Phase 6 safety |

Unit tests to write

// report.rs
test_anonymize_strips_examples()       // UnsupportedEntry.example dropped
test_anonymize_strips_args()            // "git log --oneline -20" → "git log"
test_anonymize_strips_paths()           // "/usr/bin/grep -r foo" → "grep"
test_anonymize_preserves_counts()       // counts unchanged
test_format_github_issue_valid_md()     // output is valid markdown table
test_format_github_issue_no_paths()     // no "/" in output except markdown syntax

// share.rs
test_share_payload_serializes()         // SharePayload → valid JSON
test_share_no_url_returns_ok()          // graceful when RTK_SHARE_URL not set

// telemetry.rs
test_discover_summary_null_on_error()   // returns null, doesn't panic

// analyze_cmd.rs
test_analyze_detects_json_format()      // JSON input → Rust recommendation
test_analyze_detects_tabular_format()   // tabular input → TOML recommendation
test_analyze_high_repetition_toml()     // >60% repetition → TOML recommendation
test_analyze_json_output_flag()         // --json produces valid JSON
test_analyze_stdin_no_run()             // no --run = no command execution

Verification checklist

  • cargo fmt --all && cargo clippy --all-targets && cargo test
  • rtk discover --issue → valid markdown, no paths/args in output
  • rtk discover --share → preview shown, y/N works, POST succeeds
  • rtk discover --suggest → Recommendation column appears, no command re-execution
  • helm list | rtk analyze helm → recommendation + suggested TOML stages
  • rtk analyze helm --run → executes helm, captures output, analyzes
  • /create-filter helm in Claude Code → scaffolds filter, tests pass, PR opened
  • hyperfine 'rtk discover' before/after (no regression)
  • wrangler dev → POST payload → GET community-stats → rankings correct
