
feat: replace custom filter engine with tokf-filter crate#577

Open

mpecan wants to merge 1 commit into rtk-ai:master from mpecan:feat/tokf-filter-engine

Conversation


@mpecan mpecan commented Mar 13, 2026

Summary

Hey — I'm the maintainer of tokf. I built tokf because I wanted a configurable, locally-definable filter pipeline for reducing LLM token consumption from CLI output. RTK was an inspiration — I credited it in the README from day one — but tokf's filter engine and TOML DSL predate RTK's TOML filter engine by about three weeks (tokf's filter pipeline landed Feb 18; RTK's TOML Part 1 landed Mar 10).

When RTK added its own TOML engine, the two projects ended up with very similar designs — same core stages (skip/keep, replace, match_output, truncate, head/tail, on_empty), similar TOML schemas. Rather than let the two implementations drift apart, I added RTK format compatibility to tokf's serde layer so RTK's field names (strip_lines_matching, keep_lines_matching, head_lines, tail_lines, message) all deserialize natively.
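To illustrate the compatibility layer: the RTK field names listed above should deserialize as-is through tokf's serde layer. The filter below is a hypothetical sketch (the field names come from this PR description; the surrounding schema and values are illustrative, not a real RTK built-in):

```toml
# Hypothetical .rtk/filters.toml entry using RTK's native field names,
# which tokf's serde layer now accepts directly.
[filters.cargo-test]
match_command = "cargo test*"
strip_lines_matching = ["^\\s*Compiling ", "^\\s*Finished "]  # RTK's skip
keep_lines_matching = ["test result:", "FAILED"]              # RTK's keep
head_lines = 5                                                # RTK's head
tail_lines = 40                                               # RTK's tail
```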

This PR replaces RTK's custom filter pipeline with a delegation to tokf-filter::apply(). RTK keeps everything that makes it RTK — the registry, command matching, build.rs concatenation, rtk verify, omission markers — but the actual line-by-line filtering is now handled by the shared library.

What changes

  • Filter execution delegates to tokf-filter::apply() (net -35 lines of pipeline code)
  • RTK's registry, matching, and TOML parsing are unchanged — [filters.name] + match_command, build.rs, filter priority chain, rtk verify, RTK_NO_TOML / RTK_TOML_DEBUG all work exactly as before
  • Omission markers ("... (N lines omitted)", "... (N lines truncated)") are still applied by RTK as post-processing
  • 7 pre-existing verify test failures fixed (filters with on_empty + empty input expected "" instead of the on_empty message — these were broken on master before this PR)
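The division of labor described above (tokf does the line filtering, RTK appends its own omission markers afterwards) can be sketched roughly as follows. The function name and signature are hypothetical, not RTK's actual API; only the marker text comes from the PR description:

```rust
// Sketch of RTK's post-processing step: filtering is delegated to
// tokf-filter, then RTK appends its omission marker if lines were dropped.
// Names are illustrative, not the real RTK internals.
fn append_omission_marker(input_line_count: usize, mut filtered: Vec<String>) -> Vec<String> {
    let omitted = input_line_count.saturating_sub(filtered.len());
    if omitted > 0 {
        // Marker format taken from the PR description above.
        filtered.push(format!("... ({} lines omitted)", omitted));
    }
    filtered
}

fn main() {
    let out = append_omission_marker(10, vec!["test result: ok".to_string()]);
    assert_eq!(out.last().map(String::as_str), Some("... (9 lines omitted)"));
    println!("{}", out.join("\n"));
}
```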

What this unlocks for RTK users

After this lands, anyone writing .rtk/filters.toml or built-in filters gains access to tokf's full feature set — without any breaking changes to existing filters:

  • [[section]] state machines for collecting failure blocks
  • [[chunk]] for splitting output into repeating blocks with aggregation
  • [on_success] / [on_failure] branches with templates
  • dedup / dedup_window for collapsing duplicate lines
  • [json] extraction via JSONPath
  • Template pipe operations (| each:, | join:, | truncate:, | keep:)
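For a flavor of what a filter using these stages might look like, here is a hypothetical sketch. The stage names come from the feature list above, but the exact schema (field names, template syntax) is illustrative and should be checked against tokf's documentation:

```toml
# Hypothetical filter exercising tokf features; schema details are a
# sketch, not verified tokf syntax.
[filters.pytest]
match_command = "pytest*"
dedup_window = 5                      # collapse nearby duplicate lines

# State machine collecting each failure block
[[filters.pytest.section]]
start = "^FAILED "
end = "^$"

[filters.pytest.on_success]
template = "all tests passed"

[filters.pytest.on_failure]
template = "{sections | join:\\n | truncate:2000}"
```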

Backward compatibility

  • All 890 unit tests pass
  • All 111/111 inline verify tests pass (rtk verify --require-all)
  • cargo fmt --all --check and tokf run cargo clippy --all-targets both come back clean
  • No changes to any .toml filter's logic (7 test expectation fixes only)
  • One cosmetic difference: truncate_lines_at now uses … (unicode ellipsis) instead of ... (3 ASCII dots)

Dependencies added

  • tokf-filter = "0.2.33" (no default features, Lua disabled — minimal binary size impact)
  • tokf-common = "0.2.33" (shared config types)

Benchmark results

| Metric       | Master (no tokf) | With tokf      | Delta  |
|--------------|------------------|----------------|--------|
| Startup time | 13.1ms ± 0.6ms   | 15.2ms ± 1.3ms | +2.1ms |
| Binary size  | 5.4MB            | 5.6MB          | +0.2MB |

Note: The 10ms startup target was already exceeded on current master (13.1ms) — this predates this PR. The tokf dependency adds ~2ms and 0.2MB, which I consider acceptable for the capabilities gained. Happy to investigate optimization opportunities if the maintainers feel differently.

Motivation

I don't want two near-identical filter engines maintained in parallel. By sharing the core pipeline, bug fixes and new features in tokf automatically benefit RTK, and RTK's extensive filter library (47 built-in filters with 111 inline tests) has already helped me find and fix bugs in tokf — like match_output not respecting strip_ansi. The ecosystems are stronger together.

Test plan

  • cargo fmt --all --check — clean
  • cargo clippy --all-targets — clean
  • cargo test --all — 890 passed, 0 failed
  • rtk verify --require-all — 111/111 passed
  • Benchmark startup time with hyperfine — 15.2ms (+2.1ms over master)
  • Binary size check — 5.6MB (+0.2MB over master)
  • Manual smoke test: rtk make --version, rtk git log -5, rtk ping -c 2 localhost

🤖 Generated with Claude Code

Delegate RTK's 8-stage filter pipeline to tokf-filter::apply() while
keeping the registry, command matching, build.rs concatenation, rtk verify,
and omission markers unchanged. Unlocks tokf's full feature set (sections,
chunks, JSON extraction, templates) for .rtk/filters.toml authors.

- All 890 unit tests pass
- All 111/111 inline verify tests pass
- 7 pre-existing verify test failures fixed (on_empty + empty input)
- One cosmetic change: truncate_lines_at uses unicode ellipsis (…)
- +2.1ms startup overhead, +0.2MB binary size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alorse commented Mar 13, 2026

@mpecan This is a great PR that unlocks a lot of potential for RTK! I wanted to suggest an enhancement that would be incredibly valuable for MCP tool users.

Use Case: MCP-Specific Filters

With the rise of MCP (Model Context Protocol) tools, there's a growing need to filter verbose JSON responses from external tools that RTK doesn't natively support. For example, the ClickUp MCP returns massive JSON payloads with fields like workspace_id, creator, custom_fields.type_config, etc. that consume tokens but are rarely relevant.

Proposed Enhancement

Would it be possible to extend the TOML DSL to support conditional filters based on MCP tool name patterns? Something like:

```toml
[filters.clickup]
match_mcp = "mcp__clickup__.*"

[filters.clickup.json]
# JSONPath extraction for specific fields
extract = "{tasks: [.tasks[] | {id, name, status, assignee: .assignees[0].username}]}"

# Or field exclusion
exclude_paths = [
    "$..workspace_id",
    "$..creator",
    "$..custom_fields[*].type_config",
    "$..assignees[*].profilePicture"
]

# Array limits
max_array_items = 10

# Field truncation
[filters.clickup.json.truncate]
description = 200
markdown_description = 0  # 0 = remove entirely
```

Why This Matters

  1. No code changes needed: Users can add filters for new MCP tools without waiting for RTK releases
  2. Community sharing: Users could share .toml filter packs for popular MCPs (ClickUp, Slack, Notion, etc.)
  3. Complements PR #535 (feat: Compress MCP tool output via PostToolUse hook): while #535 adds generic MCP compression, this would enable semantic filtering per tool

Implementation Ideas

The filter could be triggered by:

  • A new match_mcp pattern in the TOML
  • Or reuse match_command with MCP-aware detection
  • The JSONPath features you mentioned in the PR description seem like they could handle the field filtering

Would this fit within the scope of tokf-filter's roadmap? Happy to help test or refine the proposal!


Related: PR #535 also addresses MCP output compression but with a generic approach. These two PRs could work beautifully together—#535 for generic truncation, and this proposal for tool-specific semantic filtering.


mpecan (Author) commented Mar 13, 2026

@Alorse Matching is done entirely in RTK, so which filter gets applied is completely independent of the tokf implementation; this PR only swaps in the filter-execution layer.

That said, this separation makes it straightforward to add MCP matching on the RTK side and reuse tokf's JSON capabilities for the payload filtering.

To be clear: the use case you're proposing wouldn't require any changes to tokf-filter.
