Description
Problem (one or two sentences)
While Roo Code's mode-based tool filtering works well (e.g., Ask mode excludes editing tools, Orchestrator has minimal tools), each mode still exposes ALL of its available tools regardless of the specific query context. Code and Debug modes provide all 16-20 tools whether the user asks "what does this function do?" or "refactor the entire authentication system," leading to unnecessary context overhead and potential tool-selection confusion.
Recent research demonstrates that providing too many tools to LLMs degrades performance:
- ToolScope: Enhancing LLM Agent Tool Use (Liu et al., 2025): Shows that redundant tools with overlapping descriptions introduce ambiguity and reduce selection accuracy. The authors demonstrate significant improvements through context-aware tool filtering, which is exactly what this enhancement proposes.
Context (who is affected and when)
This affects users working within tool-rich modes (Code, Debug, Architect) when performing focused tasks that only need a subset of the mode's capabilities:
- Reading code: User asks "explain this function" in Code mode but receives editing, command, and browser tools they won't need
- Simple edits: User says "fix this typo" but gets the full 19-tool suite including browser automation and MCP integrations
- Diagnostic phase: User debugging an issue needs read-only tools initially, but gets editing tools before diagnosis is complete
- Token efficiency: Users with smaller context window models or complex codebases where every token counts
Current architecture (working well):
- ✅ Mode-level filtering: As documented in `packages/types/src/mode.ts`, each mode has specific tool groups (Ask mode: read/browser/mcp only; Code mode: all groups)
- ✅ Static filtering: Tools are correctly filtered per mode in `src/core/prompts/tools/index.ts`
What's missing:
- ❌ Query-context filtering: Within a mode, no adaptation based on what the user is actually asking
- ❌ Conversation-aware selection: Tools don't adapt as the conversation evolves (e.g., from diagnosis → implementation)
Desired behavior (conceptual, not technical)
Roo Code should add a second layer of intelligent tool selection that works within each mode to provide only the 6-10 most relevant tools for the specific query, while maintaining the existing mode-based boundaries.
Key principles:
- Preserve mode safety: Never violate mode constraints (Ask mode still can't edit files, even with smart selection)
- Query-aware: Analyze the user's specific request to determine which subset of the mode's tools are needed
- Conversation-aware: Adapt tool availability as the task evolves (expand toolset when moving from reading to editing)
- Explicit tool requests: When users mention specific tools or capabilities (e.g., "use the GitHub MCP server"), immediately include those tools
- Always include essentials: Workflow tools like `ask_followup_question` and `attempt_completion` are always available
- Fail-safe: If uncertain, provide more tools rather than fewer; fall back to the current mode-based approach on errors
Example scenarios (all in Code mode):
| User Query | Current (16-20 tools) | With Smart Selection (6-10 tools) | Reasoning |
|---|---|---|---|
| "What does this auth function do?" | read + edit + command + browser + mcp tools | read tools + essentials only | Read-only query doesn't need editing/command/browser/mcp |
| "Fix the typo in line 42" | All 16-20 tools | read tools + apply_diff + essentials | Simple edit needs reading + one edit tool |
| "Debug the API timeout issue" | All 16-20 tools initially | Start with read + command, expand to edit as diagnosis progresses | Adaptive: diagnosis phase → fix phase |
| "Refactor the entire auth system" | All 16-20 tools | All relevant tools (12-15 tools) | High complexity query needs fuller toolset |
| "Use the GitHub MCP server to create a PR" | All 16-20 tools | read tools + use_mcp_tool + access_mcp_resource + essentials | User explicitly mentions MCP, so MCP tools are included |
| "Check the browser at localhost:3000 and fix any UI bugs" | All 16-20 tools | read + edit + browser_action + essentials | Browser explicitly mentioned, so include it |
Note: Code/Debug modes expose 16-20 tools depending on enabled features (base: 14 core tools + 6 workflow tools, with optional additions like codebase_search, generate_image, and run_slash_command). Ask mode has 9-13 tools, Orchestrator has 6 tools.
Key insight for MCP/Browser/Specific Tools:
When users explicitly mention:
- MCP servers by name (e.g., "use GitHub MCP", "check the database MCP")
- Browser/web interaction (e.g., "test in browser", "check the website")
- Command execution (e.g., "run the tests", "install dependencies")
- Specific tool capabilities
The smart selection should immediately recognize these keywords and ensure those tool groups are included, even if they wouldn't normally be selected for that query type.
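The explicit-mention rule above could be implemented as a simple keyword pass before any ranking. The sketch below is illustrative only: the group names and trigger patterns are hypothetical, not Roo Code's actual identifiers, and a real implementation would want a richer vocabulary.

```typescript
// Hypothetical sketch: force-include tool groups when the user explicitly
// names a capability. Group names and keyword lists are illustrative.
type ToolGroup = "read" | "edit" | "command" | "browser" | "mcp";

const EXPLICIT_TRIGGERS: Record<ToolGroup, RegExp> = {
  read: /\b(read|explain|show)\b/i,
  edit: /\b(fix|edit|refactor|rename)\b/i,
  command: /\b(run|install|execute|npm|pnpm)\b/i,
  browser: /\b(browser|website|localhost|ui)\b/i,
  mcp: /\bmcp\b/i,
};

function explicitlyRequestedGroups(query: string): Set<ToolGroup> {
  const groups = new Set<ToolGroup>();
  for (const [group, pattern] of Object.entries(EXPLICIT_TRIGGERS)) {
    // Any single keyword hit is enough to guarantee the whole group.
    if (pattern.test(query)) groups.add(group as ToolGroup);
  }
  return groups;
}
```

For example, "Use the GitHub MCP server to create a PR" would trigger the `mcp` group, and "Check the browser at localhost:3000 and fix any UI bugs" would trigger both `browser` and `edit`, matching the table rows above.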
Constraints / preferences (optional)
No response
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear context and impact
Roo Code Task Links (optional)
No response
Acceptance criteria (optional)
No response
Proposed approach (optional)
This enhancement would add a smart selection layer between mode filtering and tool presentation:
Current Flow:
User Query → Mode Selection → Mode's Tool Groups → All Mode Tools → LLM
Enhanced Flow:
User Query → Mode Selection → Mode's Tool Groups → Smart Selection → Relevant Subset (6-10) → LLM
Three potential implementation approaches:
Option 1: Two-Stage Hierarchical Selection
Within each mode, ask a lightweight LLM: "For this query, which tool categories are needed: read-only, editing, command execution, or web interaction?"
- Pros: Leverages existing group structure, no new infrastructure, predictable
- Cons: Adds LLM call latency (~50-100ms), somewhat coarse-grained
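The runnable parts of Option 1 are the prompt construction and the parsing of the classifier's reply; the LLM call itself is elided. This is a hypothetical sketch, and the category names simply mirror the ones listed above.

```typescript
// Sketch of Option 1 (hypothetical): a lightweight classifier is asked for a
// comma-separated list of tool categories; we build the prompt and validate
// the reply. The actual LLM call is out of scope here.
const CATEGORIES = [
  "read-only",
  "editing",
  "command execution",
  "web interaction",
] as const;
type Category = (typeof CATEGORIES)[number];

function buildClassifierPrompt(query: string): string {
  return (
    `For this query, which tool categories are needed: ${CATEGORIES.join(", ")}?\n` +
    `Query: ${query}\nAnswer with a comma-separated list only.`
  );
}

function parseClassifierReply(reply: string): Category[] {
  const wanted = reply.toLowerCase().split(",").map((s) => s.trim());
  const valid = CATEGORIES.filter((c) => wanted.includes(c));
  // Fail-safe: an unparseable reply falls back to all categories.
  return valid.length > 0 ? [...valid] : [...CATEGORIES];
}
```

The fallback in `parseClassifierReply` applies the "provide more tools rather than fewer" principle when the classifier misbehaves.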
Option 2: RAG-Based Selection
Embed tool descriptions once at startup, embed each query, select top-k tools by semantic similarity
- Pros: Best semantic matching, no LLM calls, fast (<50ms)
- Cons: Requires embedding infrastructure (model + vector similarity)
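Option 2's core loop, assuming embeddings already exist, is just cosine similarity plus a top-k sort. The sketch below takes precomputed vectors as inputs; the embedding model itself is the infrastructure cost noted in the cons.

```typescript
// Sketch of Option 2: pick the top-k tools by cosine similarity between a
// query embedding and precomputed tool-description embeddings. Vectors are
// assumed inputs; producing them is the "embedding infrastructure" cost.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topKTools(
  queryVec: number[],
  toolVecs: Map<string, number[]>,
  k: number,
): string[] {
  return [...toolVecs.entries()]
    .map(([name, vec]) => ({ name, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((t) => t.name);
}
```

In practice the result would be unioned with the essentials and any explicitly requested groups before being handed to the LLM.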
Option 3: Hybrid
- Start with mode's allowed tools (existing system)
- Analyze recent conversation context (are we reading or editing?)
- Use lightweight semantic matching or heuristics to rank tools
- Select top 8-12 based on query complexity
- Always include mode's essential tools
- Pros: Balances intelligence with safety, incremental implementation path
- Cons: Most complex but most robust
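The hybrid steps above can be sketched end to end. The scoring itself (heuristics or semantic matching) is left abstract as a per-tool `score`; the word-count complexity proxy and the 8/12 budgets are placeholder choices, not a proposal for the real heuristic.

```typescript
// Sketch of Option 3 (hybrid, hypothetical scoring): rank the mode's tools by
// a precomputed heuristic score, size the budget by query complexity, and
// always keep essentials.
interface RankedTool {
  name: string;
  score: number; // from heuristics or semantic matching (not shown)
  essential: boolean;
}

function hybridSelect(tools: RankedTool[], query: string): string[] {
  // Crude complexity proxy: longer, multi-clause queries get a bigger budget.
  const k = query.split(/\s+/).length > 12 ? 12 : 8;
  const essentials = tools.filter((t) => t.essential).map((t) => t.name);
  const ranked = tools
    .filter((t) => !t.essential)
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, k - essentials.length))
    .map((t) => t.name);
  return [...essentials, ...ranked];
}
```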
Trade-offs / risks (optional)
No response