Feat/fix issue 1028 #1349
Open
ilonae wants to merge 4 commits into
…tuck loop during browser operations

This fix addresses issue FoundationAgents#1028 where agents get stuck in a loop when attempting web searches or browser operations with unavailable Playwright.

**1. Enhanced Error Classification in BrowserUseTool** (app/tool/browser_use_tool.py)
- Added _classify_error() method to distinguish between:
  * Playwright initialization errors (fatal - switch to web_search)
  * Operation failures (may be recoverable)
- Wrapped browser initialization in separate try-catch for better error handling
- Replaced generic exception handler with categorized error responses
- Result: Agent receives clear signal when browser is unavailable

**2. Enhanced Stuck-State Detection in BaseAgent** (app/agent/base.py)
- Expanded is_stuck() from simple duplicate detection to multi-criteria:
  * Criterion 1: Exact duplicate messages (existing)
  * Criterion 2: 3+ error messages in recent history
  * Criterion 3: Repeated error patterns in tool observations
- Updated handle_stuck_state() to guide agent away from retrying same tools
- Result: Agent detects stuck states earlier and attempts recovery strategies

**3. Tool Failure Tracking in ToolCallAgent** (app/agent/toolcall.py)
- Added _tool_failures dict (using PrivateAttr) to track consecutive failures per tool
- Added helper methods: _increment_tool_failure, _reset_tool_failure, _get_tool_failure_count
- Modified observe_tool_results() to:
  * Track failures when tool returns errors
  * Reset counter on success
  * Alert agent after 3 consecutive failures
- Result: Agent recognizes when a tool is broken and tries alternatives

**4. System Prompt Updates** (app/prompt/manus.py)
- Added explicit guidance for handling browser initialization errors
- Documented when to switch away from failing tools
- Clarified that repeated failures indicate unrecoverable errors
- Result: Agent behavior guided toward recovery strategies instead of retries

Previously, when Playwright browser binary was unavailable, the agent would:
1. Receive generic error from browser_use tool
2. Not recognize it as a terminal/fatal error
3. Attempt browser operations repeatedly
4. Fail to detect stuck state (errors weren't exact duplicates)
5. Loop until max steps exceeded

Now the agent:
1. Receives clear "Browser initialization failed" message
2. Detects stuck state via error pattern recognition
3. Recognizes tool has failed 3+ times consecutively
4. Switches to web_search or other alternative tools
5. Completes task without getting stuck

- Error classification: Correctly identifies Playwright initialization errors
- Stuck-state detection: Detects multiple errors and exact duplicates
- Tool failure tracking: Correctly tracks and resets failure counts per tool
- All modified files compile successfully with Python 3.12 and Pydantic v2
…and improve error handling guidance

Critical fixes for agent stuck loop when Playwright browser fails:

1. **Remove prompt override in Manus.think()** (app/agent/manus.py)
   - Manus was overriding the ToolCallAgent prompt with BrowserAgent's JSON response format
   - This caused LLM to output JSON instead of tool selections when browser was used
   - This prevented the agent from switching to alternative tools after browser failures
   - Solution: Use consistent ToolCallAgent prompt so tool selection works reliably

2. **Enhance system prompt with clear recovery strategies** (app/prompt/manus.py)
   - Previous prompt said "use web_search tool" but that's an action within browser_use
   - When browser_use fails (Playwright missing), web_search action also fails
   - New prompt clearly lists available alternatives: python_execute, ask_human, MCP tools
   - Explicit guidance on tool failure recovery and when to stop retrying

Technical details:
- BrowserAgent expects JSON with action/state format
- Manus/ToolCallAgent expects tool function calls
- Mixing these formats confuses the LLM response parsing
- Removing the override ensures consistent tool selection mechanism
- System prompt now gives concrete alternative tools, not dead ends

This combined fix enables the agent to:
- Detect browser initialization failures
- Switch to python_execute (requests/urllib) or ask_human
- Not get stuck in retry loops when tools fail
- Properly utilize tool failure tracking already in place
Issue
Agent gets stuck in an infinite loop when attempting job searches due to Playwright browser initialization failure.
Fixes #1028
Root Cause Analysis
The issue had 5 interconnected layers:
- `is_stuck()` only detected exact duplicate messages, missing repeated error patterns

Solutions Implemented
1. Error Classification in BrowserUseTool
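Added a `_classify_error()` method that categorizes errors as Playwright initialization errors (fatal) or operation failures (possibly recoverable). It detects Playwright-specific error patterns.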
Impact: Browser failures are now clearly identified as fatal, triggering recovery mechanisms.
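A minimal sketch of how such a classifier might look, assuming substring matching on the exception text. The `BrowserErrorKind` enum, the `classify_error` name, and the pattern strings are illustrative assumptions, not the code from this PR.

```python
from enum import Enum


class BrowserErrorKind(Enum):
    """Hypothetical categories mirroring the fatal / recoverable split."""

    INIT_FATAL = "init_fatal"  # browser cannot start; retrying will not help
    OPERATION = "operation"    # a single action failed; may be recoverable


# Assumed substrings for illustration only; the patterns used in the PR
# are not reproduced here.
PLAYWRIGHT_INIT_PATTERNS = (
    "executable doesn't exist",
    "playwright install",
    "browsertype.launch",
)


def classify_error(exc: Exception) -> BrowserErrorKind:
    """Classify an exception by searching its message for known patterns."""
    message = str(exc).lower()
    if any(pattern in message for pattern in PLAYWRIGHT_INIT_PATTERNS):
        return BrowserErrorKind.INIT_FATAL
    return BrowserErrorKind.OPERATION
```

With a split like this, the tool can return a distinct "Browser initialization failed" message for the fatal case while leaving ordinary operation failures open to a retry.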
2. Multi-Criteria Stuck Detection (BaseAgent.is_stuck)
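Enhanced detection from a single criterion to three independent criteria: exact duplicate messages (existing), 3+ error messages in recent history, and repeated error patterns in tool observations.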
Impact: Catches stuck states caused by error loops, not just duplicates.
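The three criteria could be combined roughly as follows. This is a standalone sketch: the `Message` dataclass, the thresholds, the window size, and the "error" substring check are assumptions made for illustration, not the actual `BaseAgent` code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    """Simplified stand-in for an agent memory entry (assumed shape)."""

    role: str
    content: str


DUPLICATE_THRESHOLD = 2  # assumed: earlier identical assistant messages
ERROR_THRESHOLD = 3      # "3+ error messages in recent history"
WINDOW = 6               # assumed size of "recent history"


def _is_error(message: Message) -> bool:
    # Assumed heuristic: any message mentioning "error" counts as an error.
    return "error" in message.content.lower()


def is_stuck(messages: list[Message]) -> bool:
    if len(messages) < 2:
        return False

    # Criterion 1: the latest assistant message repeats earlier ones verbatim.
    last = messages[-1]
    if last.role == "assistant" and last.content:
        duplicates = sum(
            1
            for m in messages[:-1]
            if m.role == "assistant" and m.content == last.content
        )
        if duplicates >= DUPLICATE_THRESHOLD:
            return True

    recent = messages[-WINDOW:]

    # Criterion 2: several error messages in the recent window.
    if sum(1 for m in recent if _is_error(m)) >= ERROR_THRESHOLD:
        return True

    # Criterion 3: the same error text repeated in tool observations.
    tool_errors = Counter(
        m.content for m in recent if m.role == "tool" and _is_error(m)
    )
    return any(count >= 2 for count in tool_errors.values())
```

Because each criterion returns independently, a loop of non-identical error messages is still caught by the second or third check even when the duplicate check misses it.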
3. Tool Failure Tracking (ToolCallAgent)
Added `_tool_failures` dict with:
- `_increment_tool_failure()` - Track consecutive failures per tool
- `_reset_tool_failure()` - Reset counter on success

Uses Pydantic v2 compatible PrivateAttr for private attributes.
Impact: Prevents infinite retries of broken tools; guides toward alternatives.
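The counting logic can be sketched in plain Python. The `ToolFailureTracker` class, the `observe` method, and the "result starts with error" check are hypothetical; the PR keeps the dict on `ToolCallAgent` as a Pydantic `PrivateAttr`, which is omitted here so the example has no dependencies.

```python
MAX_CONSECUTIVE_FAILURES = 3  # alert threshold described in the PR


class ToolFailureTracker:
    """Tracks consecutive failures per tool name (illustrative only)."""

    def __init__(self) -> None:
        self._tool_failures: dict[str, int] = {}

    def increment(self, tool_name: str) -> int:
        self._tool_failures[tool_name] = self._tool_failures.get(tool_name, 0) + 1
        return self._tool_failures[tool_name]

    def reset(self, tool_name: str) -> None:
        self._tool_failures.pop(tool_name, None)

    def count(self, tool_name: str) -> int:
        return self._tool_failures.get(tool_name, 0)

    def observe(self, tool_name: str, result: str) -> str:
        """Update the counter and append an alert once the threshold is hit."""
        # Assumed convention: failed tool results start with "error".
        if result.lower().startswith("error"):
            failures = self.increment(tool_name)
            if failures >= MAX_CONSECUTIVE_FAILURES:
                return (
                    f"{result}\n[{tool_name} has failed {failures} times in a row; "
                    "try an alternative tool]"
                )
            return result
        # Any success clears the streak for this tool.
        self.reset(tool_name)
        return result
```

Resetting on success means only an unbroken run of failures triggers the alert, so a tool that fails intermittently is not flagged as broken.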
4. Enhanced System Prompt (NEXT_STEP_PROMPT)
Updated guidance with:
- `python_execute` with requests/urllib
- `ask_human` for user assistance

Impact: Prevents dead-ends; guides LLM toward viable recovery paths.
5. CRITICAL - Removed Prompt Override Bug (Manus.think)
The Bug: Manus.think() was switching to BrowserAgent's JSON response format, breaking the ToolCallAgent interface.
The Fix: Removed the entire prompt override block. Manus now uses standard ToolCallAgent prompts.
Impact: Agent can now properly switch between tools when one fails.
Files Modified
- app/agent/base.py
- app/agent/toolcall.py
- app/agent/manus.py
- app/prompt/manus.py
- app/tool/browser_use_tool.py
- requirements.txt

Total: 143 insertions, 28 deletions
Testing
Comprehensive test coverage created:
- test_issue_1028_standalone.py - Standalone logic validation (no dependencies)
- test_stuck_detection.py - Unit test for stuck detection (fixed to use concrete ToolCallAgent class)
- test_live_recovery.py - Integration test with actual agent code
All tests demonstrate the fix prevents infinite loops and enables proper recovery.
Expected Behavior After Fix
When browser fails during job search:
- `python_execute` or `ask_human`

Related
Closes #1028
Checklist