Feat/fix issue 1028#1349

Open
ilonae wants to merge 4 commits into FoundationAgents:main from ilonae:feat/fix-issue-1028

@ilonae ilonae commented Apr 11, 2026

Issue

Agent gets stuck in an infinite loop when attempting job searches due to Playwright browser initialization failure.

Fixes #1028

Root Cause Analysis

The issue had 5 interconnected layers:

  1. Generic Error Handling - BrowserUseTool couldn't distinguish fatal initialization errors from recoverable operation errors
  2. Incomplete Stuck Detection - is_stuck() only detected exact duplicate messages, missing repeated error patterns
  3. No Tool Failure Tracking - Agent had no mechanism to count consecutive failures per tool
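  4. Vague System Prompt - The prompt told the agent to fall back to "web_search", but that is a browser_use sub-action, so it fails whenever the browser fails
  5. CRITICAL - Prompt Override Bug - Manus.think() replaced the tool-call prompt with BrowserAgent's JSON response format, so the LLM returned JSON instead of tool selections and could not switch tools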

Solutions Implemented

1. Error Classification in BrowserUseTool

Added _classify_error() method to categorize errors:

  • INIT_FAILED: Playwright initialization errors (fatal, non-recoverable)
  • OPERATION_FAILED: Browser operation errors (may be recoverable)
  • ELEMENT_NOT_FOUND: Element interaction errors

Detects Playwright-specific patterns:

  • "BrowserType.launch"
  • "Executable doesn't exist"
  • "playwright install"
  • "No such file or directory"

Impact: Browser failures are now clearly identified as fatal, triggering recovery mechanisms.
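
To make the classification concrete, here is a minimal sketch of how such a classifier could look. It is not the code from this PR: the enum name, the function name `classify_error`, and the element-error markers are assumptions for illustration. Only the three categories and the four Playwright substrings come from the description above.

```python
from enum import Enum


class BrowserErrorKind(Enum):
    INIT_FAILED = "INIT_FAILED"              # fatal: Playwright could not start
    OPERATION_FAILED = "OPERATION_FAILED"    # may be recoverable
    ELEMENT_NOT_FOUND = "ELEMENT_NOT_FOUND"  # element interaction problem


# Substrings listed in the PR description as Playwright-specific signals.
INIT_PATTERNS = (
    "BrowserType.launch",
    "Executable doesn't exist",
    "playwright install",
    "No such file or directory",
)

# Hypothetical markers for element errors; the PR does not list these.
ELEMENT_PATTERNS = ("element not found", "no element")


def classify_error(message: str) -> BrowserErrorKind:
    """Map a raw error message to one of the three categories."""
    # Initialization patterns are checked first because they are fatal.
    if any(pattern in message for pattern in INIT_PATTERNS):
        return BrowserErrorKind.INIT_FAILED
    lowered = message.lower()
    if any(pattern in lowered for pattern in ELEMENT_PATTERNS):
        return BrowserErrorKind.ELEMENT_NOT_FOUND
    # Anything else is treated as an ordinary, possibly recoverable failure.
    return BrowserErrorKind.OPERATION_FAILED
```

Checking the fatal patterns first means a message that matches several categories is always reported as INIT_FAILED, which is the case where retrying cannot help.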

2. Multi-Criteria Stuck Detection (BaseAgent.is_stuck)

Enhanced detection from single criterion to three independent criteria:

Criterion 1: Exact duplicate messages (original)
Criterion 2: 3+ error messages in last 5 messages (new)
Criterion 3: Repeated error patterns in tool observations (new)

Impact: Catches stuck states caused by error loops, not just duplicates.
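
The three criteria can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual BaseAgent.is_stuck: the `Message` dataclass, the "contains the word error" heuristic, the duplicate threshold of 2, and the "same error text twice" rule for criterion 3 are all hypothetical. Only the "3+ errors in the last 5 messages" rule is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "assistant" or "tool"
    content: str


def _looks_like_error(text: str) -> bool:
    # Assumption: any content mentioning "error" counts as an error message.
    return "error" in text.lower()


def is_stuck(messages: list[Message], duplicate_threshold: int = 2) -> bool:
    if len(messages) < 2:
        return False
    last = messages[-1]

    # Criterion 1: the last message repeats earlier same-role messages verbatim.
    duplicates = sum(
        1 for m in messages[:-1]
        if m.role == last.role and m.content == last.content
    )
    if last.content and duplicates >= duplicate_threshold:
        return True

    # Criterion 2: 3+ error messages among the last 5 messages.
    recent = messages[-5:]
    if sum(_looks_like_error(m.content) for m in recent) >= 3:
        return True

    # Criterion 3: the same error text recurs across recent tool observations.
    tool_errors = [
        m.content for m in recent
        if m.role == "tool" and _looks_like_error(m.content)
    ]
    return len(tool_errors) >= 2 and len(set(tool_errors)) == 1
```

The criteria are evaluated independently, so any one of them is enough to report a stuck state.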

3. Tool Failure Tracking (ToolCallAgent)

Added _tool_failures dict with:

  • _increment_tool_failure() - Track consecutive failures per tool
  • _reset_tool_failure() - Reset counter on success
  • Max threshold: 3 consecutive failures

Uses Pydantic v2's PrivateAttr, so _tool_failures is stored as a private attribute rather than a model field.

Impact: Prevents infinite retries of broken tools; guides toward alternatives.
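
A minimal sketch of the counter logic is shown below. To stay dependency-free it uses a plain class rather than a Pydantic model with PrivateAttr, and the class name `ToolFailureTracker` and the `is_broken` helper are assumptions, not names from the PR. The threshold of 3 is the one stated above.

```python
MAX_CONSECUTIVE_FAILURES = 3  # threshold stated in the PR description


class ToolFailureTracker:
    """Counts consecutive failures per tool name."""

    def __init__(self) -> None:
        self._tool_failures: dict[str, int] = {}

    def increment(self, tool_name: str) -> int:
        # Record one more consecutive failure and return the new count.
        self._tool_failures[tool_name] = self._tool_failures.get(tool_name, 0) + 1
        return self._tool_failures[tool_name]

    def reset(self, tool_name: str) -> None:
        # A success breaks the streak, so the counter is dropped.
        self._tool_failures.pop(tool_name, None)

    def count(self, tool_name: str) -> int:
        return self._tool_failures.get(tool_name, 0)

    def is_broken(self, tool_name: str) -> bool:
        return self.count(tool_name) >= MAX_CONSECUTIVE_FAILURES
```

Because the counter resets on any success, only an unbroken run of failures marks a tool as broken; intermittent errors do not accumulate.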

4. Enhanced System Prompt (NEXT_STEP_PROMPT)

Updated guidance with:

  • Clear Browser Error Handling: "STOP using browser_use immediately. Do NOT retry it."
  • Concrete Alternatives:
    • Use python_execute with requests/urllib
    • Use ask_human for user assistance
    • Check MCP tools (search_jobs, job_api, etc.)
  • Tool Priority When One Fails:
    • Browser fails → Try python_execute or ask_human
    • Tool fails 3+ times → Different tool entirely
    • Stuck → ask_human for help
  • Terminal Error Guidance: Playwright errors are terminal and cannot be fixed by agent

Impact: Prevents dead-ends; guides LLM toward viable recovery paths.
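
As an illustration of the python_execute alternative, the following sketch fetches a page with the standard library's urllib instead of a browser. The function name `fetch_text` and the User-Agent string are hypothetical, and the script an agent actually generates would depend on the task.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL without a browser and return its body decoded as text."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "example-agent/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

This path only covers pages that can be read without JavaScript rendering, which is why the prompt also lists ask_human and MCP tools as alternatives.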

5. CRITICAL - Removed Prompt Override Bug (Manus.think)

The Bug: Manus.think() was switching to BrowserAgent's JSON response format, breaking the ToolCallAgent interface.

The Fix: Removed the entire prompt override block. Manus now uses standard ToolCallAgent prompts.

```python
# Before (BROKEN):
self.next_step_prompt = "{json_format_prompt}"  # Breaks tool selection

# After (FIXED):
# Note: We intentionally do NOT override next_step_prompt here.
# When browser fails, we want to switch tools using normal mechanism.
result = await super().think()
```

Impact: Agent can now properly switch between tools when one fails.

Files Modified

| File | Changes | Purpose |
|---|---|---|
| app/agent/base.py | +46 lines | Multi-criteria stuck detection |
| app/agent/toolcall.py | +35 lines | Tool failure tracking with _tool_failures dict |
| app/agent/manus.py | -18 lines | Removed prompt override bug |
| app/prompt/manus.py | +21 lines | Enhanced recovery guidance |
| app/tool/browser_use_tool.py | +44 lines | Error classification logic |
| requirements.txt | 1 line | Fixed pillow version conflict with crawl4ai |

Total: 143 insertions, 28 deletions

Testing

Three test files were added:

  1. test_issue_1028_standalone.py - Standalone logic validation (no dependencies)
    • Error classification
    • Multi-criteria stuck detection
    • Tool failure tracking
    • Recovery flow simulation
  2. test_stuck_detection.py - Unit test for stuck detection (fixed to use concrete ToolCallAgent class)
  3. test_live_recovery.py - Integration test with actual agent code

All tests demonstrate the fix prevents infinite loops and enables proper recovery.

Expected Behavior After Fix

When browser fails during job search:

  1. Browser initialization fails with clear error message
  2. Tool failure counter increments (1, 2, 3...)
  3. After 3 consecutive failures, tool is marked as broken
  4. Stuck detection identifies error pattern
  5. System prompt guides agent to alternatives
  6. Agent switches to python_execute or ask_human
  7. Task completes without infinite loop
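
The flow above can be simulated in a few lines. This is a toy sketch, not the agent loop from the PR: `broken_browser`, `fallback_search`, and `run_with_recovery` are hypothetical stand-ins, and the real agent switches tools through the LLM prompt rather than a hard-coded branch.

```python
def broken_browser(query: str) -> str:
    # Stand-in for browser_use when the Playwright binary is missing.
    raise RuntimeError("BrowserType.launch: Executable doesn't exist")


def fallback_search(query: str) -> str:
    # Stand-in for an alternative such as python_execute or ask_human.
    return f"results for {query!r}"


def run_with_recovery(
    query: str, max_failures: int = 3, max_steps: int = 10
) -> tuple[str, list[str]]:
    log: list[str] = []
    failures = 0
    for _ in range(max_steps):
        if failures >= max_failures:
            # The tool is considered broken: stop retrying and switch.
            log.append("switch:fallback")
            return fallback_search(query), log
        try:
            return broken_browser(query), log
        except RuntimeError as exc:
            failures += 1
            log.append(f"browser_use failed ({failures}): {exc}")
    raise RuntimeError("max steps exceeded")
```

With the default threshold, the loop records three failures, switches on the fourth step, and returns well before max_steps is reached.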

Related

Closes #1028

Checklist

  • All 5 fixes implemented and tested
  • Code follows project conventions
  • Error messages are clear and actionable
  • Recovery mechanisms are documented
  • No breaking changes to existing API
  • Dependencies updated (pillow conflict resolved)

ilonae added 4 commits April 11, 2026 23:54
…tuck loop during browser operations

This fix addresses issue FoundationAgents#1028 where agents get stuck in a loop when attempting web searches or browser operations with unavailable Playwright.

**1. Enhanced Error Classification in BrowserUseTool** (app/tool/browser_use_tool.py)
   - Added _classify_error() method to distinguish between:
     * Playwright initialization errors (fatal - switch to web_search)
     * Operation failures (may be recoverable)
   - Wrapped browser initialization in separate try-catch for better error handling
   - Replaced generic exception handler with categorized error responses
   - Result: Agent receives clear signal when browser is unavailable

**2. Enhanced Stuck-State Detection in BaseAgent** (app/agent/base.py)
   - Expanded is_stuck() from simple duplicate detection to multi-criteria:
     * Criterion 1: Exact duplicate messages (existing)
     * Criterion 2: 3+ error messages in recent history
     * Criterion 3: Repeated error patterns in tool observations
   - Updated handle_stuck_state() to guide agent away from retrying same tools
   - Result: Agent detects stuck states earlier and attempts recovery strategies

**3. Tool Failure Tracking in ToolCallAgent** (app/agent/toolcall.py)
   - Added _tool_failures dict (using PrivateAttr) to track consecutive failures per tool
   - Added helper methods: _increment_tool_failure, _reset_tool_failure, _get_tool_failure_count
   - Modified observe_tool_results() to:
     * Track failures when tool returns errors
     * Reset counter on success
     * Alert agent after 3 consecutive failures
   - Result: Agent recognizes when a tool is broken and tries alternatives

**4. System Prompt Updates** (app/prompt/manus.py)
   - Added explicit guidance for handling browser initialization errors
   - Documented when to switch away from failing tools
   - Clarified that repeated failures indicate unrecoverable errors
   - Result: Agent behavior guided toward recovery strategies instead of retries

Previously, when Playwright browser binary was unavailable, the agent would:
1. Receive generic error from browser_use tool
2. Not recognize it as a terminal/fatal error
3. Attempt browser operations repeatedly
4. Fail to detect stuck state (errors weren't exact duplicates)
5. Loop until max steps exceeded

Now the agent:
1. Receives clear "Browser initialization failed" message
2. Detects stuck state via error pattern recognition
3. Recognizes tool has failed 3+ times consecutively
4. Switches to web_search or other alternative tools
5. Completes task without getting stuck

- Error classification: Correctly identifies Playwright initialization errors
- Stuck-state detection: Detects multiple errors and exact duplicates
- Tool failure tracking: Correctly tracks and resets failure counts per tool
- All modified files compile successfully with Python 3.12 and Pydantic v2
…and improve error handling guidance

Critical fixes for agent stuck loop when Playwright browser fails:

1. **Remove prompt override in Manus.think()** (app/agent/manus.py)
   - Manus was overriding the ToolCallAgent prompt with BrowserAgent's JSON response format
   - This caused LLM to output JSON instead of tool selections when browser was used
   - This prevented the agent from switching to alternative tools after browser failures
   - Solution: Use consistent ToolCallAgent prompt so tool selection works reliably

2. **Enhance system prompt with clear recovery strategies** (app/prompt/manus.py)
   - Previous prompt said "use web_search tool" but that's an action within browser_use
   - When browser_use fails (Playwright missing), web_search action also fails
   - New prompt clearly lists available alternatives: python_execute, ask_human, MCP tools
   - Explicit guidance on tool failure recovery and when to stop retrying

Technical details:
- BrowserAgent expects JSON with action/state format
- Manus/ToolCallAgent expects tool function calls
- Mixing these formats confuses the LLM response parsing
- Removing the override ensures consistent tool selection mechanism
- System prompt now gives concrete alternative tools, not dead ends

This combined fix enables the agent to:
- Detect browser initialization failures
- Switch to python_execute (requests/urllib) or ask_human
- Not get stuck in retry loops when tools fail
- Properly utilize tool failure tracking already in place
@ilonae ilonae force-pushed the feat/fix-issue-1028 branch from c53f183 to 024270c Compare April 20, 2026 06:35

Development

Successfully merging this pull request may close these issues.

Agent stuck
