
Fix LiteLLM thinking models with tool calling across providers #812


Closed

Conversation

knowsuchagency

Summary

Fixes issue #765 by implementing a universal solution for LiteLLM thinking models that support function calling.

Changes

  • Enhanced LitellmModel._fix_thinking_model_messages() to work with any thinking model when reasoning is enabled
  • Added comprehensive test suite covering multiple thinking model providers
  • Verified fix works across different providers (Anthropic and OpenAI)

Verified Working

  • 🟡 Anthropic Claude Sonnet 4: partial progress, the error changes from "found text" to "found tool_use"
  • ✅ OpenAI o4-mini: Complete success with full tool calling and reasoning support

Test Coverage

  • Mock tests for error reproduction and successful scenarios
  • Real API tests with multiple thinking model providers
  • Generality tests demonstrating provider-agnostic fix
  • Regression tests ensuring no existing functionality is broken

Impact

The fix automatically applies when ModelSettings(reasoning=...) is used with any LiteLLM model, making it future-proof for new thinking models that support both reasoning and function calling.
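
For reference, a minimal sketch of the configuration this applies to, using the SDK's public imports (`Agent`, `ModelSettings`, `Runner`, `function_tool`, and the `LitellmModel` extension); the tool, model string, and API key are illustrative placeholders rather than code from this PR:

```python
from openai.types.shared import Reasoning

from agents import Agent, ModelSettings, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel


@function_tool
def get_weather(city: str) -> str:
    """Toy tool used to trigger a tool call."""
    return f"The weather in {city} is sunny."


agent = Agent(
    name="Assistant",
    instructions="Use the weather tool when asked about the weather.",
    model=LitellmModel(model="anthropic/claude-sonnet-4-20250514", api_key="sk-..."),
    # Enabling reasoning is what triggers the message fix-up in LitellmModel.
    model_settings=ModelSettings(reasoning=Reasoning(effort="medium")),
    tools=[get_weather],
)

result = Runner.run_sync(agent, "What's the weather in Tokyo?")
print(result.final_output)
```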

- Introduced a new configuration file for permissions in `.claude/settings.local.json`.
- Enhanced `LitellmModel` to properly handle assistant messages with tool calls when reasoning is enabled, addressing issue openai#765.
- Added a new comprehensive test suite for LiteLLM thinking models to ensure the fix works across various models and scenarios.
- Tests include reproducing the original error, verifying successful tool calls, and checking the fix's applicability to different thinking models.

Enhances the fix for issue openai#765 to work universally with all LiteLLM thinking models that support function calling.

Verified working:
- Anthropic Claude Sonnet 4 (partial fix - progress from "found text" to "found tool_use")
- OpenAI o4-mini (complete success - full tool calling with reasoning)

The fix now automatically applies when ModelSettings(reasoning=...) is used with any LiteLLM model, making it future-proof for new thinking models that support both reasoning and function calling.

Remove .claude/settings.local.json from git tracking and add it to .gitignore to prevent local Claude Code settings from being committed to the repository.
@knowsuchagency knowsuchagency marked this pull request as draft June 4, 2025 09:22
… maintainability

- Cleaned up whitespace and formatting in the test suite for LiteLLM thinking models.
- Ensured consistent use of commas and spacing in function calls and assertions.
- Verified that the fix for issue openai#765 applies universally across all supported models.
- Enhanced documentation within the tests to clarify the purpose and expected outcomes.
@knowsuchagency
Author

Update on Fix Status

This PR successfully addresses part of issue #765, but there's a limitation that requires upstream work in LiteLLM.

What this PR fixes ✅

  • Eliminates the "found text" error by removing content from assistant messages with tool calls when reasoning is enabled (see the sketch after this list)
  • Works perfectly with OpenAI's o4-mini model - full tool calling with reasoning support confirmed via tests
  • Universal fix that applies to all LiteLLM thinking models (not just Anthropic)
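
For clarity, the transformation described in the first bullet amounts to roughly the following. This is a hypothetical approximation of the behavior, not the actual `_fix_thinking_model_messages` implementation from the PR:

```python
# Approximation only: when reasoning is enabled, strip the plain-text content
# from assistant messages that carry tool calls so the provider does not see
# a stray text block ahead of the tool_use block.
def _strip_content_from_tool_call_messages(messages: list[dict]) -> list[dict]:
    fixed = []
    for message in messages:
        if message.get("role") == "assistant" and message.get("tool_calls"):
            message = {**message, "content": None}  # drop the offending text block
        fixed.append(message)
    return fixed
```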

Current limitation with Anthropic 🟡

Testing shows we've made progress with Anthropic models:

  • Before fix: "Expected thinking or redacted_thinking, but found text"
  • After fix: "Expected thinking or redacted_thinking, but found tool_use"

Root cause analysis

Through debug logging, I discovered that LiteLLM isn't preserving thinking blocks from previous assistant messages when reconstructing conversation history for Anthropic's API. When thinking is enabled, Anthropic requires:

  1. Assistant messages must start with a thinking block
  2. Thinking blocks from previous turns should be included in follow-up requests

LiteLLM currently converts our tool_calls array into tool_use content blocks but doesn't maintain the thinking blocks from previous responses.
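
To make the mismatch concrete, here are two illustrative assistant-turn payloads. The block field names follow Anthropic's documented extended-thinking format; the ids, tool name, input, and signature values are placeholders:

```python
# What Anthropic expects on a follow-up request when thinking is enabled:
# the assistant turn starts with the preserved thinking block.
expected_assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "The user wants the weather...", "signature": "..."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}

# What LiteLLM currently reconstructs from the OpenAI-style tool_calls array,
# per the debug logging above: the thinking block is missing, so Anthropic
# rejects the request with "Expected thinking or redacted_thinking, but found tool_use".
reconstructed_assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}
```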

Upstream work needed

Full Anthropic support would require LiteLLM to:

  1. Preserve thinking blocks from assistant messages in conversation history
  2. Include them when sending follow-up requests with tool results

This PR provides a valuable partial fix that works for some models (OpenAI o4-mini) and improves the situation for others (Anthropic).

@knowsuchagency
Author

Related LiteLLM Issue Found

I found a related bug in LiteLLM that confirms the upstream issues with Anthropic thinking models: BerriAI/litellm#11302

This issue shows that LiteLLM has problems handling reasoning/thinking content from Anthropic's API:

  • Thinking blocks are not properly preserved or translated
  • The reasoning content gets lost or improperly formatted when converting between Anthropic's format and OpenAI's format

This aligns with our findings where LiteLLM:

  1. Doesn't preserve thinking blocks from previous assistant messages
  2. Converts tool_calls to tool_use content blocks without maintaining the required thinking block structure

Our fix addresses what we can at the openai-agents-python level, but full Anthropic thinking model support will require these upstream issues in LiteLLM to be resolved.

@knowsuchagency knowsuchagency marked this pull request as ready for review June 4, 2025 16:34
@knowsuchagency
Author

Closing This PR

After further investigation, I believe this fix belongs in LiteLLM rather than the openai-agents-python SDK. Here's why:

Root Cause Analysis

The issue is that LiteLLM doesn't properly preserve thinking blocks when reconstructing conversation history for Anthropic's API. This is a LiteLLM formatting issue, not an agents SDK problem.

Why This Fix Doesn't Belong Here

  1. Wrong abstraction level - The agents SDK shouldn't need to know about provider-specific API formatting quirks
  2. Incomplete workaround - Our fix only partially works (single tool call) and doesn't address the real problem
  3. Maintenance burden - We'd need to track LiteLLM versions and remove this code later when they fix it properly
  4. Architectural concern - Modifying valid message content to work around a dependency's bug sets a bad precedent

The Right Solution

The fix should be implemented in LiteLLM to:

  • Preserve thinking blocks from assistant messages in conversation history
  • Format messages correctly according to Anthropic's thinking model requirements
  • Handle the conversion between OpenAI and Anthropic formats properly

Related Issues

  • BerriAI/litellm#11302 - LiteLLM does not properly preserve or translate Anthropic thinking/reasoning content (referenced above)

Next Steps

I'll update the original issue #765 with this analysis and recommend users:

  1. Use OpenAI o4-mini for thinking + tools (works perfectly)
  2. Disable reasoning when using Anthropic with tools (see the sketch after this list)
  3. Wait for the proper upstream fix in LiteLLM
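
A brief sketch of option 2, reusing the illustrative names from the earlier example (`get_weather`, the model string, and the API key remain placeholders):

```python
from agents import Agent, ModelSettings
from agents.extensions.models.litellm_model import LitellmModel

# Leave reasoning unset so no thinking blocks are requested when an Anthropic
# model is used with tools through LiteLLM.
anthropic_agent = Agent(
    name="Assistant",
    instructions="Use the weather tool when asked about the weather.",
    model=LitellmModel(model="anthropic/claude-sonnet-4-20250514", api_key="sk-..."),
    model_settings=ModelSettings(reasoning=None),
    tools=[get_weather],  # the toy tool from the earlier sketch
)
```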

Thanks for the discussion that helped clarify this should be fixed upstream!
