
Fix LiteLLM thinking models with tool calling across providers #812


Closed

Conversation

knowsuchagency

Summary

Fixes issue #765 by implementing a universal solution for LiteLLM thinking models that support function calling.

Changes

  • Enhanced LitellmModel._fix_thinking_model_messages() to work with any thinking model when reasoning is enabled
  • Added comprehensive test suite covering multiple thinking model providers
  • Verified fix works across different providers (Anthropic and OpenAI)

Verified Working

  • 🟡 Anthropic Claude Sonnet 4: partial progress, the error changes from "found text" to "found tool_use"
  • ✅ OpenAI o4-mini: Complete success with full tool calling and reasoning support

Test Coverage

  • Mock tests for error reproduction and successful scenarios
  • Real API tests with multiple thinking model providers
  • Generality tests demonstrating provider-agnostic fix
  • Regression tests ensuring no existing functionality is broken

Impact

The fix automatically applies when ModelSettings(reasoning=...) is used with any LiteLLM model, making it future-proof for new thinking models that support both reasoning and function calling.
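
For reference, a minimal sketch of the configuration this applies to, using the SDK's public imports (`Agent`, `ModelSettings`, `Runner`, `function_tool`, and the `LitellmModel` extension); the tool, model string, and API key are illustrative placeholders rather than code from this PR:

```python
from openai.types.shared import Reasoning

from agents import Agent, ModelSettings, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel


@function_tool
def get_weather(city: str) -> str:
    """Toy tool used to trigger a tool call."""
    return f"The weather in {city} is sunny."


agent = Agent(
    name="Assistant",
    instructions="Use the weather tool when asked about the weather.",
    model=LitellmModel(model="anthropic/claude-sonnet-4-20250514", api_key="sk-..."),
    # Enabling reasoning is what triggers the message fix-up in LitellmModel.
    model_settings=ModelSettings(reasoning=Reasoning(effort="medium")),
    tools=[get_weather],
)

result = Runner.run_sync(agent, "What's the weather in Tokyo?")
print(result.final_output)
```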

- Introduced a new configuration file for permissions in `.claude/settings.local.json`.
- Enhanced `LitellmModel` to properly handle assistant messages with tool calls when reasoning is enabled, addressing issue openai#765.
- Added a new comprehensive test suite for LiteLLM thinking models to ensure the fix works across various models and scenarios.
- Tests include reproducing the original error, verifying successful tool calls, and checking the fix's applicability to different thinking models.

Enhances the fix for issue openai#765 to work universally with all LiteLLM thinking models that support function calling.

Verified working:
- Anthropic Claude Sonnet 4 (partial fix - progress from "found text" to "found tool_use")
- OpenAI o4-mini (complete success - full tool calling with reasoning)

The fix now automatically applies when ModelSettings(reasoning=...) is used with any LiteLLM model, making it future-proof for new thinking models that support both reasoning and function calling.

Remove .claude/settings.local.json from git tracking and add it to .gitignore to prevent local Claude Code settings from being committed to the repository.
@knowsuchagency knowsuchagency marked this pull request as draft June 4, 2025 09:22
… maintainability

- Cleaned up whitespace and formatting in the test suite for LiteLLM thinking models.
- Ensured consistent use of commas and spacing in function calls and assertions.
- Verified that the fix for issue openai#765 applies universally across all supported models.
- Enhanced documentation within the tests to clarify the purpose and expected outcomes.
@knowsuchagency
Author

Update on Fix Status

This PR successfully addresses part of issue #765, but there's a limitation that requires upstream work in LiteLLM.

What this PR fixes ✅

  • Eliminates the "found text" error by removing content from assistant messages with tool calls when reasoning is enabled (see the sketch after this list)
  • Works perfectly with OpenAI's o4-mini model - full tool calling with reasoning support confirmed via tests
  • Universal fix that applies to all LiteLLM thinking models (not just Anthropic)
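
For clarity, the transformation described in the first bullet amounts to roughly the following. This is a hypothetical approximation of the behavior, not the actual `_fix_thinking_model_messages` implementation from the PR:

```python
# Approximation only: when reasoning is enabled, strip the plain-text content
# from assistant messages that carry tool calls so the provider does not see
# a stray text block ahead of the tool_use block.
def _strip_content_from_tool_call_messages(messages: list[dict]) -> list[dict]:
    fixed = []
    for message in messages:
        if message.get("role") == "assistant" and message.get("tool_calls"):
            message = {**message, "content": None}  # drop the offending text block
        fixed.append(message)
    return fixed
```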

Current limitation with Anthropic 🟡

Testing shows we've made progress with Anthropic models:

  • Before fix: "Expected thinking or redacted_thinking, but found text"
  • After fix: "Expected thinking or redacted_thinking, but found tool_use"

Root cause analysis

Through debug logging, I discovered that LiteLLM isn't preserving thinking blocks from previous assistant messages when reconstructing conversation history for Anthropic's API. When thinking is enabled, Anthropic requires:

  1. Assistant messages must start with a thinking block
  2. Thinking blocks from previous turns should be included in follow-up requests

LiteLLM currently converts our tool_calls array into tool_use content blocks but doesn't maintain the thinking blocks from previous responses.
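
To make the mismatch concrete, here are two illustrative assistant-turn payloads. The block field names follow Anthropic's documented extended-thinking format; the ids, tool name, input, and signature values are placeholders:

```python
# What Anthropic expects on a follow-up request when thinking is enabled:
# the assistant turn starts with the preserved thinking block.
expected_assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "The user wants the weather...", "signature": "..."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}

# What LiteLLM currently reconstructs from the OpenAI-style tool_calls array,
# per the debug logging above: the thinking block is missing, so Anthropic
# rejects the request with "Expected thinking or redacted_thinking, but found tool_use".
reconstructed_assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}
```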

Upstream work needed

Full Anthropic support would require LiteLLM to:

  1. Preserve thinking blocks from assistant messages in conversation history
  2. Include them when sending follow-up requests with tool results

This PR provides a valuable partial fix that works for some models (OpenAI o4-mini) and improves the situation for others (Anthropic).

@knowsuchagency
Author

Related LiteLLM Issue Found

I found a related bug in LiteLLM that confirms the upstream issues with Anthropic thinking models: BerriAI/litellm#11302

This issue shows that LiteLLM has problems handling reasoning/thinking content from Anthropic's API:

  • Thinking blocks are not properly preserved or translated
  • The reasoning content gets lost or improperly formatted when converting between Anthropic's format and OpenAI's format

This aligns with our findings where LiteLLM:

  1. Doesn't preserve thinking blocks from previous assistant messages
  2. Converts tool_calls to tool_use content blocks without maintaining the required thinking block structure

Our fix addresses what we can at the openai-agents-python level, but full Anthropic thinking model support will require these upstream issues in LiteLLM to be resolved.

@knowsuchagency knowsuchagency marked this pull request as ready for review June 4, 2025 16:34
@knowsuchagency
Author

Closing This PR

After further investigation, I believe this fix belongs in LiteLLM rather than the openai-agents-python SDK. Here's why:

Root Cause Analysis

The issue is that LiteLLM doesn't properly preserve thinking blocks when reconstructing conversation history for Anthropic's API. This is a LiteLLM formatting issue, not an agents SDK problem.

Why This Fix Doesn't Belong Here

  1. Wrong abstraction level - The agents SDK shouldn't need to know about provider-specific API formatting quirks
  2. Incomplete workaround - Our fix only partially works (single tool call) and doesn't address the real problem
  3. Maintenance burden - We'd need to track LiteLLM versions and remove this code later when they fix it properly
  4. Architectural concern - Modifying valid message content to work around a dependency's bug sets a bad precedent

The Right Solution

The fix should be implemented in LiteLLM to:

  • Preserve thinking blocks from assistant messages in conversation history
  • Format messages correctly according to Anthropic's thinking model requirements
  • Handle the conversion between OpenAI and Anthropic formats properly

Related Issues

  • BerriAI/litellm#11302 - LiteLLM does not properly preserve or translate Anthropic thinking/reasoning content (referenced above)

Next Steps

I'll update the original issue #765 with this analysis and recommend users:

  1. Use OpenAI o4-mini for thinking + tools (works perfectly)
  2. Disable reasoning when using Anthropic with tools (see the sketch after this list)
  3. Wait for the proper upstream fix in LiteLLM
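
A brief sketch of option 2, reusing the illustrative names from the earlier example (`get_weather`, the model string, and the API key remain placeholders):

```python
from agents import Agent, ModelSettings
from agents.extensions.models.litellm_model import LitellmModel

# Leave reasoning unset so no thinking blocks are requested when an Anthropic
# model is used with tools through LiteLLM.
anthropic_agent = Agent(
    name="Assistant",
    instructions="Use the weather tool when asked about the weather.",
    model=LitellmModel(model="anthropic/claude-sonnet-4-20250514", api_key="sk-..."),
    model_settings=ModelSettings(reasoning=None),
    tools=[get_weather],  # the toy tool from the earlier sketch
)
```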

Thanks for the discussion that helped clarify this should be fixed upstream!
