
Commit 86eba9d

Chesars authored and krrishdholakia committed
feat: Add support for reasoning_effort="none" for Gemini models (BerriAI#16548)
Implements support for the reasoning_effort="none" parameter for Gemini models, providing significant cost savings (up to 96% cheaper) by disabling the thinking budget while maintaining response quality.

Changes:
- Added "supports_reasoning": true to gemini-2.0-flash-thinking-exp-01-21 in the model config
- Implemented mapping of reasoning_effort="none" to thinkingConfig {"thinkingBudget": 0, "includeThoughts": false}
- Added a unit test to verify the mapping works correctly

Performance impact:
- Without reasoning_effort: ~313 tokens
- With reasoning_effort="none": ~12 tokens (96% cheaper)

Closes BerriAI#16420

Co-authored-by: Krish Dholakia <[email protected]>
1 parent a8938cd commit 86eba9d
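
As context for the token numbers above, here is a minimal, hedged sketch of how the savings can be checked from the caller's side. It assumes a valid GEMINI_API_KEY and that the model name below is available to your key; exact token counts will vary by prompt and model version.

```python
# Hedged sketch: compare completion token usage with and without
# reasoning_effort="none". Counts are illustrative, not guaranteed.
from litellm import completion

prompt = [{"role": "user", "content": "What is the capital of France?"}]
model = "gemini/gemini-2.0-flash-thinking-exp-01-21"

baseline = completion(model=model, messages=prompt)
no_thinking = completion(model=model, messages=prompt, reasoning_effort="none")

# litellm responses expose an OpenAI-style usage object
print("default completion tokens:      ", baseline.usage.completion_tokens)
print("reasoning_effort='none' tokens: ", no_thinking.usage.completion_tokens)
```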

4 files changed: +46 additions, -7 deletions


docs/my-website/docs/providers/gemini.md
Lines changed: 20 additions & 7 deletions

@@ -64,23 +64,36 @@ response = completion(
 
 LiteLLM translates OpenAI's `reasoning_effort` to Gemini's `thinking` parameter. [Code](https://github.com/BerriAI/litellm/blob/620664921902d7a9bfb29897a7b27c1a7ef4ddfb/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py#L362)
 
-Added an additional non-OpenAI standard "disable" value for non-reasoning Gemini requests.
+**Cost Optimization:** Use `reasoning_effort="none"` (OpenAI standard) for significant cost savings - up to 96% cheaper. [Google's docs](https://ai.google.dev/gemini-api/docs/openai)
+
+:::info
+Note: Reasoning cannot be turned off on Gemini 2.5 Pro models.
+:::
 
 **Mapping**
 
-| reasoning_effort | thinking |
-| ---------------- | -------- |
-| "disable" | "budget_tokens": 0 |
-| "low" | "budget_tokens": 1024 |
-| "medium" | "budget_tokens": 2048 |
-| "high" | "budget_tokens": 4096 |
+| reasoning_effort | thinking | Notes |
+| ---------------- | -------- | ----- |
+| "none" | "budget_tokens": 0, "includeThoughts": false | 💰 **Recommended for cost optimization** - OpenAI-compatible, always 0 |
+| "disable" | "budget_tokens": DEFAULT (0), "includeThoughts": false | LiteLLM-specific, configurable via env var |
+| "low" | "budget_tokens": 1024 | |
+| "medium" | "budget_tokens": 2048 | |
+| "high" | "budget_tokens": 4096 | |
 
 <Tabs>
 <TabItem value="sdk" label="SDK">
 
 ```python
 from litellm import completion
 
+# Cost-optimized: Use reasoning_effort="none" for best pricing
+resp = completion(
+    model="gemini/gemini-2.0-flash-thinking-exp-01-21",
+    messages=[{"role": "user", "content": "What is the capital of France?"}],
+    reasoning_effort="none",  # Up to 96% cheaper!
+)
+
+# Or use other levels: "low", "medium", "high"
 resp = completion(
     model="gemini/gemini-2.5-flash-preview-04-17",
     messages=[{"role": "user", "content": "What is the capital of France?"}],
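
For readers unfamiliar with Gemini's thinking controls, the mapping documented above roughly corresponds to the request-level config shown in the sketch below. Field names follow Google's generateContent REST schema; this is an illustrative sketch, not litellm's exact payload construction.

```python
# Illustrative sketch of the generateContent request body that results from
# reasoning_effort="none"; litellm builds this internally, so you never send
# it by hand. Field names follow Google's REST schema.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "What is the capital of France?"}]}
    ],
    "generationConfig": {
        # "none" -> thinking disabled entirely
        "thinkingConfig": {"thinkingBudget": 0, "includeThoughts": False},
    },
}
```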

litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py
Lines changed: 5 additions & 0 deletions

@@ -567,6 +567,11 @@ def _map_reasoning_effort_to_thinking_budget(
                 "thinkingBudget": DEFAULT_REASONING_EFFORT_DISABLE_THINKING_BUDGET,
                 "includeThoughts": False,
             }
+        elif reasoning_effort == "none":
+            return {
+                "thinkingBudget": 0,
+                "includeThoughts": False,
+            }
         else:
             raise ValueError(f"Invalid reasoning effort: {reasoning_effort}")
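
A quick way to exercise the new branch locally is sketched below; it mirrors the unit test added in tests/llm_translation/test_gemini.py and assumes the private helper keeps the signature shown in this diff.

```python
# Hedged sketch: print the thinkingConfig produced for each reasoning_effort
# value. _map_reasoning_effort_to_thinking_budget is a private helper, so its
# signature may change between litellm versions.
from litellm.llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
    VertexGeminiConfig,
)

for effort in ("none", "disable", "low", "medium", "high"):
    cfg = VertexGeminiConfig._map_reasoning_effort_to_thinking_budget(
        reasoning_effort=effort,
        model="gemini-2.0-flash-thinking-exp-01-21",
    )
    print(f"{effort!r} -> {cfg}")
```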

model_prices_and_context_window.json
Lines changed: 2 additions & 0 deletions

@@ -9963,6 +9963,7 @@
         "supports_function_calling": false,
         "supports_parallel_function_calling": true,
         "supports_prompt_caching": true,
+        "supports_reasoning": true,
         "supports_response_schema": false,
         "supports_system_messages": true,
         "supports_tool_choice": true,
@@ -11568,6 +11569,7 @@
         "supports_audio_output": true,
         "supports_function_calling": true,
         "supports_prompt_caching": true,
+        "supports_reasoning": true,
         "supports_response_schema": true,
         "supports_system_messages": true,
         "supports_tool_choice": true,

tests/llm_translation/test_gemini.py
Lines changed: 19 additions & 0 deletions

@@ -1137,6 +1137,25 @@ def test_gemini_embedding():
     assert response is not None
 
 
+def test_reasoning_effort_none_mapping():
+    """
+    Test that reasoning_effort='none' correctly maps to thinkingConfig.
+    Related issue: https://github.com/BerriAI/litellm/issues/16420
+    """
+    from litellm.llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
+        VertexGeminiConfig,
+    )
+
+    # Test reasoning_effort="none" mapping
+    result = VertexGeminiConfig._map_reasoning_effort_to_thinking_budget(
+        reasoning_effort="none",
+        model="gemini-2.0-flash-thinking-exp-01-21",
+    )
+
+    assert result is not None
+    assert result["thinkingBudget"] == 0
+    assert result["includeThoughts"] is False
+
 def test_gemini_function_args_preserve_unicode():
     """
     Test for Issue #16533: Gemini function call arguments should preserve non-ASCII characters
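
As a follow-up check on the `supports_reasoning` flag added to model_prices_and_context_window.json above, here is a hedged sketch of how a caller might confirm the flag is visible. It assumes `litellm.get_model_info` passes the key through; if your version does not, inspect the JSON entry directly.

```python
# Hedged sketch: read the model's capability flags via litellm's model-info
# helper; the supports_reasoning key is assumed to be surfaced from
# model_prices_and_context_window.json.
import litellm

info = litellm.get_model_info(model="gemini/gemini-2.0-flash-thinking-exp-01-21")
print(info.get("supports_reasoning"))  # expected: True after this change
```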
