feat: Gemma 4 thinking-level control (Minimal/High) for the Gemini provider #897
gusgus98 wants to merge 4 commits into
Gemma 4 (`gemma-4-31b-it` / `gemma-4-26b-a4b-it`) on the Gemini API always thinks and cannot disable it (`thinkingBudget` and `thinkingLevel` "low" are rejected with HTTP 400), but it accepts `thinkingLevel` "minimal"/"high". Re-use the existing "Disable thinking" toggle for the two cloud Gemma models: disabled -> "minimal" (~700ms, the default), enabled -> "high".

- Add a Gemini-only `thinkingLevels` registry field. The provider only sends `generationConfig.thinkingConfig` for models that declare it, so the other Gemini models are left untouched.
- Mark both Gemma entries `supportsThinking` so the toggle shows in the UI.
- Fix the response parser: Gemma emits its reasoning in a `thought: true` part plus a separate answer part, so filter out thought parts and join the rest (no-op for single-part responses like `gemini-2.5-flash-lite`).
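The registry-gated request config described in this commit can be sketched as follows. This is a minimal illustration, not the provider's code: `CloudModelEntry`, `buildGenerationConfig`, and the `maxOutputTokens` value are hypothetical, and the entry shape is trimmed down from the real `CloudModelDefinition`. The point it shows is that only models declaring `thinkingLevels` get a `thinkingConfig`.

```typescript
// Only the two levels Gemma 4 accepts, per this PR ("low" is rejected with HTTP 400).
type GemmaThinkingLevel = "minimal" | "high";

// Hypothetical, trimmed-down registry entry; the real CloudModelDefinition has more fields.
interface CloudModelEntry {
  id: string;
  supportsThinking?: boolean;
  // Gemini-only: which thinkingLevel to send for each state of the "Disable thinking" toggle.
  thinkingLevels?: { disabled: GemmaThinkingLevel; enabled: GemmaThinkingLevel };
}

interface GenerationConfig {
  temperature: number;
  maxOutputTokens: number;
  thinkingConfig?: { thinkingLevel: GemmaThinkingLevel };
}

function buildGenerationConfig(
  model: CloudModelEntry,
  disableThinking: boolean,
  temperature: number,
  maxOutputTokens: number,
): GenerationConfig {
  const generationConfig: GenerationConfig = { temperature, maxOutputTokens };
  // Models without a thinkingLevels entry get no thinkingConfig, so their requests are unchanged.
  if (model.thinkingLevels) {
    generationConfig.thinkingConfig = {
      thinkingLevel: disableThinking
        ? model.thinkingLevels.disabled
        : model.thinkingLevels.enabled,
    };
  }
  return generationConfig;
}

const gemma: CloudModelEntry = {
  id: "gemma-4-31b-it",
  supportsThinking: true,
  thinkingLevels: { disabled: "minimal", enabled: "high" },
};
const flashLite: CloudModelEntry = { id: "gemini-2.5-flash-lite" };

console.log(JSON.stringify(buildGenerationConfig(gemma, true, 0.3, 2048)));
// {"temperature":0.3,"maxOutputTokens":2048,"thinkingConfig":{"thinkingLevel":"minimal"}}
console.log(JSON.stringify(buildGenerationConfig(flashLite, true, 0.3, 2048)));
// {"temperature":0.3,"maxOutputTokens":2048}
```

Keeping the level mapping in registry data rather than in provider branches means a future model with the same constraint would only need a JSON entry.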
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds model-registry-driven support for Gemini/Gemma “thinking” configuration and updates response parsing to strip thought-only parts for thinking-capable models.
Changes:
- Introduces `thinkingLevels` in the cloud model registry to map the existing “Disable thinking” toggle to Gemini `thinkingConfig.thinkingLevel`.
- Refactors Gemini request construction to use a shared `generationConfig` object and conditionally attach `thinkingConfig`.
- Updates Gemini response parsing to ignore parts flagged as `thought: true` and return only the answer text.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/services/ai/inferenceProviders/gemini.ts | Builds generationConfig with optional thinkingConfig and filters thought parts out of Gemini responses. |
| src/models/modelRegistryData.json | Marks Gemma 4 models as thinking-capable and defines thinkingLevels mappings. |
| src/models/ModelRegistry.ts | Extends CloudModelDefinition with thinkingLevels metadata used by Gemini provider. |
`config.temperature || 0.3` rewrites a deliberate `temperature: 0` to 0.3. Use `??` so only `null`/`undefined` fall back, matching `ReasoningService`, which already uses `config.temperature ?? 0.3` everywhere. `maxTokens` keeps `||` on purpose (0 output tokens is not a value worth preserving).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
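The difference between the two operators is easy to see in isolation. In this small sketch, `InferenceConfig` and the two helper functions are hypothetical stand-ins for the provider's config handling; only the operator behavior is the point.

```typescript
// Hypothetical config shape; only the optional temperature matters here.
interface InferenceConfig {
  temperature?: number | null;
}

// Before: || treats 0 as "missing", so an explicit temperature of 0 became 0.3.
function temperatureWithOr(config: InferenceConfig): number {
  return config.temperature || 0.3;
}

// After: ?? falls back only when the value is null or undefined.
function temperatureWithNullish(config: InferenceConfig): number {
  return config.temperature ?? 0.3;
}

console.log(temperatureWithOr({ temperature: 0 })); // 0.3
console.log(temperatureWithNullish({ temperature: 0 })); // 0
console.log(temperatureWithNullish({})); // 0.3
```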
The plain "Disable thinking" on/off toggle is confusing for Gemma 4, whose
thinking can't be turned off — only dialed between minimal and high. For
models that declare `thinkingLevels` (the two Gemma 4 cloud models), render
a labeled Minimal/High segmented control instead of the toggle. It maps onto
the same `disableThinking` flag (minimal = true), so the provider logic is
unchanged; other thinking-capable models keep the existing toggle.
Adds reasoning.thinkingLevel.{label,help,minimal,high} to all 10 locales.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
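Because the selector maps onto the existing boolean, the conversion in each direction is a one-liner. The sketch below assumes that mapping (Minimal = `disableThinking: true`) and uses hypothetical helper names; `thinkingControlFor` illustrates the render rule (selector for `thinkingLevels` models, toggle for other thinking-capable models) and is not the actual `InferenceConfigEditor` logic.

```typescript
type ThinkingLevelChoice = "minimal" | "high";

// The segmented control is a view over the existing boolean:
// "minimal" <=> disableThinking === true, "high" <=> disableThinking === false.
function choiceFromFlag(disableThinking: boolean): ThinkingLevelChoice {
  return disableThinking ? "minimal" : "high";
}

function flagFromChoice(choice: ThinkingLevelChoice): boolean {
  return choice === "minimal";
}

interface ThinkingCapabilities {
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: string; enabled: string };
}

// Which control the settings panel shows for a model (hypothetical helper).
function thinkingControlFor(model: ThinkingCapabilities): "levelSelector" | "toggle" | "none" {
  if (model.thinkingLevels) return "levelSelector";
  if (model.supportsThinking) return "toggle";
  return "none";
}

console.log(choiceFromFlag(true)); // minimal
console.log(flagFromChoice("high")); // false
console.log(thinkingControlFor({ supportsThinking: true })); // toggle
```

Reusing the flag keeps persisted settings compatible: a user who had "Disable thinking" on lands on Minimal without any migration.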
Force-pushed from 89ef3b0 to de9cdc1.
…ispr#837

Reconcile feat/gemma-4-thinking-level with upstream OpenWhispr#837 (Gemini 3.5 Flash + Gemini thinking-disable rework), which touched the same Gemini logic.

- Extract `resolveGeminiThinkingConfig()` (`src/services/ai/geminiThinking.ts`), shared by the native REST cleanup path (`gemini.ts`) and the AI-SDK agent path (`ReasoningService.ts`): Gemma 4 maps the "Disable thinking" toggle two-way (minimal/high); `supportsThinking`-only models (e.g. Gemini 3.5 Flash) drop to minimal when disabled, matching OpenWhispr#837. Always sets `includeThoughts: false`.
- `gemini.ts`: keep the `??` temperature fix and the multi-part thought-filtering parser; adopt OpenWhispr#837's `GeminiGenerationConfig` interface.
- `ReasoningService.ts`: honor the thinking level in agent/tools mode too, closing the gap where the Minimal/High selector only affected the cleanup path.
- Add unit tests for the mapping (`test/services/geminiThinking.test.js`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
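The three-way mapping in this merge commit can be sketched as below. This is a hypothetical re-creation, not the real `resolveGeminiThinkingConfig()`: its signature is unknown, and two behaviors are assumptions, marked in the comments (an enabled `supportsThinking`-only model leaves the level to the API default, and models without thinking support get no config at all).

```typescript
type ThinkingLevel = "minimal" | "high";

interface ModelThinkingInfo {
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: ThinkingLevel; enabled: ThinkingLevel };
}

interface GeminiThinkingConfig {
  thinkingLevel?: ThinkingLevel;
  includeThoughts: false;
}

// Hypothetical re-creation of the mapping described in the merge commit; the real
// resolveGeminiThinkingConfig() may have a different signature and edge cases.
function resolveThinkingConfigSketch(
  model: ModelThinkingInfo,
  disableThinking: boolean,
): GeminiThinkingConfig | undefined {
  if (model.thinkingLevels) {
    // Two-way mapping (Gemma 4): the toggle chooses between the declared levels.
    const thinkingLevel = disableThinking
      ? model.thinkingLevels.disabled
      : model.thinkingLevels.enabled;
    return { thinkingLevel, includeThoughts: false };
  }
  if (model.supportsThinking) {
    // One-way mapping (e.g. Gemini 3.5 Flash): drop to minimal when disabled.
    if (disableThinking) {
      return { thinkingLevel: "minimal", includeThoughts: false };
    }
    // Assumption: when enabled, leave the level to the API's default.
    return { includeThoughts: false };
  }
  // Assumption: models without thinking support get no thinkingConfig at all.
  return undefined;
}
```

For example, `resolveThinkingConfigSketch({ supportsThinking: true }, true)` yields `{ thinkingLevel: "minimal", includeThoughts: false }`. Sharing one function between the REST and AI-SDK paths is what keeps the selector's effect identical in cleanup and agent modes.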
Merged
What the merge does
Validation: Still pending: a live smoke test of
✅ Smoke-tested
Summary
Follow-up to #892, which added the two Gemma 4 cloud models (`gemma-4-31b-it`, `gemma-4-26b-a4b-it`) to the Gemini picker but left two Gemma-specific gaps:

- Gemma 4 always thinks and cannot disable it (`thinkingBudget` and `thinkingLevel: "low"` are rejected with HTTP 400), but it does accept `thinkingLevel: "minimal"` (~700ms) and `"high"`.
- Gemma emits its reasoning in a part with `thought: true` plus a separate answer part. The provider read `parts[0]`, so dictation cleanup returned the model's chain-of-thought instead of the cleaned text.

This PR adds a thinking-level control for Gemma, fixes the response parsing, and surfaces it in the UI as a clear Minimal / High selector.
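The parsing bug and its fix can be illustrated with a simplified response shape. This sketch assumes the Gemini `candidates[].content.parts[]` layout with an optional `thought` flag per part; the helper names and the sample texts are hypothetical, and joining the answer parts with an empty string is an assumption about how "join the rest" is done.

```typescript
interface GeminiPart {
  text?: string;
  thought?: boolean;
}

interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

// Before the fix: read parts[0], which for Gemma 4 is the reasoning ("thought") part.
function extractFirstPart(res: GeminiResponse): string {
  return res.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}

// After the fix: drop thought parts and join whatever answer parts remain.
// Joining with "" is an assumption; single-part responses come back unchanged.
function extractAnswerText(res: GeminiResponse): string {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => !part.thought)
    .map((part) => part.text ?? "")
    .join("");
}

// Illustrative Gemma-style response: one thought part, one answer part.
const gemmaStyle: GeminiResponse = {
  candidates: [
    {
      content: {
        parts: [
          { text: "The user wants filler words removed...", thought: true },
          { text: "Let's meet at 3pm tomorrow." },
        ],
      },
    },
  ],
};

console.log(extractFirstPart(gemmaStyle)); // The user wants filler words removed...
console.log(extractAnswerText(gemmaStyle)); // Let's meet at 3pm tomorrow.
```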
UI
Because Gemma's thinking can only be dialed, not disabled, the plain on/off "Disable thinking" toggle was confusing. For models that declare `thinkingLevels` (the two Gemma 4 entries), the settings panel now shows a labeled segmented control instead:

| Selector | `thinkingLevel` sent |
|---|---|
| Minimal | `minimal` |
| High | `high` |

It maps onto the same `disableThinking` flag the toggle used (minimal = `true`), so the provider logic is unchanged. Every other thinking-capable model keeps the existing toggle. Default is Minimal, so Gemma is fast out of the box.

Changes
- `src/models/ModelRegistry.ts`: add a Gemini-only `thinkingLevels?: { disabled, enabled }` field to `CloudModelDefinition`.
- `src/models/modelRegistryData.json`: both Gemma entries get `supportsThinking: true` and `thinkingLevels: { "disabled": "minimal", "enabled": "high" }`.
- `src/services/ai/inferenceProviders/gemini.ts`:
  - Send `generationConfig.thinkingConfig` only for models that declare `thinkingLevels`; the other Gemini models are untouched.
  - Filter out `thought` parts in the response and join the rest (no-op for single-part responses like `gemini-2.5-flash-lite`).
  - Use `config.temperature ?? 0.3` (was `||`) so an explicit `temperature: 0` isn't overridden, matching `ReasoningService`.
- `src/components/ui/ThinkingLevelSelector.tsx` (new): compact Minimal/High segmented control, styled to match `ActivationModeSelector`.
- `src/components/settings/InferenceConfigEditor.tsx`: render the selector for `thinkingLevels` models, the toggle otherwise.
- `src/locales/*/translation.json`: `reasoning.thinkingLevel.{label,help,minimal,high}` added to all 10 locales.

Scoped so nothing changes for `gemini-3.1-pro-preview`, `gemini-3-flash-preview`, or `gemini-2.5-flash-lite`.

`tsc --noEmit`, eslint, and prettier all pass.

Testing
Verified live in the running app: under Settings → AI Models → Language Models → Providers → Gemini → Gemma 4, the Minimal/High selector appears (defaulting to Minimal) and switches the level. Verified against the Gemini API that both Gemma models accept `minimal`/`high` and return clean output once `thought` parts are filtered.
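A check like the API-side verification can be reproduced with a request of the following shape. This is a sketch under stated assumptions: it targets the public `v1beta` `generateContent` REST endpoint with the `x-goog-api-key` header, which the PR does not confirm is what the provider uses, and `buildSmokeTestRequest` is a hypothetical helper. It only builds the request; sending it needs a real key and network access.

```typescript
type ThinkingLevel = "minimal" | "high";

interface SmokeTestRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a generateContent request for one model/level pair.
function buildSmokeTestRequest(
  model: string,
  thinkingLevel: ThinkingLevel,
  apiKey: string,
  prompt: string,
): SmokeTestRequest {
  const body = {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: { thinkingConfig: { thinkingLevel, includeThoughts: false } },
  };
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Usage with a real key (Node 18+ global fetch), inside an async function:
// for (const model of ["gemma-4-31b-it", "gemma-4-26b-a4b-it"]) {
//   for (const level of ["minimal", "high"] as const) {
//     const { url, init } = buildSmokeTestRequest(model, level, apiKey, "Say hi.");
//     const res = await fetch(url, init);
//     console.log(model, level, res.status); // a rejected level would show up as 400
//   }
// }
```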