feat: Gemma 4 thinking-level control (Minimal/High) for the Gemini provider #897

Open
gusgus98 wants to merge 4 commits into OpenWhispr:main from gusobenitez:feat/gemma-4-thinking-level

gusgus98 commented Jun 3, 2026

Summary

Follow-up to #892, which added the two Gemma 4 cloud models (gemma-4-31b-it, gemma-4-26b-a4b-it) to the Gemini picker but left two Gemma-specific gaps:

  1. No thinking control / slow by default. Gemma 4 on the Gemini API always thinks and runs at full effort (~5s for dictation cleanup). It can't be turned off (thinkingBudget and thinkingLevel: "low" are rejected with HTTP 400), but it does accept thinkingLevel: "minimal" (~700ms) and "high".
  2. Wrong text returned. Gemma emits its reasoning in a response part flagged thought: true plus a separate answer part. The provider read parts[0], so dictation cleanup returned the model's chain-of-thought instead of the cleaned text.

This PR adds a thinking-level control for Gemma, fixes the response parsing, and surfaces it in the UI as a clear Minimal / High selector.
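The parsing fix described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `GeminiPart` and `GeminiResponse` shapes, both function names, and the sample payloads are hypothetical and cover only the fields the summary mentions.

```typescript
// Hypothetical response shapes: only the fields this sketch reads.
interface GeminiPart {
  text?: string;
  thought?: boolean;
}

interface GeminiResponse {
  candidates?: Array<{ content?: { parts?: GeminiPart[] } }>;
}

// Old behaviour described in the summary: read the first part only.
function firstPartText(response: GeminiResponse): string {
  return response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}

// New behaviour: drop parts flagged thought: true and join the rest.
function extractAnswerText(response: GeminiResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => part.thought !== true)
    .map((part) => part.text ?? "")
    .join("");
}

// Made-up payloads for illustration, not captured API output.
const gemmaResponse: GeminiResponse = {
  candidates: [
    {
      content: {
        parts: [
          { text: "The user wants filler words removed.", thought: true },
          { text: "Please send the report by Friday." },
        ],
      },
    },
  ],
};

const singlePartResponse: GeminiResponse = {
  candidates: [{ content: { parts: [{ text: "Hello there." }] } }],
};
```

With these sample payloads, `firstPartText(gemmaResponse)` yields the reasoning text, while `extractAnswerText` yields only the answer and leaves the single-part response unchanged.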

UI

Because Gemma's thinking can only be dialed, not disabled, the plain on/off "Disable thinking" toggle was confusing. For models that declare thinkingLevels (the two Gemma 4 entries), the settings panel now shows a labeled segmented control instead:

Thinking level                          ┌────────────┬─────────┐
Minimal is fastest. High gives fuller   │ ⚡ Minimal  │ 🧠 High │
reasoning but is slower.                └────────────┴─────────┘
| Selection | thinkingLevel sent | Result |
| --- | --- | --- |
| Minimal (default) | minimal | fast (~700ms) |
| High | high | full reasoning |

It maps onto the same disableThinking flag the toggle used (minimal = true), so the provider logic is unchanged. Every other thinking-capable model keeps the existing toggle. Default is Minimal, so Gemma is fast out of the box.
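For readers skimming the UI change, the selector-to-flag mapping might look something like the sketch below. The type and function names are hypothetical; only the rule "minimal = true" comes from the description above.

```typescript
// Hypothetical type for the two selector options.
type ThinkingLevelChoice = "minimal" | "high";

// Selector choice -> the existing disableThinking flag (minimal = true).
function choiceToDisableThinking(choice: ThinkingLevelChoice): boolean {
  return choice === "minimal";
}

// Flag -> selector choice, e.g. to highlight the current selection.
function disableThinkingToChoice(disableThinking: boolean): ThinkingLevelChoice {
  return disableThinking ? "minimal" : "high";
}
```

Because the selector only reads and writes the existing boolean, stored settings need no migration under this scheme.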

Changes

  • src/models/ModelRegistry.ts — add a Gemini-only thinkingLevels?: { disabled, enabled } field to CloudModelDefinition.
  • src/models/modelRegistryData.json — both Gemma entries get supportsThinking: true and thinkingLevels: { "disabled": "minimal", "enabled": "high" }.
  • src/services/ai/inferenceProviders/gemini.ts
    • Send generationConfig.thinkingConfig only for models that declare thinkingLevels — the other Gemini models are untouched.
    • Map the selector to the configured level.
    • Filter out thought parts in the response and join the rest (no-op for single-part responses like gemini-2.5-flash-lite).
    • Use config.temperature ?? 0.3 (was ||) so an explicit temperature: 0 isn't overridden, matching ReasoningService.
  • src/components/ui/ThinkingLevelSelector.tsx (new) — compact Minimal/High segmented control, styled to match ActivationModeSelector.
  • src/components/settings/InferenceConfigEditor.tsx — render the selector for thinkingLevels models, the toggle otherwise.
  • src/locales/*/translation.json: reasoning.thinkingLevel.{label,help,minimal,high} added to all 10 locales.
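The registry field and the provider-side gating listed above could be sketched as follows. This is an assumption-laden illustration rather than the code in gemini.ts: `RequestOptions`, `buildGenerationConfig`, and the 2000-token fallback are placeholders, and the interfaces are trimmed to the fields named in this PR.

```typescript
// Trimmed, hypothetical view of the registry entry.
interface CloudModelDefinition {
  id: string;
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: string; enabled: string };
}

// Hypothetical per-request options.
interface RequestOptions {
  temperature?: number;
  maxTokens?: number;
  disableThinking?: boolean;
}

interface GenerationConfig {
  temperature: number;
  maxOutputTokens: number;
  thinkingConfig?: { thinkingLevel: string };
}

function buildGenerationConfig(
  model: CloudModelDefinition,
  options: RequestOptions
): GenerationConfig {
  const config: GenerationConfig = {
    // ?? keeps an explicit temperature of 0; || would replace it with 0.3.
    temperature: options.temperature ?? 0.3,
    // || is kept here on purpose; 2000 is a placeholder default.
    maxOutputTokens: options.maxTokens || 2000,
  };

  // Only models that declare thinkingLevels get a thinkingConfig.
  if (model.thinkingLevels) {
    const disabled = options.disableThinking ?? true; // default is Minimal
    config.thinkingConfig = {
      thinkingLevel: disabled
        ? model.thinkingLevels.disabled
        : model.thinkingLevels.enabled,
    };
  }
  return config;
}

const gemma: CloudModelDefinition = {
  id: "gemma-4-31b-it",
  supportsThinking: true,
  thinkingLevels: { disabled: "minimal", enabled: "high" },
};
const flashLite: CloudModelDefinition = { id: "gemini-2.5-flash-lite" };
```

In this sketch, `buildGenerationConfig(flashLite, {})` carries no `thinkingConfig` at all, which is the "other Gemini models are untouched" property.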

Scoped so nothing changes for gemini-3.1-pro-preview, gemini-3-flash-preview, or gemini-2.5-flash-lite. tsc --noEmit, eslint, and prettier all pass.

Testing

Verified live in the running app: Settings → AI Models → Language Models → Providers → Gemini → Gemma 4, the Minimal/High selector appears (defaulting to Minimal) and switches the level. Verified against the Gemini API that both Gemma models accept minimal/high and return clean output once thought parts are filtered.

Note on the high path: for Gemini 2.5+, thinking tokens count against maxOutputTokens (floored at ~2000 here). On very short input at high, Gemma could spend the budget thinking and hit MAX_TOKENS. If that proves to be an issue in practice, the fix is to give maxOutputTokens extra headroom when high is selected.
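If the headroom fix mentioned in the note turns out to be needed, it might look roughly like this. The function name and both constants are hypothetical; the PR states only that the floor is around 2000 and does not specify a headroom value.

```typescript
// Placeholder values: the PR only says the floor is "~2000".
const MIN_OUTPUT_TOKENS = 2000;
const HIGH_THINKING_HEADROOM = 2048; // hypothetical extra budget for thinking tokens

function resolveMaxOutputTokens(
  requested: number | undefined,
  thinkingLevel: "minimal" | "high"
): number {
  // Apply the floor first, then add headroom only on the high path.
  const base = Math.max(requested || 0, MIN_OUTPUT_TOKENS);
  return thinkingLevel === "high" ? base + HIGH_THINKING_HEADROOM : base;
}
```

Adding the headroom after the floor keeps the Minimal path's token limit exactly as it is today.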

Gemma 4 (gemma-4-31b-it / gemma-4-26b-a4b-it) on the Gemini API always
thinks and cannot disable it (thinkingBudget and thinkingLevel "low" are
rejected with HTTP 400), but it accepts thinkingLevel "minimal"/"high".
Re-use the existing "Disable thinking" toggle for the two cloud Gemma
models: disabled -> "minimal" (~700ms, the default), enabled -> "high".

- Add a Gemini-only `thinkingLevels` registry field; the provider only
  sends generationConfig.thinkingConfig for models that declare it, so the
  other Gemini models are left untouched.
- Mark both Gemma entries supportsThinking so the toggle shows in the UI.
- Fix the response parser: Gemma emits its reasoning in a `thought: true`
  part plus a separate answer part, so filter out thought parts and join
  the rest (no-op for single-part responses like gemini-2.5-flash-lite).
Copilot AI review requested due to automatic review settings June 3, 2026 15:59

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds model-registry-driven support for Gemini/Gemma “thinking” configuration and updates response parsing to strip thought-only parts for thinking-capable models.

Changes:

  • Introduces thinkingLevels in the cloud model registry to map the existing “Disable thinking” toggle to Gemini thinkingConfig.thinkingLevel.
  • Refactors Gemini request construction to use a shared generationConfig object and conditionally attach thinkingConfig.
  • Updates Gemini response parsing to ignore parts flagged as thought: true and return only the answer text.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/services/ai/inferenceProviders/gemini.ts | Builds generationConfig with optional thinkingConfig and filters thought parts out of Gemini responses. |
| src/models/modelRegistryData.json | Marks Gemma 4 models as thinking-capable and defines thinkingLevels mappings. |
| src/models/ModelRegistry.ts | Extends CloudModelDefinition with thinkingLevels metadata used by Gemini provider. |

Comment thread src/services/ai/inferenceProviders/gemini.ts Outdated
Comment thread src/models/ModelRegistry.ts
gusgus98 and others added 2 commits June 3, 2026 13:00
config.temperature || 0.3 rewrites a deliberate temperature: 0 to 0.3.
Use ?? so only null/undefined fall back, matching ReasoningService which
already uses `config.temperature ?? 0.3` everywhere. maxTokens keeps || on
purpose (0 output tokens is not a value worth preserving).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plain "Disable thinking" on/off toggle is confusing for Gemma 4, whose
thinking can't be turned off — only dialed between minimal and high. For
models that declare `thinkingLevels` (the two Gemma 4 cloud models), render
a labeled Minimal/High segmented control instead of the toggle. It maps onto
the same `disableThinking` flag (minimal = true), so the provider logic is
unchanged; other thinking-capable models keep the existing toggle.

Adds reasoning.thinkingLevel.{label,help,minimal,high} to all 10 locales.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
gusgus98 changed the title from "feat(gemini): control Gemma 4 thinking level via the Disable thinking toggle" to "feat: Gemma 4 thinking-level control (Minimal/High) for the Gemini provider" Jun 3, 2026
gusgus98 force-pushed the feat/gemma-4-thinking-level branch from 89ef3b0 to de9cdc1 June 3, 2026 22:31
…ispr#837

Reconcile feat/gemma-4-thinking-level with upstream OpenWhispr#837 (Gemini 3.5 Flash +
Gemini thinking-disable rework), which touched the same Gemini logic.

- Extract resolveGeminiThinkingConfig() (src/services/ai/geminiThinking.ts),
  shared by the native REST cleanup path (gemini.ts) and the AI-SDK agent path
  (ReasoningService.ts): Gemma 4 maps the "Disable thinking" toggle two-way
  (minimal/high); supportsThinking-only models (e.g. Gemini 3.5 Flash) drop to
  minimal when disabled, matching OpenWhispr#837. Always sets includeThoughts:false.
- gemini.ts: keep the `??` temperature fix and the multi-part thought-filtering
  parser; adopt OpenWhispr#837's GeminiGenerationConfig interface.
- ReasoningService.ts: honor the thinking level in agent/tools mode too, closing
  the gap where the Minimal/High selector only affected the cleanup path.
- Add unit tests for the mapping (test/services/geminiThinking.test.js).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gusgus98 commented Jun 5, 2026

Merged main to resolve a conflict with #837, which reworked Gemini thinking-disable handling in the same files (gemini.ts, ReasoningService.ts). Rather than pick a side, I unified both approaches:

What the merge does

  • Extracted a shared resolveGeminiThinkingConfig() helper (src/services/ai/geminiThinking.ts) used by both the native REST cleanup path (gemini.ts) and the AI-SDK agent path (ReasoningService.ts), so the mapping is identical in both: Gemma 4 maps the "Disable thinking" toggle two-way (minimal/high), while supportsThinking-only models (e.g. Gemini 3.5 Flash) drop to minimal when disabled, matching #837.
  • gemini.ts: kept this PR's ?? temperature fix and the multi-part thought-filtering parser; adopted #837's GeminiGenerationConfig interface + includeThoughts: false.
  • ReasoningService.ts: the Minimal/High selector now also applies in agent/tools mode — previously only the cleanup path honored it.
  • Added unit tests for the mapping (test/services/geminiThinking.test.js).
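As a rough illustration of the mapping the shared helper implements, consider the sketch below. It is not the contents of geminiThinking.ts: the function name, the parameter shapes, and the choice to return `undefined` when nothing needs to be sent are assumptions. In particular, the PR does not say what a supportsThinking-only model sends when thinking is enabled, so this sketch sends nothing in that case.

```typescript
// Hypothetical, trimmed model metadata.
interface ThinkingModelInfo {
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: string; enabled: string };
}

interface ThinkingConfig {
  thinkingLevel: string;
  includeThoughts: false;
}

// Sketch only; the real helper's signature may differ.
function resolveGeminiThinkingConfigSketch(
  model: ThinkingModelInfo,
  disableThinking: boolean
): ThinkingConfig | undefined {
  // Gemma 4: two-way mapping (disabled -> minimal, enabled -> high).
  if (model.thinkingLevels) {
    return {
      thinkingLevel: disableThinking
        ? model.thinkingLevels.disabled
        : model.thinkingLevels.enabled,
      includeThoughts: false,
    };
  }
  // supportsThinking-only models drop to minimal when disabled.
  if (model.supportsThinking && disableThinking) {
    return { thinkingLevel: "minimal", includeThoughts: false };
  }
  // Assumption: otherwise send no thinkingConfig and use the API default.
  return undefined;
}
```

Calling one function from both the REST path and the agent path is what keeps the two code paths from drifting apart.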

Validation: typecheck, lint, i18n:check, build:renderer, and the new unit tests all pass.

Still pending: a live smoke test of thinkingLevel: "high" against the Gemini API — #837 only ever exercised "minimal", so the enabled→high path has no automated coverage yet.


gusgus98 commented Jun 5, 2026

Smoke-tested thinkingLevel: "high" against gemma-4-31b-it (live Gemini API) — closing the pending item above.

  • HTTP 200, finishReason: STOP — the enabled→high mapping is accepted.
  • The response splits into a thought: true reasoning part + a separate answer part, and includeThoughts: false does not suppress the thought part for Gemma 4 (the reasoning is still returned and thoughtsTokenCount billed). The multi-part thought-filtering parser in this PR strips it and returns only the answer.
  • This makes the parser required, not just defensive: the prior parts[0].text access would have returned the reasoning at "high", and at "minimal" parts[0] is an empty thought: true part, so the old parser would have thrown "empty response." The parser added here handles both.

"minimal" (the default, since disableThinking defaults to true) returns a tight single-line cleanup as expected.
