feat: Gemma 4 thinking-level control (Minimal/High) for the Gemini provider by gusgus98 · Pull Request #897 · OpenWhispr/openwhispr

gusgus98 · 2026-06-03T15:59:33Z

Summary

Follow-up to #892, which added the two Gemma 4 cloud models (gemma-4-31b-it, gemma-4-26b-a4b-it) to the Gemini picker but left two Gemma-specific gaps:

No thinking control / slow by default. Gemma 4 on the Gemini API always thinks and runs at full effort (~5s for dictation cleanup). It can't be turned off (thinkingBudget and thinkingLevel: "low" are rejected with HTTP 400), but it does accept thinkingLevel: "minimal" (~700ms) and "high".
Wrong text returned. Gemma emits its reasoning in a response part flagged thought: true plus a separate answer part. The provider read parts[0], so dictation cleanup returned the model's chain-of-thought instead of the cleaned text.

This PR adds a thinking-level control for Gemma, fixes the response parsing, and surfaces it in the UI as a clear Minimal / High selector.

UI

Because Gemma's thinking can only be dialed, not disabled, the plain on/off "Disable thinking" toggle was confusing. For models that declare thinkingLevels (the two Gemma 4 entries), the settings panel now shows a labeled segmented control instead:

Thinking level                          ┌────────────┬─────────┐
Minimal is fastest. High gives fuller   │ ⚡ Minimal  │ 🧠 High │
reasoning but is slower.                └────────────┴─────────┘

Selection	`thinkingLevel` sent	Result
Minimal (default)	`minimal`	fast (~700ms)
High	`high`	full reasoning

It maps onto the same disableThinking flag the toggle used (minimal = true), so the provider logic is unchanged. Every other thinking-capable model keeps the existing toggle. Default is Minimal, so Gemma is fast out of the box.
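The selector-to-flag mapping can be sketched as below. This is a minimal illustration, not the component's actual code: `ThinkingSelection`, `selectionToDisableThinking`, and `disableThinkingToSelection` are hypothetical names, and only the `minimal = true` rule comes from this PR.

```typescript
// Hypothetical names for illustration; only the rule "minimal = true" is from the PR.
type ThinkingSelection = "minimal" | "high";

// Selecting Minimal stores disableThinking = true, High stores false.
function selectionToDisableThinking(selection: ThinkingSelection): boolean {
  return selection === "minimal";
}

// Inverse direction, used to highlight the active segment from the stored flag.
function disableThinkingToSelection(disableThinking: boolean): ThinkingSelection {
  return disableThinking ? "minimal" : "high";
}
```

Because the control only reads and writes the existing boolean, a settings file written by the old toggle would still load unchanged under this sketch.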

Changes

src/models/ModelRegistry.ts — add a Gemini-only thinkingLevels?: { disabled, enabled } field to CloudModelDefinition.
src/models/modelRegistryData.json — both Gemma entries get supportsThinking: true and thinkingLevels: { "disabled": "minimal", "enabled": "high" }.
src/services/ai/inferenceProviders/gemini.ts
- Send generationConfig.thinkingConfig only for models that declare thinkingLevels — the other Gemini models are untouched.
- Map the selector to the configured level.
- Filter out thought parts in the response and join the rest (no-op for single-part responses like gemini-2.5-flash-lite).
- Use config.temperature ?? 0.3 (was ||) so an explicit temperature: 0 isn't overridden, matching ReasoningService.
src/components/ui/ThinkingLevelSelector.tsx (new) — compact Minimal/High segmented control, styled to match ActivationModeSelector.
src/components/settings/InferenceConfigEditor.tsx — render the selector for thinkingLevels models, the toggle otherwise.
src/locales/*/translation.json — reasoning.thinkingLevel.{label,help,minimal,high} added to all 10 locales.

Scoped so nothing changes for gemini-3.1-pro-preview, gemini-3-flash-preview, or gemini-2.5-flash-lite. tsc --noEmit, eslint, and prettier all pass.
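As a rough sketch of the scoping described above, the request builder attaches `thinkingConfig` only when the registry entry declares `thinkingLevels`. The interfaces here are reduced to the fields this PR mentions, and `buildGenerationConfig` is a hypothetical name, not the function in gemini.ts.

```typescript
// Reduced registry shape: the real CloudModelDefinition has more fields.
interface CloudModelDefinition {
  id: string;
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: string; enabled: string };
}

interface GenerationConfig {
  maxOutputTokens: number;
  thinkingConfig?: { thinkingLevel: string };
}

// Registry entries as described in this PR.
const gemma: CloudModelDefinition = {
  id: "gemma-4-31b-it",
  supportsThinking: true,
  thinkingLevels: { disabled: "minimal", enabled: "high" },
};
const flashLite: CloudModelDefinition = { id: "gemini-2.5-flash-lite" };

// Hypothetical helper: models without thinkingLevels get no thinkingConfig at all.
function buildGenerationConfig(
  model: CloudModelDefinition,
  disableThinking: boolean,
  maxOutputTokens: number
): GenerationConfig {
  const config: GenerationConfig = { maxOutputTokens };
  if (model.thinkingLevels) {
    const { disabled, enabled } = model.thinkingLevels;
    config.thinkingConfig = { thinkingLevel: disableThinking ? disabled : enabled };
  }
  return config;
}
```

Keying the behavior off registry data rather than model-name checks means a future model could opt in by declaring `thinkingLevels`, without touching the provider.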

Testing

Verified live in the running app: under Settings → AI Models → Language Models → Providers → Gemini → Gemma 4, the Minimal/High selector appears (defaulting to Minimal) and switches the level. Also verified against the Gemini API that both Gemma models accept minimal/high and return clean output once thought parts are filtered.

Note on the high path: for Gemini 2.5+, thinking tokens count against maxOutputTokens (floored at ~2000 here). On very short input at high, Gemma could spend the budget thinking and hit MAX_TOKENS. If that proves to be an issue in practice, the fix is to give maxOutputTokens extra headroom when high is selected.
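If that headroom turns out to be needed, one possible shape for it is sketched below. This is not part of the PR: `resolveMaxOutputTokens` and the `HIGH_THINKING_HEADROOM` value are assumptions chosen for illustration, and only the ~2000 floor comes from the note above.

```typescript
// Floor mentioned in the note above (approximate in the PR text).
const MIN_OUTPUT_TOKENS = 2000;
// Hypothetical headroom for thinking tokens; the right value would need measuring.
const HIGH_THINKING_HEADROOM = 4096;

// Hypothetical helper: apply the floor, then add headroom only at "high".
function resolveMaxOutputTokens(
  requested: number | undefined,
  thinkingLevel: string | undefined
): number {
  const base = Math.max(requested ?? 0, MIN_OUTPUT_TOKENS);
  return thinkingLevel === "high" ? base + HIGH_THINKING_HEADROOM : base;
}
```

Adding the headroom on top of the floor, rather than raising the floor itself, would leave the "minimal" path's token limit untouched.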

Gemma 4 (gemma-4-31b-it / gemma-4-26b-a4b-it) on the Gemini API always thinks and cannot disable it (thinkingBudget and thinkingLevel "low" are rejected with HTTP 400), but it accepts thinkingLevel "minimal"/"high". Re-use the existing "Disable thinking" toggle for the two cloud Gemma models: disabled -> "minimal" (~700ms, the default), enabled -> "high".
- Add a Gemini-only `thinkingLevels` registry field; the provider only sends generationConfig.thinkingConfig for models that declare it, so the other Gemini models are left untouched.
- Mark both Gemma entries supportsThinking so the toggle shows in the UI.
- Fix the response parser: Gemma emits its reasoning in a `thought: true` part plus a separate answer part, so filter out thought parts and join the rest (no-op for single-part responses like gemini-2.5-flash-lite).

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds model-registry-driven support for Gemini/Gemma “thinking” configuration and updates response parsing to strip thought-only parts for thinking-capable models.

Changes:

Introduces thinkingLevels in the cloud model registry to map the existing “Disable thinking” toggle to Gemini thinkingConfig.thinkingLevel.
Refactors Gemini request construction to use a shared generationConfig object and conditionally attach thinkingConfig.
Updates Gemini response parsing to ignore parts flagged as thought: true and return only the answer text.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
src/services/ai/inferenceProviders/gemini.ts	Builds `generationConfig` with optional `thinkingConfig` and filters `thought` parts out of Gemini responses.
src/models/modelRegistryData.json	Marks Gemma 4 models as thinking-capable and defines `thinkingLevels` mappings.
src/models/ModelRegistry.ts	Extends `CloudModelDefinition` with `thinkingLevels` metadata used by Gemini provider.


config.temperature || 0.3 rewrites a deliberate temperature: 0 to 0.3. Use ?? so only null/undefined fall back, matching ReasoningService which already uses `config.temperature ?? 0.3` everywhere. maxTokens keeps || on purpose (0 output tokens is not a value worth preserving).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
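The difference between the two operators can be shown in isolation. This is an illustrative sketch, not the provider code: the `InferenceConfig` shape, the helper names, and `DEFAULT_MAX_TOKENS` are assumptions, while the 0.3 fallback comes from the commit message above.

```typescript
// Assumed config shape for illustration.
interface InferenceConfig {
  temperature?: number | null;
  maxTokens?: number;
}

const DEFAULT_TEMPERATURE = 0.3;
const DEFAULT_MAX_TOKENS = 2000; // hypothetical default

// Old behavior: || falls back on every falsy value, including 0.
function temperatureWithOr(config: InferenceConfig): number {
  return config.temperature || DEFAULT_TEMPERATURE;
}

// New behavior: ?? falls back only on null or undefined.
function temperatureWithNullish(config: InferenceConfig): number {
  return config.temperature ?? DEFAULT_TEMPERATURE;
}

// maxTokens intentionally keeps ||, so a 0 is replaced by the default.
function maxTokensWithOr(config: InferenceConfig): number {
  return config.maxTokens || DEFAULT_MAX_TOKENS;
}

console.log(temperatureWithOr({ temperature: 0 }));      // 0.3 (explicit 0 is lost)
console.log(temperatureWithNullish({ temperature: 0 })); // 0
console.log(temperatureWithNullish({}));                 // 0.3
```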

The plain "Disable thinking" on/off toggle is confusing for Gemma 4, whose thinking can't be turned off, only dialed between minimal and high. For models that declare `thinkingLevels` (the two Gemma 4 cloud models), render a labeled Minimal/High segmented control instead of the toggle. It maps onto the same `disableThinking` flag (minimal = true), so the provider logic is unchanged; other thinking-capable models keep the existing toggle. Adds reasoning.thinkingLevel.{label,help,minimal,high} to all 10 locales.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ispr#837 Reconcile feat/gemma-4-thinking-level with upstream OpenWhispr#837 (Gemini 3.5 Flash + Gemini thinking-disable rework), which touched the same Gemini logic.
- Extract resolveGeminiThinkingConfig() (src/services/ai/geminiThinking.ts), shared by the native REST cleanup path (gemini.ts) and the AI-SDK agent path (ReasoningService.ts): Gemma 4 maps the "Disable thinking" toggle two-way (minimal/high); supportsThinking-only models (e.g. Gemini 3.5 Flash) drop to minimal when disabled, matching OpenWhispr#837. Always sets includeThoughts:false.
- gemini.ts: keep the `??` temperature fix and the multi-part thought-filtering parser; adopt OpenWhispr#837's GeminiGenerationConfig interface.
- ReasoningService.ts: honor the thinking level in agent/tools mode too, closing the gap where the Minimal/High selector only affected the cleanup path.
- Add unit tests for the mapping (test/services/geminiThinking.test.js).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gusgus98 · 2026-06-05T02:19:21Z

Merged main to resolve a conflict with #837, which reworked Gemini thinking-disable handling in the same files (gemini.ts, ReasoningService.ts). Rather than pick a side, I unified both approaches:

What the merge does

Extracted a shared resolveGeminiThinkingConfig() helper (src/services/ai/geminiThinking.ts) used by both the native REST cleanup path (gemini.ts) and the AI-SDK agent path (ReasoningService.ts), so the mapping is identical in both:
- Gemma 4 (thinkingLevels): two-way — disabled → minimal, enabled → high
- supportsThinking-only models (e.g. Gemini 3.5 Flash from #837): minimal when disabled, API default otherwise (unchanged from #837)
gemini.ts: kept this PR's ?? temperature fix and the multi-part thought-filtering parser; adopted #837's GeminiGenerationConfig interface + includeThoughts: false.
ReasoningService.ts: the Minimal/High selector now also applies in agent/tools mode — previously only the cleanup path honored it.
Added unit tests for the mapping (test/services/geminiThinking.test.js).
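The unified mapping listed above might look roughly like the following. This is a sketch under assumptions, not the contents of geminiThinking.ts: the real helper's signature and return type are not shown in this thread, so the function is named `resolveGeminiThinkingConfigSketch`, and returning `undefined` for models without thinking support is a guess.

```typescript
// Reduced model info; field names follow the registry fields named in this PR.
interface ThinkingModelInfo {
  supportsThinking?: boolean;
  thinkingLevels?: { disabled: string; enabled: string };
}

// Assumed return shape: includeThoughts is always false per the merge notes.
interface GeminiThinkingConfig {
  thinkingLevel?: string;
  includeThoughts: false;
}

function resolveGeminiThinkingConfigSketch(
  model: ThinkingModelInfo,
  disableThinking: boolean
): GeminiThinkingConfig | undefined {
  // Gemma 4 style: two-way mapping taken from the registry entry.
  if (model.thinkingLevels) {
    const { disabled, enabled } = model.thinkingLevels;
    return { thinkingLevel: disableThinking ? disabled : enabled, includeThoughts: false };
  }
  // supportsThinking-only models: minimal when disabled, API default otherwise.
  if (model.supportsThinking) {
    return disableThinking
      ? { thinkingLevel: "minimal", includeThoughts: false }
      : { includeThoughts: false };
  }
  // Assumption: models without thinking support send no thinkingConfig.
  return undefined;
}
```

Sharing one function between the REST path and the AI-SDK path is what keeps the two request builders from drifting apart on this mapping.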

Validation: typecheck, lint, i18n:check, build:renderer, and the new unit tests all pass.

Still pending: a live smoke test of thinkingLevel: "high" against the Gemini API — #837 only ever exercised "minimal", so the enabled→high path has no automated coverage yet.

gusgus98 · 2026-06-05T02:24:00Z

✅ Smoke-tested thinkingLevel: "high" against gemma-4-31b-it (live Gemini API) — closing the pending item above.

HTTP 200, finishReason: STOP — the enabled→high mapping is accepted.
The response splits into a thought: true reasoning part + a separate answer part, and includeThoughts: false does not suppress the thought part for Gemma 4 (the reasoning is still returned and thoughtsTokenCount billed). The multi-part thought-filtering parser in this PR strips it and returns only the answer.
This makes the parser required, not just defensive: the prior parts[0].text access would have returned the reasoning at "high", and at "minimal" parts[0] is an empty thought: true part, so the old parser would have thrown "empty response." The parser added here handles both.

"minimal" (the default, since disableThinking defaults to true) returns a tight single-line cleanup as expected.
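For reference, the filtering behavior described in this comment can be sketched as follows. This is an approximation, not the parser in gemini.ts: `extractAnswerText` is a hypothetical name, the response interfaces cover only the fields mentioned here, and the empty-string join separator, the trim, and the exact error message are assumptions.

```typescript
// Only the response fields discussed in this thread.
interface GeminiPart {
  text?: string;
  thought?: boolean;
}

interface GeminiResponse {
  candidates?: Array<{ content?: { parts?: GeminiPart[] }; finishReason?: string }>;
}

// Hypothetical helper: drop thought parts, join what remains.
function extractAnswerText(response: GeminiResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const text = parts
    .filter((part) => part.thought !== true)
    .map((part) => part.text ?? "")
    .join("")
    .trim();
  if (!text) {
    throw new Error("empty response");
  }
  return text;
}

// "high"-shaped response: reasoning part first, answer part second.
const highShaped: GeminiResponse = {
  candidates: [{
    content: { parts: [{ text: "The user wants cleanup...", thought: true }, { text: "Hello, world." }] },
    finishReason: "STOP",
  }],
};

// "minimal"-shaped response: parts[0] is an empty thought part.
const minimalShaped: GeminiResponse = {
  candidates: [{ content: { parts: [{ text: "", thought: true }, { text: "Hello, world." }] } }],
};

console.log(extractAnswerText(highShaped));    // Hello, world.
console.log(extractAnswerText(minimalShaped)); // Hello, world.
```

A single-part response with no `thought` flag passes through the filter untouched, which matches the "no-op" behavior described for gemini-2.5-flash-lite.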

Copilot AI review requested due to automatic review settings June 3, 2026 15:59

Copilot AI reviewed Jun 3, 2026


Comment thread src/services/ai/inferenceProviders/gemini.ts Outdated

Comment thread src/models/ModelRegistry.ts

gusgus98 and others added 2 commits June 3, 2026 13:00

gusgus98 changed the title ~~feat(gemini): control Gemma 4 thinking level via the Disable thinking toggle~~ feat: Gemma 4 thinking-level control (Minimal/High) for the Gemini provider Jun 3, 2026

gusgus98 force-pushed the feat/gemma-4-thinking-level branch from 89ef3b0 to de9cdc1 June 3, 2026 22:31

gusgus98 wants to merge 4 commits into OpenWhispr:main from gusobenitez:feat/gemma-4-thinking-level
