feat(stt): add experimental Web Speech API mode alongside Whisper by iamfisho · Pull Request #7 · deivid11/tide-commander

iamfisho · 2026-04-27T03:21:04Z

Summary

Adds a second experimental speech-to-text option that runs entirely in the browser via the Web Speech API, complementing the existing Whisper server-side mode. The two modes are mutually exclusive — toggling one on automatically turns the other off.

Why

Whisper transcription has noticeable latency because audio is uploaded and run server-side. The Web Speech API delegates recognition to the browser's built-in service, which is meaningfully faster on Chromium-based browsers and Safari, at the cost of needing a supported browser and outbound connectivity to the browser's recognition service.

What's in the PR

experimentalWebSpeechSTT: new boolean setting (default false), added to Settings and DEFAULT_SETTINGS in src/packages/client/store/types.ts.
useWebSpeechSTT hook (src/packages/client/hooks/useWebSpeechSTT.ts): same interface as useSTT, with continuous = true so natural pauses don't end the session early.
TerminalInputArea: branches between the two STT hooks based on which experimental flag is on, and surfaces hook errors via toast (useToast) so users can see failures without opening DevTools (the existing Whisper hook also benefits).
ConfigSection: second toggle under Experimental, mutually exclusive with the Whisper toggle (turning one on turns the other off). The toggle is auto-disabled, with an explanatory tooltip, when the browser exposes neither SpeechRecognition nor webkitSpeechRecognition (e.g. Firefox). Adds an optional disabled prop to the local Toggle component plus a .config-toggle-disabled style.
i18n: new keys webSpeechSTT, ttsHint, webSpeechSTTHint, webSpeechSTTUnsupported (config namespace) and voiceInputErrorTitle (terminal namespace), added to all 10 locales (de, en, es, fr, hi, it, ja, pt, ru, zh-CN).
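
The mutual exclusion and the browser-support check described above can be sketched roughly as follows. This is an illustration only, not the code in the PR: `experimentalWhisperSTT` is an assumed name for the existing Whisper flag, and `toggleSttFlag` and `isWebSpeechSupported` are hypothetical helpers.

```typescript
// Hypothetical slice of the settings object. Only experimentalWebSpeechSTT
// is named in the PR; experimentalWhisperSTT is an assumed name.
interface SttSettings {
  experimentalWhisperSTT: boolean;
  experimentalWebSpeechSTT: boolean;
}

type SttFlag = keyof SttSettings;

// Turning a flag on forces the other one off.
// Turning a flag off leaves the other one untouched.
function toggleSttFlag(
  settings: SttSettings,
  flag: SttFlag,
  enabled: boolean
): SttSettings {
  const next: SttSettings = { ...settings };
  next[flag] = enabled;
  if (enabled) {
    const other: SttFlag =
      flag === "experimentalWebSpeechSTT"
        ? "experimentalWhisperSTT"
        : "experimentalWebSpeechSTT";
    next[other] = false;
  }
  return next;
}

// Feature detection. The global object is passed in as a parameter so the
// check can also be exercised outside a browser.
function isWebSpeechSupported(globalObj: Record<string, unknown>): boolean {
  return (
    typeof globalObj["SpeechRecognition"] === "function" ||
    typeof globalObj["webkitSpeechRecognition"] === "function"
  );
}
```

In a browser the second helper would be called with `window` (cast to `Record<string, unknown>`), and its result could drive the toggle's `disabled` prop.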

Notes / known limitations

Web Speech API in Chromium browsers depends on a cloud recognition service (Google's). Networks that block it produce a network error — now visible via toast.
Default recognition language is es-ES; making it user-configurable is left for a follow-up.
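
As a rough sketch of how these two limitations might be handled, the snippet below builds recognizer options with an overridable language and turns an error code into toast text. `lang` and `continuous` are real `SpeechRecognition` properties and the codes are a subset of those the Web Speech API defines; `buildRecognizerOptions`, `describeSpeechError`, and the message strings are hypothetical (in the PR the user-facing text comes from the i18n keys).

```typescript
// Options that would be copied onto a SpeechRecognition instance.
interface RecognizerOptions {
  lang: string;
  continuous: boolean;
}

// Default taken from the PR notes; a user setting could override it later.
const DEFAULT_RECOGNITION_LANG = "es-ES";

function buildRecognizerOptions(lang?: string): RecognizerOptions {
  const chosen =
    lang !== undefined && lang.trim() !== ""
      ? lang.trim()
      : DEFAULT_RECOGNITION_LANG;
  // continuous = true so natural pauses do not end the session early.
  return { lang: chosen, continuous: true };
}

// Subset of the error codes reported by the Web Speech API.
// The message text is illustrative only.
const KNOWN_ERROR_MESSAGES = new Map<string, string>([
  ["network", "Could not reach the browser's speech recognition service."],
  ["not-allowed", "Microphone access was denied."],
  ["service-not-allowed", "Speech recognition is not allowed in this browser."],
  ["audio-capture", "No microphone was found."],
  ["no-speech", "No speech was detected."],
  ["language-not-supported", "The selected language is not supported."],
]);

// Falls back to a generic message that still includes the raw code.
function describeSpeechError(code: string): string {
  return (
    KNOWN_ERROR_MESSAGES.get(code) ?? `Speech recognition failed (${code}).`
  );
}
```

The string returned by `describeSpeechError` would be the body of the toast, with `voiceInputErrorTitle` as its title.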

Test plan

Toggling "Web Speech STT (Browser)" on automatically switches off "Text to Speech" (Whisper).
Toggling the Whisper one back on switches the Web Speech one off.
In Firefox, the Web Speech toggle is disabled with an explanatory tooltip.
In Chrome/Edge over HTTPS, click mic → speak → click again → message is sent.
Error path: with the toggle on, simulate offline / blocked network → toast appears with the error message.
Whisper mode still works as before (no regressions).

Follow-up suggestion (separate iteration): rather than auto-sending the transcription as a command, the STT result should populate the input field, leaving the final send action to the user.

The current behavior (handleTranscription in TerminalInputArea calls store.sendCommand directly) optimizes for speed but bypasses any chance for the user to correct misrecognized words. STT is inherently lossy — accents, homophones, background noise, and (for Web Speech) cloud-side guesswork can all produce subtly wrong transcripts that would never have left the user's keyboard. Sending those straight to an agent burns turns on noise and erodes trust in the feature.

Proposed change: have the transcription fill the existing input (so the user can review, edit, and submit normally with Enter / send button). As a future enhancement, we could add an opt-in voice-confirmation step ("send" / "cancel" spoken commands) for fully hands-free operation without sacrificing accuracy.

This keeps the prompt quality guarantee in the user's hands, which matters more than shaving a click — especially on agents where every command costs tokens and may trigger irreversible actions.
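
A minimal sketch of the proposed behavior, assuming the input value lives in React-style state with a functional updater. `mergeTranscriptIntoInput` and `makeTranscriptionHandler` are hypothetical names; the current `handleTranscription` in the PR calls `store.sendCommand` instead.

```typescript
// Appends a transcript to whatever the user has already typed, without
// sending anything. Empty transcripts leave the input unchanged.
function mergeTranscriptIntoInput(
  currentInput: string,
  transcript: string
): string {
  const cleaned = transcript.trim();
  if (cleaned === "") {
    return currentInput;
  }
  if (currentInput.trim() === "") {
    return cleaned;
  }
  // Add a separating space only when the input does not already end in one.
  const separator = /\s$/.test(currentInput) ? "" : " ";
  return currentInput + separator + cleaned;
}

// Functional state updater, in the style of a React setState.
type SetInput = (update: (previous: string) => string) => void;

// Builds a transcription callback that fills the input field and leaves
// the final send (Enter / send button) to the user.
function makeTranscriptionHandler(
  setInput: SetInput
): (transcript: string) => void {
  return (transcript) =>
    setInput((previous) => mergeTranscriptIntoInput(previous, transcript));
}
```

Keeping the merge logic in a pure function makes it easy to unit-test independently of the component and of either STT hook.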

Introduces a second experimental speech-to-text option that runs entirely in the browser via the Web Speech API, complementing the existing Whisper server-side mode. The two modes are mutually exclusive: toggling one on turns the other off.

- New `experimentalWebSpeechSTT` setting (default false).
- New `useWebSpeechSTT` hook with the same interface as `useSTT` and a `continuous = true` recognizer so natural pauses don't end the session.
- `TerminalInputArea` branches between the two hooks based on which flag is active and surfaces hook errors via toast for visibility.
- Config toggle is auto-disabled when the browser exposes neither `SpeechRecognition` nor `webkitSpeechRecognition`.
- i18n: `webSpeechSTT`, `ttsHint`, `webSpeechSTTHint`, `webSpeechSTTUnsupported` in config, `voiceInputErrorTitle` in terminal, across all 10 locales.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

iamfisho wants to merge 1 commit into deivid11:master from iamfisho:feat/experimental-web-speech-stt
