Improved Dictate Keyboard (Whisper AI transcription)

Dictate is an easy-to-use keyboard for transcribing and dictating. This is a rebranded and enhanced version of the original FOSS Dictate app, featuring numerous additions, improvements, and bug fixes. The app uses OpenAI Whisper in the background, which delivers highly accurate, punctuated transcriptions in many different languages, and supports custom AI rewording using GPT-4 Omni.

This fork is based on the original Dictate repository. The following sections provide an overview of the key changes and additions made in this enhanced version.

The APK can be downloaded as Debug Build (unsigned) from here.

Key Improvements, Added Features and Bugfixes over original Dictate

Transcription via SendTo / Share: Added the keyboard to the "Share / SendTo" menu so that audio files from any app (e.g., WhatsApp voice messages) can be transcribed
Better API Prompt control: The original app does not use system prompts and appends a built-in prompt to every request, which can lead to unexpected results (see the constant PROMPT_REWORDING_BE_PRECISE in the original code). This can confuse the API, because the built-in prompt may be written in a different language than your own prompt. This version sends your defined prompt as the system prompt and the input text (i.e., your transcription) as the user message, which gives the API a much clearer picture of what you want to achieve. PROMPT_REWORDING_BE_PRECISE is no longer used, because all instructions should be part of your own prompt definitions.
- Ability to control API temperature: Added an option to set the temperature of the API calls (0.0 - 1.0) to influence the creativity of the output
Automation of Rewording-Prompts: Added an option to automatically apply one of the defined custom prompts after each transcription.
Quickselection of Temporary Rewording Prompt: Added an option to temporarily activate one of the user-defined prompts for the current keyboard session (until the keyboard is closed)
Live/Instant Prompting based on Text Selection: The "Live Prompt" (Instant Prompt) can now work on the current text in the input field. If you press the Live Prompt button without a selection, it starts a recording and uses the transcript as input for the prompt. If you select text first and then press the button, the selected text is used as input for the prompt and no recording is started.
Support for Bluetooth Headsets: Added support for Bluetooth headsets (e.g., AirPods), based on an idea by @cuylerstuwe (thanks!), with proper handling of SCO and asynchronous connections, indicated by a Bluetooth icon in the UI
Improved Workflow: Stop Recording & switch back: Added a button to stop recording and return to the previous keyboard (e.g., Gboard) without needing to switch manually
Enhanced Prompt-Buttons UI: Prompt buttons are always visible and intelligently handle text selection - using either existing selection or automatically selecting all text
- Pressing prompt buttons during Active Recording: Pressing a prompt button during an active recording only marks that prompt for use after the recording stops, instead of immediately applying it to existing text
- Long-pressing prompt buttons: Long-pressing a prompt button without an active recording selects that prompt for immediate use and starts a recording
- Double-click on Prompt Button: Double-clicking a prompt button opens the edit dialog directly (quick access to edit prompts)
Fixed Instant Recording: Resolved issues with instant recording ending immediately in certain apps (e.g., Gemini) by adding a minimal initialization delay (to avoid race conditions)
GBoard-like functionality:
- Backspace-Key: Added swipe-capability to delete multiple words in one go
- Shift-Key: Pressing the key cycles the selected text, or the word at the cursor position, through lower, camel, and upper case
Import/Export Prompts: Added feature to import and export user-created prompts/presets
Custom Characters with Emoji Support: Smiley support added to "input custom characters" with improved limit and styling
Better UI while Recording: Added a more modern and appealing style during recording, so that it is immediately obvious when recording is active
Function to Play last Recording: Added button in Preferences to play the previously recorded audio (mostly for debugging purposes in case of transcription issues)
Smart Transcription Flow: After finishing transcription, pressing send buttons (e.g., in WhatsApp) no longer triggers instant recording again
Improved Logging: Enhanced logging for more effective ADB debugging of prompts and API calls
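
To illustrate the "Better API Prompt control" and temperature items above, here is a minimal sketch of how such a rewording request body could be assembled. It is not taken from the app's source: the class and method names (RewordingRequestSketch, buildRequestBody, clampTemperature, jsonEscape) are hypothetical, the model name "gpt-4o" is an assumption, and the JSON is built by hand only to keep the example free of dependencies. The layout follows the general shape of an OpenAI Chat Completions request, with the user-defined prompt as the system message and the transcription as the user message.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not the app's actual implementation. */
class RewordingRequestSketch {

    /** Escapes a string so it can be placed inside a JSON string literal. */
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Keeps the temperature inside the 0.0 - 1.0 range described in the feature list. */
    public static double clampTemperature(double t) {
        return Math.max(0.0, Math.min(1.0, t));
    }

    /**
     * Builds the request body: the user-defined prompt goes in as the system
     * message, the transcription as the user message. No built-in prompt is added.
     */
    public static String buildRequestBody(String model, String systemPrompt,
                                          String transcript, double temperature) {
        return String.format(Locale.ROOT,
                "{\"model\":\"%s\",\"temperature\":%.2f,\"messages\":["
                        + "{\"role\":\"system\",\"content\":\"%s\"},"
                        + "{\"role\":\"user\",\"content\":\"%s\"}]}",
                jsonEscape(model), clampTemperature(temperature),
                jsonEscape(systemPrompt), jsonEscape(transcript));
    }

    public static void main(String[] args) {
        // Assumed example values; any prompt and transcript would work the same way.
        System.out.println(buildRequestBody(
                "gpt-4o",
                "Rewrite the text as a polite email.",
                "hey can you send me the report",
                0.3));
    }
}
```

Keeping the instructions and the transcription in separate messages means the prompt language and the transcript language no longer get mixed into a single string, which is the problem described for the original app.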

Showcase

Since a picture is worth a thousand words, here is a showcase video and some screenshots:

License

Original Dictate was released under the terms of the Apache 2.0 license, following all clarifications stated in the license file.

