
bjspi/AI-Voice-Keyboard

Improved Dictate Keyboard (Whisper AI transcription)

Dictate is an easy-to-use keyboard for transcription and dictation. This is a rebranded and enhanced version of the original FOSS Dictate app, with numerous additions, improvements, and bug fixes. In the background, the app uses OpenAI Whisper, which produces highly accurate transcriptions with punctuation in many languages. It also offers custom AI rewording using GPT-4 Omni (GPT-4o).

This fork is based on the original Dictate repository and adds many major improvements, new features, and bug fixes. The following sections give an overview of the key changes and additions in this enhanced version.

The APK can be downloaded as a debug build (unsigned) from here.

Key Improvements, Added Features, and Bug Fixes over the Original Dictate

  • Transcription via SendTo / Share: Added the keyboard to the "Share / SendTo" menu, so audio files from any app (e.g., WhatsApp voice messages) can be shared to it and transcribed
  • Better API Prompt control: The original app does not use system prompts and instead adds a fixed prompt (the constant PROMPT_REWORDING_BE_PRECISE in the original code) to every request, which can lead to unexpected results. This is especially confusing for the API when that fixed prompt is written in a different language than your own prompt. This version sends your defined prompts as the system prompt and the input text (i.e., your transcription) as the user message, so the API gets a much clearer picture of what you want to achieve. PROMPT_REWORDING_BE_PRECISE has been removed, because all instructions should be part of your own prompt definitions.
    • Ability to control API temperature: Added an option to set the temperature of API calls (0.0 - 1.0), which influences how creative the output is
  • Automation of Rewording-Prompts: Added an option to automatically apply one of the defined custom prompts after each transcription
  • Quick Selection of Temporary Rewording Prompt: Added an option to temporarily toggle one of the user-defined prompts for immediate use during the current keyboard session (until the keyboard is closed)
  • Live/Instant Prompting based on Text Selection: The "Live Prompt" (Instant Prompt) can now work on the current text in the input field. If you press the Live Prompt button without selecting text, it starts a recording and uses the transcript as input for the prompt. If you select text first and then press the button, the selected text is used as input for the prompt and no recording is started.
  • Added support for Bluetooth Headsets: Support for Bluetooth headsets (e.g., AirPods), based on an idea by @cuylerstuwe (thanks!), with proper SCO handling and asynchronous connection handling, indicated by a Bluetooth icon in the UI
  • Improved Workflow: Stop Recording & switch back: Added a button that stops recording and returns to the previous keyboard (e.g., Gboard) without switching manually
  • Enhanced Prompt-Buttons UI: Prompt buttons are always visible and handle text selection intelligently: they use the existing selection or, if nothing is selected, automatically select all text
    • Pressing prompt buttons during active recording: Pressing a prompt button during an active recording only toggles that prompt for use after the recording stops, instead of immediately applying it to the existing text
    • Long-pressing prompt buttons: Long-pressing a prompt button while no recording is active selects that prompt for immediate use and starts a recording
    • Double-clicking prompt buttons: Double-clicking a prompt button opens its edit dialog directly (quick access for editing prompts)
  • Fixed Instant Recording: Resolved an issue where instant recording ended immediately in certain apps (e.g., Gemini) by adding a minimal initialization delay to avoid race conditions
  • Gboard-like functionality:
    • Backspace key: Added swipe capability to delete multiple words at once
    • Shift key: Pressing the key toggles between lower, camel, and upper case for the selected text or the word at the cursor position
  • Import/Export Prompts: Added the ability to import and export user-created prompts/presets
  • Custom Characters with Emoji Support: Added emoji (smiley) support to "input custom characters", with an improved character limit and styling
  • Better UI while Recording: Added a more modern, eye-catching style during recording, so it is instantly clear when recording is active
  • Function to Play Last Recording: Added a button in Preferences to play back the previously recorded audio (mainly for debugging transcription issues)
  • Smart Transcription Flow: After a transcription finishes, pressing a send button (e.g., in WhatsApp) no longer triggers instant recording again
  • Improved Logging: Enhanced logging for more effective ADB debugging of prompts and API calls
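
The "Better API Prompt control" and temperature items above come down to how the chat request is assembled. The following is a minimal, hypothetical sketch (not the app's actual code), assuming an OpenAI Chat Completions-style JSON body with `model`, `temperature`, and `messages` fields: the user-defined prompt is sent as the `system` message and the transcription as the `user` message. The class name `PromptRequestSketch`, the example model ID, and clamping the temperature to the 0.0 - 1.0 range the settings expose are assumptions for illustration; a real client would normally use a JSON library rather than manual escaping.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: builds a Chat Completions-style request body where the
 * user-defined prompt is the system message and the transcription is the user message.
 * Names are illustrative and not taken from the app's source.
 */
final class PromptRequestSketch {

    private PromptRequestSketch() {}

    /** Escapes quotes, backslashes, and control characters for a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /**
     * Builds the JSON body. Temperature is clamped to 0.0 - 1.0 (the range this fork's
     * settings expose; the API itself may accept a wider range).
     */
    static String buildBody(String model, String systemPrompt, String transcription, double temperature) {
        double t = Math.max(0.0, Math.min(1.0, temperature));
        return "{"
                + "\"model\":\"" + jsonEscape(model) + "\","
                + "\"temperature\":" + String.format(Locale.ROOT, "%.2f", t) + ","
                + "\"messages\":["
                + "{\"role\":\"system\",\"content\":\"" + jsonEscape(systemPrompt) + "\"},"
                + "{\"role\":\"user\",\"content\":\"" + jsonEscape(transcription) + "\"}"
                + "]}";
    }

    public static void main(String[] args) {
        // Example model ID and prompt are placeholders
        System.out.println(buildBody("gpt-4o",
                "Fix grammar and punctuation. Reply only with the corrected text.",
                "hello world this is a \"test\"", 0.3));
    }
}
```

Keeping the instructions in the system message and the raw transcription in the user message means the model does not have to guess which part of the text is an instruction and which part is content to be reworded.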
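
The prompt-button rules above (a tap applies the prompt to the selection or to all text, a tap during recording toggles a queued prompt, a long-press starts a recording with that prompt, a double-click opens the editor) can be summarized as a small state machine. This is a hypothetical plain-Java sketch, not taken from the app: the `Action` enum, the method names, non-negative prompt IDs, and the 300 ms double-click window are all assumptions. For simplicity, the first tap of a double-click still returns its single-tap action; a real Android implementation would more likely defer it, for example via `GestureDetector.OnDoubleTapListener.onSingleTapConfirmed`.

```java
import java.util.Objects;

/** Hypothetical sketch of the prompt-button gesture rules; all names are illustrative. */
final class PromptButtonSketch {

    /** What the keyboard should do in response to a gesture on a prompt button. */
    enum Action {
        APPLY_TO_SELECTION,          // idle, text is selected: apply prompt to the selection
        APPLY_TO_ALL_TEXT,           // idle, nothing selected: select all text, then apply
        QUEUE_FOR_AFTER_RECORDING,   // recording: remember this prompt for when recording stops
        UNQUEUE,                     // recording: same prompt tapped again, toggle it off
        START_RECORDING_WITH_PROMPT, // long-press while idle
        OPEN_EDITOR                  // double-click
    }

    /** Assumed double-click window in milliseconds. */
    static final long DOUBLE_TAP_WINDOW_MS = 300;

    private int lastTapPromptId = -1; // assumes real prompt IDs are non-negative
    private long lastTapTimeMs;
    private Integer queuedPromptId;   // prompt to apply once recording stops, or null

    Action onTap(int promptId, long nowMs, boolean recording, boolean hasSelection) {
        boolean doubleTap = promptId == lastTapPromptId
                && nowMs - lastTapTimeMs <= DOUBLE_TAP_WINDOW_MS;
        // After a double-click, reset so that a third tap starts a new sequence
        lastTapPromptId = doubleTap ? -1 : promptId;
        lastTapTimeMs = nowMs;
        if (doubleTap) {
            return Action.OPEN_EDITOR;
        }
        if (recording) {
            if (Objects.equals(queuedPromptId, promptId)) {
                queuedPromptId = null;
                return Action.UNQUEUE;
            }
            queuedPromptId = promptId;
            return Action.QUEUE_FOR_AFTER_RECORDING;
        }
        return hasSelection ? Action.APPLY_TO_SELECTION : Action.APPLY_TO_ALL_TEXT;
    }

    /** Long-press without an active recording: select the prompt and start recording. */
    Action onLongPress(int promptId) {
        queuedPromptId = promptId;
        return Action.START_RECORDING_WITH_PROMPT;
    }

    Integer getQueuedPromptId() {
        return queuedPromptId;
    }

    public static void main(String[] args) {
        PromptButtonSketch buttons = new PromptButtonSketch();
        System.out.println(buttons.onTap(1, 1000, false, true));  // single tap while idle
        System.out.println(buttons.onTap(1, 1200, false, true));  // second tap within the window
        System.out.println(buttons.onTap(2, 5000, true, false));  // tap during recording
    }
}
```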
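
The Backspace swipe item above needs a way to turn a swipe into a number of characters to delete. A minimal sketch, assuming a hypothetical fixed number of pixels per word: derive a word count from the swipe distance, then walk backwards over the text before the cursor, skipping whitespace and then one word per step. On Android, the resulting count could be passed to `InputConnection.deleteSurroundingText(beforeLength, 0)`; the class and method names here are illustrative only.

```java
/** Hypothetical sketch of swipe-to-delete-words for the Backspace key. */
final class WordDeleteSketch {

    private WordDeleteSketch() {}

    /**
     * Returns how many characters before the cursor cover the last {@code words} words,
     * including whitespace after each word (e.g. "foo bar  " with 1 word gives 5).
     */
    static int charsToDelete(CharSequence beforeCursor, int words) {
        int i = beforeCursor.length();
        for (int w = 0; w < words && i > 0; w++) {
            // Skip whitespace between the cursor (or the previously deleted word) and the next word
            while (i > 0 && Character.isWhitespace(beforeCursor.charAt(i - 1))) i--;
            // Skip the word itself
            while (i > 0 && !Character.isWhitespace(beforeCursor.charAt(i - 1))) i--;
        }
        return beforeCursor.length() - i;
    }

    /** Hypothetical mapping from swipe distance to a word count; always at least one word. */
    static int wordsForSwipe(float swipeDistancePx, float pxPerWord) {
        if (pxPerWord <= 0) {
            throw new IllegalArgumentException("pxPerWord must be positive");
        }
        return Math.max(1, (int) (Math.abs(swipeDistancePx) / pxPerWord));
    }

    public static void main(String[] args) {
        String text = "send the report tomorrow ";
        int words = wordsForSwipe(230f, 100f);
        int count = charsToDelete(text, words);
        System.out.println(text.substring(0, text.length() - count));
    }
}
```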
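
The Shift-key item above describes a three-way case cycle. A minimal sketch, assuming that "camel" case means capitalizing the first letter of every word (e.g. "hello world" becomes "Hello World"); the app's exact rule, the names used here, and its handling of mixed-case input may differ. Variants that coincide (such as "A", which is the same in camel and upper case) are collapsed so the cycle never gets stuck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the Shift-key case cycle; names are illustrative. */
final class CaseCycleSketch {

    private CaseCycleSketch() {}

    /** Upper-cases the first letter of each whitespace-separated word and lower-cases the rest. */
    static String capitalizeWords(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean atWordStart = true;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                atWordStart = true;
                out.append(c);
            } else {
                out.append(atWordStart ? Character.toUpperCase(c) : Character.toLowerCase(c));
                atWordStart = false;
            }
        }
        return out.toString();
    }

    /** Returns the next variant in the cycle lower -> capitalized ("camel") -> UPPER -> lower. */
    static String nextCase(String text) {
        String[] variants = {
            text.toLowerCase(Locale.ROOT), capitalizeWords(text), text.toUpperCase(Locale.ROOT)
        };
        List<String> cycle = new ArrayList<>();
        for (String v : variants) {
            if (!cycle.contains(v)) {
                cycle.add(v); // collapse identical variants, e.g. "A" in camel and upper case
            }
        }
        int idx = cycle.indexOf(text);
        // Text in any other form (e.g. "hELLo") restarts the cycle at lower case
        return idx < 0 ? cycle.get(0) : cycle.get((idx + 1) % cycle.size());
    }

    public static void main(String[] args) {
        String s = "hello world";
        for (int i = 0; i < 4; i++) {
            System.out.println(s);
            s = nextCase(s);
        }
    }
}
```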

Showcase

Since a picture is worth a thousand words, here is a showcase video and some screenshots:

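Screenshots: keyboard while recording in a notes app (dictate_keyboard_notes_recording.png), settings (dictate_settings.png, dictate_settings_2.png), prompts overview (dictate_prompts_overview.png), and prompt editing (dictate_prompts_edit.png)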

License

Original Dictate was released under the terms of the Apache 2.0 license, following all clarifications stated in the license file.

About

A powerful AI voice keyboard for reliable speech transcription. The debug build is located under app/build/outputs/apk/debug.

Languages

  • Java 97.4%
  • HTML 2.6%