42 changes: 34 additions & 8 deletions README.md
@@ -31,6 +31,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
- 🌐 **Cross-Platform**: Works on macOS, Windows, and Linux
- ⚡ **Automatic Pasting**: Transcribed text automatically pastes at your cursor location
- 🖱️ **Draggable Interface**: Move the dictation panel anywhere on your screen
- 👁️ **Panel Visibility Modes**: Choose "Always Visible", "When Transcribing", or "Always Hidden"
- 🔄 **OpenAI Responses API**: Uses the latest Responses API for improved performance
- 🌐 **Globe Key Toggle (macOS)**: Optional Fn/Globe key listener for a hardware-level dictation trigger
- ⌨️ **Compound Hotkeys**: Support for multi-key combinations like `Cmd+Shift+K`
@@ -46,32 +47,36 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
### For Personal Use (Recommended)

1. **Clone the repository**:

```bash
git clone https://github.com/HeroTools/open-whispr.git
cd open-whispr
```

2. **Install dependencies**:

```bash
npm install
```

3. **Optional: Set up API keys** (only needed for cloud processing):

**Method A - Environment file**:

```bash
cp env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# ANTHROPIC_API_KEY=your_anthropic_key
# GEMINI_API_KEY=your_gemini_key
```

**Method B - In-app configuration**:
- Run the app and configure API keys through the Control Panel
- Keys are automatically saved and persist across app restarts

4. **Run the application**:

```bash
npm run dev # Development mode with hot reload
# OR
@@ -104,7 +109,8 @@ npm run pack
OpenWhispr now supports multiple Linux package formats for maximum compatibility:

**Available Formats**:

- `.deb` - Debian, Ubuntu, Linux Mint, Pop!\_OS
- `.rpm` - Fedora, Red Hat, CentOS, openSUSE
- `.tar.gz` - Universal archive (works on any distro)
- `.flatpak` - Sandboxed cross-distro package
@@ -166,6 +172,7 @@ chmod +x dist/OpenWhispr-*.AppImage
The clipboard paste feature requires platform-specific tools:

**X11 (Traditional Linux Desktop)**:

```bash
# Debian/Ubuntu
sudo apt install xdotool
@@ -178,6 +185,7 @@ sudo pacman -S xdotool
```

**Wayland (Modern Linux Desktop)**:

```bash
# Debian/Ubuntu
sudo apt install wtype
@@ -230,13 +238,15 @@ npm run build:linux # Linux
## Usage

### Basic Dictation

1. **Start the app** - A small draggable panel appears on your screen
2. **Press your hotkey** (default: backtick `` ` ``) - Start dictating (panel shows recording animation)
3. **Press your hotkey again** - Stop dictation and begin transcription (panel shows processing animation)
4. **Text appears** - Transcribed text is automatically pasted at your cursor location
5. **Drag the panel** - Click and drag to move the dictation panel anywhere on your screen

### Control Panel

- **Access**: Right-click the tray icon (macOS) or use the system menu
- **Configure**: Choose between local and cloud processing
- **History**: View, copy, and delete past transcriptions
@@ -245,36 +255,42 @@ npm run build:linux # Linux
- **Settings**: Configure API keys, customize hotkeys, and manage permissions

### Uninstall & Cache Cleanup

- **In-App**: Use _Settings → General → Local Model Storage → Remove Downloaded Models_ to clear `~/.cache/openwhispr/whisper-models` (or `%USERPROFILE%\.cache\openwhispr\whisper-models` on Windows).
- **Windows Uninstall**: The NSIS uninstaller automatically deletes the same cache directory.
- **Linux Packages**: `deb`/`rpm` post-uninstall scripts also remove cached models.
- **macOS**: If you uninstall manually, delete the app's cache under `~/Library/Caches` or `~/.cache/openwhispr/whisper-models` if desired.

### Agent Naming & AI Processing

Once you've named your agent during setup, you can interact with it using multiple AI providers:

**🎯 Agent Commands** (for AI assistance):

- "Hey [AgentName], make this more professional"
- "Hey [AgentName], format this as a list"
- "Hey [AgentName], write a thank you email"
- "Hey [AgentName], convert this to bullet points"

**🤖 AI Provider Options**:

- **OpenAI**: GPT-5, GPT-4.1, o-series reasoning models
- **Anthropic**: Claude Opus 4.5, Sonnet 4.5, Haiku 4.5
- **Google**: Gemini 2.5 Pro/Flash/Flash-Lite
- **Groq**: Ultra-fast Llama and Mixtral inference
- **Local**: Qwen, LLaMA, Mistral via llama.cpp

**📝 Regular Dictation** (for normal text):

- "This is just normal text I want transcribed"
- "Meeting notes: John mentioned the quarterly report"
- "Dear Sarah, thank you for your help"

The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output.
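As a rough illustration, the command-versus-dictation split can be thought of as a prefix check. In OpenWhispr the detection is done by the AI model itself; the `parseDictation` helper and its regex below are a hypothetical simplification, not the app's actual implementation:

```javascript
// Hypothetical sketch: treat "Hey <AgentName>, ..." as a command and strip
// the agent reference; everything else passes through as plain dictation.
function parseDictation(text, agentName) {
  const prefix = new RegExp(`^\\s*hey[\\s,]+${agentName}[\\s,!.]*`, "i");
  if (prefix.test(text)) {
    return { isCommand: true, text: text.replace(prefix, "").trim() };
  }
  return { isCommand: false, text: text.trim() };
}

console.log(parseDictation("Hey Nova, make this more professional", "Nova"));
```

An LLM-based detector handles far more phrasing variety than a regex ever could, which is why the real pipeline delegates this decision to the reasoning model.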

### Processing Options

- **Local Processing**:
- Install Whisper automatically through the Control Panel
- Download models: tiny (fastest), base (recommended), small, medium, large (best quality)
- Complete privacy - audio never leaves your device
@@ -353,6 +369,7 @@ open-whispr/
### Architecture

The app consists of two main windows:

1. **Main Window**: Minimal overlay for dictation controls
2. **Control Panel**: Full settings and history interface

@@ -370,6 +387,7 @@ Both use the same React codebase but render different components based on URL parameters
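The URL-parameter dispatch can be sketched roughly like this; the `mode` parameter name and its values are assumptions for illustration, not the app's actual query scheme:

```javascript
// Sketch: both windows load the same bundle; a query parameter decides
// which top-level React component to mount.
function selectWindow(search) {
  const params = new URLSearchParams(search);
  return params.get("mode") === "control" ? "ControlPanel" : "DictationPanel";
}
```

This keeps a single build artifact for both windows, with the main process choosing the variant simply by the URL it loads.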
### Tailwind CSS v4 Setup

This project uses the latest Tailwind CSS v4 with:

- CSS-first configuration using `@theme` directive
- Vite plugin for optimal performance
- Custom design tokens for consistent theming
@@ -411,7 +429,7 @@ LANGUAGE=
# Optional: Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Optional: Google Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Debug mode
@@ -427,20 +445,27 @@ For local processing, OpenWhispr uses OpenAI's Whisper model via whisper.cpp - a
3. **No Dependencies**: No Python or other runtime required

**System Fallback**: If the bundled binary fails, install via package manager:

- macOS: `brew install whisper-cpp`
- Linux: Build from source at https://github.com/ggml-org/whisper.cpp

**From Source**: When running locally (not a packaged build), download the binary with `npm run download:whisper-cpp` so `resources/bin/` has your platform executable.

**Requirements**:

- Sufficient disk space for models (75MB - 3GB depending on model)

**Upgrading from Python-based version**: If you previously used the Python-based Whisper, you'll need to re-download models in GGML format. You can safely delete the old Python environment (`~/.openwhispr/python/`) and PyTorch models (`~/.cache/whisper/`) to reclaim disk space.

### Customization

- **Hotkey**: Change in the Control Panel (default: backtick `` ` ``) - fully customizable
- **Panel Position**: Drag the dictation panel to any location on your screen
- **Panel Visibility**: Choose how the dictation panel behaves:
- _Always Visible_: Panel stays on screen (default)
- _When Transcribing_: Panel shows during recording, hides after
- _Always Hidden_: Panel never shows, dictation works in background
- Access via Settings → General, quick menu (right-click panel), or system tray
- **Processing Method**: Choose local or cloud in Control Panel
- **Whisper Model**: Select quality vs speed in Control Panel
- **UI Theme**: Edit CSS variables in `src/index.css`
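The visibility-mode plumbing that `preload.js` exposes (`syncPanelVisibilityMode`, `onPanelVisibilityModeChanged`) can be exercised from the renderer roughly as below. The mock object stands in for `window.electronAPI`, and the mode strings are assumed slugs for the three modes above — both are purely illustrative:

```javascript
// Sketch: push the saved preference to the main process, then react when
// another surface (tray menu, control panel) changes it.
function wirePanelVisibility(api, savedMode, applyMode) {
  api.syncPanelVisibilityMode(savedMode);
  return api.onPanelVisibilityModeChanged(applyMode); // unsubscribe function
}

// Minimal stand-in for window.electronAPI, for illustration only.
const listeners = [];
const mockApi = {
  lastSynced: null,
  syncPanelVisibilityMode(mode) { this.lastSynced = mode; },
  onPanelVisibilityModeChanged(cb) {
    listeners.push(cb);
    return () => listeners.splice(listeners.indexOf(cb), 1);
  },
};

let current = null;
const unsubscribe = wirePanelVisibility(mockApi, "always-visible", (m) => { current = m; });
listeners.forEach((cb) => cb("when-transcribing")); // simulate a mode change
```

Returning an unsubscribe function from the listener registration (as the preload bridge does) keeps renderer components from leaking IPC listeners when they unmount.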
@@ -463,6 +488,7 @@ We welcome contributions! Please follow these steps:
- Follow the existing code style
- Update documentation as needed
- Test on your target platform before submitting

## Security

OpenWhispr is designed with privacy and security in mind:
69 changes: 27 additions & 42 deletions preload.js
@@ -26,27 +26,21 @@ contextBridge.exposeInMainWorld("electronAPI", {
pasteText: (text) => ipcRenderer.invoke("paste-text", text),
hideWindow: () => ipcRenderer.invoke("hide-window"),
showDictationPanel: () => ipcRenderer.invoke("show-dictation-panel"),
syncPanelVisibilityMode: (mode) => ipcRenderer.invoke("sync-panel-visibility-mode", mode),
onPanelVisibilityModeChanged: (callback) => {
const listener = (_event, mode) => callback?.(mode);
ipcRenderer.on("panel-visibility-mode-changed", listener);
return () => ipcRenderer.removeListener("panel-visibility-mode-changed", listener);
},
onToggleDictation: registerListener("toggle-dictation", (callback) => () => callback()),
onStartDictation: registerListener("start-dictation", (callback) => () => callback()),
onStopDictation: registerListener("stop-dictation", (callback) => () => callback()),

// Database functions
saveTranscription: (text) => ipcRenderer.invoke("db-save-transcription", text),
getTranscriptions: (limit) => ipcRenderer.invoke("db-get-transcriptions", limit),
clearTranscriptions: () => ipcRenderer.invoke("db-clear-transcriptions"),
deleteTranscription: (id) => ipcRenderer.invoke("db-delete-transcription", id),
onTranscriptionAdded: (callback) => {
const listener = (_event, transcription) => callback?.(transcription);
ipcRenderer.on("transcription-added", listener);
@@ -60,15 +54,13 @@ contextBridge.exposeInMainWorld("electronAPI", {
onTranscriptionsCleared: (callback) => {
const listener = (_event, data) => callback?.(data);
ipcRenderer.on("transcriptions-cleared", listener);
return () => ipcRenderer.removeListener("transcriptions-cleared", listener);
},

// Environment variables
getOpenAIKey: () => ipcRenderer.invoke("get-openai-key"),
saveOpenAIKey: (key) => ipcRenderer.invoke("save-openai-key", key),
createProductionEnvFile: (key) => ipcRenderer.invoke("create-production-env-file", key),

// Settings management
saveSettings: (settings) => ipcRenderer.invoke("save-settings", settings),
@@ -81,25 +73,19 @@ contextBridge.exposeInMainWorld("electronAPI", {
// Local Whisper functions (whisper.cpp)
transcribeLocalWhisper: (audioBlob, options) =>
ipcRenderer.invoke("transcribe-local-whisper", audioBlob, options),
checkWhisperInstallation: () => ipcRenderer.invoke("check-whisper-installation"),
downloadWhisperModel: (modelName) => ipcRenderer.invoke("download-whisper-model", modelName),
onWhisperDownloadProgress: registerListener("whisper-download-progress"),
checkModelStatus: (modelName) => ipcRenderer.invoke("check-model-status", modelName),
listWhisperModels: () => ipcRenderer.invoke("list-whisper-models"),
deleteWhisperModel: (modelName) => ipcRenderer.invoke("delete-whisper-model", modelName),
deleteAllWhisperModels: () => ipcRenderer.invoke("delete-all-whisper-models"),
cancelWhisperDownload: () => ipcRenderer.invoke("cancel-whisper-download"),
checkFFmpegAvailability: () => ipcRenderer.invoke("check-ffmpeg-availability"),
getAudioDiagnostics: () => ipcRenderer.invoke("get-audio-diagnostics"),

// Whisper server functions (faster repeated transcriptions)
whisperServerStart: (modelName) => ipcRenderer.invoke("whisper-server-start", modelName),
whisperServerStop: () => ipcRenderer.invoke("whisper-server-stop"),
whisperServerStatus: () => ipcRenderer.invoke("whisper-server-status"),

@@ -140,7 +126,7 @@ contextBridge.exposeInMainWorld("electronAPI", {

// External link opener
openExternal: (url) => ipcRenderer.invoke("open-external", url),

// Model management functions
modelGetAll: () => ipcRenderer.invoke("model-get-all"),
modelCheck: (modelId) => ipcRenderer.invoke("model-check", modelId),
@@ -150,7 +136,7 @@ contextBridge.exposeInMainWorld("electronAPI", {
modelCheckRuntime: () => ipcRenderer.invoke("model-check-runtime"),
modelCancelDownload: (modelId) => ipcRenderer.invoke("model-cancel-download", modelId),
onModelDownloadProgress: registerListener("model-download-progress"),

// Anthropic API
getAnthropicKey: () => ipcRenderer.invoke("get-anthropic-key"),
saveAnthropicKey: (key) => ipcRenderer.invoke("save-anthropic-key", key),
@@ -164,20 +150,19 @@ contextBridge.exposeInMainWorld("electronAPI", {
saveGroqKey: (key) => ipcRenderer.invoke("save-groq-key", key),

// Local reasoning
processLocalReasoning: (text, modelId, agentName, config) =>
ipcRenderer.invoke("process-local-reasoning", text, modelId, agentName, config),
checkLocalReasoningAvailable: () => ipcRenderer.invoke("check-local-reasoning-available"),

// Anthropic reasoning
processAnthropicReasoning: (text, modelId, agentName, config) =>
ipcRenderer.invoke("process-anthropic-reasoning", text, modelId, agentName, config),

// llama.cpp
llamaCppCheck: () => ipcRenderer.invoke("llama-cpp-check"),
llamaCppInstall: () => ipcRenderer.invoke("llama-cpp-install"),
llamaCppUninstall: () => ipcRenderer.invoke("llama-cpp-uninstall"),

getLogLevel: () => ipcRenderer.invoke("get-log-level"),
log: (entry) => ipcRenderer.invoke("app-log", entry),
