diff --git a/README.md b/README.md index 8d2c5ada..7e982f49 100644 --- a/README.md +++ b/README.md @@ -31,6 +31,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file - 🌐 **Cross-Platform**: Works on macOS, Windows, and Linux - ⚡ **Automatic Pasting**: Transcribed text automatically pastes at your cursor location - 🖱️ **Draggable Interface**: Move the dictation panel anywhere on your screen +- 👁️ **Panel Visibility Modes**: Choose "Always Visible", "When Transcribing", or "Always Hidden" - 🔄 **OpenAI Responses API**: Using the latest Responses API for improved performance - 🌐 **Globe Key Toggle (macOS)**: Optional Fn/Globe key listener for a hardware-level dictation trigger - ⌨️ **Compound Hotkeys**: Support for multi-key combinations like `Cmd+Shift+K` @@ -46,32 +47,36 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file ### For Personal Use (Recommended) 1. **Clone the repository**: + ```bash git clone https://github.com/HeroTools/open-whispr.git cd open-whispr ``` 2. **Install dependencies**: + ```bash npm install ``` 3. **Optional: Set up API keys** (only needed for cloud processing): - + **Method A - Environment file**: + ```bash cp env.example .env # Edit .env and add your API keys: # OPENAI_API_KEY=your_openai_key - # ANTHROPIC_API_KEY=your_anthropic_key + # ANTHROPIC_API_KEY=your_anthropic_key # GEMINI_API_KEY=your_gemini_key ``` - + **Method B - In-app configuration**: - Run the app and configure API keys through the Control Panel - Keys are automatically saved and persist across app restarts 4. 
**Run the application**: + ```bash npm run dev # Development mode with hot reload # OR @@ -104,7 +109,8 @@ npm run pack OpenWhispr now supports multiple Linux package formats for maximum compatibility: **Available Formats**: -- `.deb` - Debian, Ubuntu, Linux Mint, Pop!_OS + +- `.deb` - Debian, Ubuntu, Linux Mint, Pop!\_OS - `.rpm` - Fedora, Red Hat, CentOS, openSUSE - `.tar.gz` - Universal archive (works on any distro) - `.flatpak` - Sandboxed cross-distro package @@ -166,6 +172,7 @@ chmod +x dist/OpenWhispr-*.AppImage The clipboard paste feature requires platform-specific tools: **X11 (Traditional Linux Desktop)**: + ```bash # Debian/Ubuntu sudo apt install xdotool @@ -178,6 +185,7 @@ sudo pacman -S xdotool ``` **Wayland (Modern Linux Desktop)**: + ```bash # Debian/Ubuntu sudo apt install wtype @@ -230,6 +238,7 @@ npm run build:linux # Linux ## Usage ### Basic Dictation + 1. **Start the app** - A small draggable panel appears on your screen 2. **Press your hotkey** (default: backtick `) - Start dictating (panel shows recording animation) 3. **Press your hotkey again** - Stop dictation and begin transcription (panel shows processing animation) @@ -237,6 +246,7 @@ npm run build:linux # Linux 5. **Drag the panel** - Click and drag to move the dictation panel anywhere on your screen ### Control Panel + - **Access**: Right-click the tray icon (macOS) or through the system menu - **Configure**: Choose between local and cloud processing - **History**: View, copy, and delete past transcriptions @@ -245,21 +255,25 @@ npm run build:linux # Linux - **Settings**: Configure API keys, customize hotkeys, and manage permissions ### Uninstall & Cache Cleanup -- **In-App**: Use *Settings → General → Local Model Storage → Remove Downloaded Models* to clear `~/.cache/openwhispr/whisper-models` (or `%USERPROFILE%\.cache\openwhispr\whisper-models` on Windows). 
+ +- **In-App**: Use _Settings → General → Local Model Storage → Remove Downloaded Models_ to clear `~/.cache/openwhispr/whisper-models` (or `%USERPROFILE%\.cache\openwhispr\whisper-models` on Windows). - **Windows Uninstall**: The NSIS uninstaller automatically deletes the same cache directory. - **Linux Packages**: `deb`/`rpm` post-uninstall scripts also remove cached models. - **macOS**: If you uninstall manually, remove `~/Library/Caches` or `~/.cache/openwhispr/whisper-models` if desired. ### Agent Naming & AI Processing + Once you've named your agent during setup, you can interact with it using multiple AI providers: **🎯 Agent Commands** (for AI assistance): + - "Hey [AgentName], make this more professional" - "Hey [AgentName], format this as a list" - "Hey [AgentName], write a thank you email" - "Hey [AgentName], convert this to bullet points" **🤖 AI Provider Options**: + - **OpenAI**: GPT-5, GPT-4.1, o-series reasoning models - **Anthropic**: Claude Opus 4.5, Sonnet 4.5, Haiku 4.5 - **Google**: Gemini 2.5 Pro/Flash/Flash-Lite @@ -267,6 +281,7 @@ Once you've named your agent during setup, you can interact with it using multip - **Local**: Qwen, LLaMA, Mistral via llama.cpp **📝 Regular Dictation** (for normal text): + - "This is just normal text I want transcribed" - "Meeting notes: John mentioned the quarterly report" - "Dear Sarah, thank you for your help" @@ -274,7 +289,8 @@ Once you've named your agent during setup, you can interact with it using multip The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output. 
### Processing Options -- **Local Processing**: + +- **Local Processing**: - Install Whisper automatically through the Control Panel - Download models: tiny (fastest), base (recommended), small, medium, large (best quality) - Complete privacy - audio never leaves your device @@ -353,6 +369,7 @@ open-whispr/ ### Architecture The app consists of two main windows: + 1. **Main Window**: Minimal overlay for dictation controls 2. **Control Panel**: Full settings and history interface @@ -370,6 +387,7 @@ Both use the same React codebase but render different components based on URL pa ### Tailwind CSS v4 Setup This project uses the latest Tailwind CSS v4 with: + - CSS-first configuration using `@theme` directive - Vite plugin for optimal performance - Custom design tokens for consistent theming @@ -411,7 +429,7 @@ LANGUAGE= # Optional: Anthropic API Configuration ANTHROPIC_API_KEY=your_anthropic_api_key_here -# Optional: Google Gemini API Configuration +# Optional: Google Gemini API Configuration GEMINI_API_KEY=your_gemini_api_key_here # Optional: Debug mode @@ -427,12 +445,14 @@ For local processing, OpenWhispr uses OpenAI's Whisper model via whisper.cpp - a 3. **No Dependencies**: No Python or other runtime required **System Fallback**: If the bundled binary fails, install via package manager: + - macOS: `brew install whisper-cpp` - Linux: Build from source at https://github.com/ggml-org/whisper.cpp **From Source**: When running locally (not a packaged build), download the binary with `npm run download:whisper-cpp` so `resources/bin/` has your platform executable. **Requirements**: + - Sufficient disk space for models (75MB - 3GB depending on model) **Upgrading from Python-based version**: If you previously used the Python-based Whisper, you'll need to re-download models in GGML format. You can safely delete the old Python environment (`~/.openwhispr/python/`) and PyTorch models (`~/.cache/whisper/`) to reclaim disk space. 
@@ -440,7 +460,12 @@ For local processing, OpenWhispr uses OpenAI's Whisper model via whisper.cpp - a ### Customization - **Hotkey**: Change in the Control Panel (default: backtick `) - fully customizable -- **Panel Position**: Drag the dictation panel to any location on your screen` +- **Panel Position**: Drag the dictation panel to any location on your screen +- **Panel Visibility**: Choose how the dictation panel behaves: + - _Always Visible_: Panel stays on screen (default) + - _When Transcribing_: Panel shows during recording, hides after + - _Always Hidden_: Panel never shows, dictation works in background + - Access via Settings → General, quick menu (right-click panel), or system tray - **Processing Method**: Choose local or cloud in Control Panel - **Whisper Model**: Select quality vs speed in Control Panel - **UI Theme**: Edit CSS variables in `src/index.css` @@ -463,6 +488,7 @@ We welcome contributions! Please follow these steps: - Follow the existing code style - Update documentation as needed - Test on your target platform before submitting + ## Security OpenWhispr is designed with privacy and security in mind: diff --git a/preload.js b/preload.js index 02d02e6d..99292515 100644 --- a/preload.js +++ b/preload.js @@ -26,27 +26,21 @@ contextBridge.exposeInMainWorld("electronAPI", { pasteText: (text) => ipcRenderer.invoke("paste-text", text), hideWindow: () => ipcRenderer.invoke("hide-window"), showDictationPanel: () => ipcRenderer.invoke("show-dictation-panel"), - onToggleDictation: registerListener( - "toggle-dictation", - (callback) => () => callback() - ), - onStartDictation: registerListener( - "start-dictation", - (callback) => () => callback() - ), - onStopDictation: registerListener( - "stop-dictation", - (callback) => () => callback() - ), + syncPanelVisibilityMode: (mode) => ipcRenderer.invoke("sync-panel-visibility-mode", mode), + onPanelVisibilityModeChanged: (callback) => { + const listener = (_event, mode) => callback?.(mode); + 
ipcRenderer.on("panel-visibility-mode-changed", listener); + return () => ipcRenderer.removeListener("panel-visibility-mode-changed", listener); + }, + onToggleDictation: registerListener("toggle-dictation", (callback) => () => callback()), + onStartDictation: registerListener("start-dictation", (callback) => () => callback()), + onStopDictation: registerListener("stop-dictation", (callback) => () => callback()), // Database functions - saveTranscription: (text) => - ipcRenderer.invoke("db-save-transcription", text), - getTranscriptions: (limit) => - ipcRenderer.invoke("db-get-transcriptions", limit), + saveTranscription: (text) => ipcRenderer.invoke("db-save-transcription", text), + getTranscriptions: (limit) => ipcRenderer.invoke("db-get-transcriptions", limit), clearTranscriptions: () => ipcRenderer.invoke("db-clear-transcriptions"), - deleteTranscription: (id) => - ipcRenderer.invoke("db-delete-transcription", id), + deleteTranscription: (id) => ipcRenderer.invoke("db-delete-transcription", id), onTranscriptionAdded: (callback) => { const listener = (_event, transcription) => callback?.(transcription); ipcRenderer.on("transcription-added", listener); @@ -60,15 +54,13 @@ contextBridge.exposeInMainWorld("electronAPI", { onTranscriptionsCleared: (callback) => { const listener = (_event, data) => callback?.(data); ipcRenderer.on("transcriptions-cleared", listener); - return () => - ipcRenderer.removeListener("transcriptions-cleared", listener); + return () => ipcRenderer.removeListener("transcriptions-cleared", listener); }, // Environment variables getOpenAIKey: () => ipcRenderer.invoke("get-openai-key"), saveOpenAIKey: (key) => ipcRenderer.invoke("save-openai-key", key), - createProductionEnvFile: (key) => - ipcRenderer.invoke("create-production-env-file", key), + createProductionEnvFile: (key) => ipcRenderer.invoke("create-production-env-file", key), // Settings management saveSettings: (settings) => ipcRenderer.invoke("save-settings", settings), @@ -81,25 
+73,19 @@ contextBridge.exposeInMainWorld("electronAPI", { // Local Whisper functions (whisper.cpp) transcribeLocalWhisper: (audioBlob, options) => ipcRenderer.invoke("transcribe-local-whisper", audioBlob, options), - checkWhisperInstallation: () => - ipcRenderer.invoke("check-whisper-installation"), - downloadWhisperModel: (modelName) => - ipcRenderer.invoke("download-whisper-model", modelName), + checkWhisperInstallation: () => ipcRenderer.invoke("check-whisper-installation"), + downloadWhisperModel: (modelName) => ipcRenderer.invoke("download-whisper-model", modelName), onWhisperDownloadProgress: registerListener("whisper-download-progress"), - checkModelStatus: (modelName) => - ipcRenderer.invoke("check-model-status", modelName), + checkModelStatus: (modelName) => ipcRenderer.invoke("check-model-status", modelName), listWhisperModels: () => ipcRenderer.invoke("list-whisper-models"), - deleteWhisperModel: (modelName) => - ipcRenderer.invoke("delete-whisper-model", modelName), + deleteWhisperModel: (modelName) => ipcRenderer.invoke("delete-whisper-model", modelName), deleteAllWhisperModels: () => ipcRenderer.invoke("delete-all-whisper-models"), cancelWhisperDownload: () => ipcRenderer.invoke("cancel-whisper-download"), - checkFFmpegAvailability: () => - ipcRenderer.invoke("check-ffmpeg-availability"), + checkFFmpegAvailability: () => ipcRenderer.invoke("check-ffmpeg-availability"), getAudioDiagnostics: () => ipcRenderer.invoke("get-audio-diagnostics"), // Whisper server functions (faster repeated transcriptions) - whisperServerStart: (modelName) => - ipcRenderer.invoke("whisper-server-start", modelName), + whisperServerStart: (modelName) => ipcRenderer.invoke("whisper-server-start", modelName), whisperServerStop: () => ipcRenderer.invoke("whisper-server-stop"), whisperServerStatus: () => ipcRenderer.invoke("whisper-server-status"), @@ -140,7 +126,7 @@ contextBridge.exposeInMainWorld("electronAPI", { // External link opener openExternal: (url) => 
ipcRenderer.invoke("open-external", url), - + // Model management functions modelGetAll: () => ipcRenderer.invoke("model-get-all"), modelCheck: (modelId) => ipcRenderer.invoke("model-check", modelId), @@ -150,7 +136,7 @@ contextBridge.exposeInMainWorld("electronAPI", { modelCheckRuntime: () => ipcRenderer.invoke("model-check-runtime"), modelCancelDownload: (modelId) => ipcRenderer.invoke("model-cancel-download", modelId), onModelDownloadProgress: registerListener("model-download-progress"), - + // Anthropic API getAnthropicKey: () => ipcRenderer.invoke("get-anthropic-key"), saveAnthropicKey: (key) => ipcRenderer.invoke("save-anthropic-key", key), @@ -164,20 +150,19 @@ contextBridge.exposeInMainWorld("electronAPI", { saveGroqKey: (key) => ipcRenderer.invoke("save-groq-key", key), // Local reasoning - processLocalReasoning: (text, modelId, agentName, config) => + processLocalReasoning: (text, modelId, agentName, config) => ipcRenderer.invoke("process-local-reasoning", text, modelId, agentName, config), - checkLocalReasoningAvailable: () => - ipcRenderer.invoke("check-local-reasoning-available"), - + checkLocalReasoningAvailable: () => ipcRenderer.invoke("check-local-reasoning-available"), + // Anthropic reasoning processAnthropicReasoning: (text, modelId, agentName, config) => ipcRenderer.invoke("process-anthropic-reasoning", text, modelId, agentName, config), - + // llama.cpp llamaCppCheck: () => ipcRenderer.invoke("llama-cpp-check"), llamaCppInstall: () => ipcRenderer.invoke("llama-cpp-install"), llamaCppUninstall: () => ipcRenderer.invoke("llama-cpp-uninstall"), - + getLogLevel: () => ipcRenderer.invoke("get-log-level"), log: (entry) => ipcRenderer.invoke("app-log", entry), diff --git a/src/App.jsx b/src/App.jsx index ee7afcb8..eb4890a2 100644 --- a/src/App.jsx +++ b/src/App.jsx @@ -77,6 +77,9 @@ export default function App() { const { isDragging, handleMouseDown, handleMouseUp } = useWindowDrag(); const [dragStartPos, setDragStartPos] = useState(null); const 
[hasDragged, setHasDragged] = useState(false); + + // Track previous recording/processing state for auto-hide logic + const wasActiveRef = useRef(false); const setWindowInteractivity = React.useCallback((shouldCapture) => { window.electronAPI?.setMainWindowInteractivity?.(shouldCapture); @@ -127,6 +130,69 @@ export default function App() { onToggle: handleDictationToggle, }); + // Track panel visibility mode as state (synced via IPC) + const [panelVisibilityMode, setPanelVisibilityMode] = useState(() => { + return localStorage.getItem("panelVisibilityMode") || "always"; + }); + + // Listen for visibility mode changes from other windows via IPC + useEffect(() => { + const unsubscribe = window.electronAPI?.onPanelVisibilityModeChanged?.((mode) => { + // Update both React state AND localStorage + localStorage.setItem("panelVisibilityMode", mode); + setPanelVisibilityMode(mode); + }); + + // Also listen for storage events as backup for Control Panel changes + const handleStorageChange = (e) => { + if (e.key === "panelVisibilityMode") { + setPanelVisibilityMode(e.newValue || "always"); + } + }; + window.addEventListener("storage", handleStorageChange); + + return () => { + unsubscribe?.(); + window.removeEventListener("storage", handleStorageChange); + }; + }, []); + + // Single consolidated effect for all visibility logic + useEffect(() => { + const isActive = isRecording || isProcessing; + + // ALWAYS HIDDEN: Never show the panel, always hide + if (panelVisibilityMode === "hidden") { + window.electronAPI?.hideWindow?.(); + wasActiveRef.current = false; + return; + } + + // ALWAYS VISIBLE: Always show the panel + if (panelVisibilityMode === "always") { + window.electronAPI?.showDictationPanel?.(); + wasActiveRef.current = false; + return; + } + + // TRANSCRIBING MODE: Show only when recording/processing + if (panelVisibilityMode === "transcribing") { + if (isActive && !wasActiveRef.current) { + // Activity started - show panel + 
window.electronAPI?.showDictationPanel?.(); + wasActiveRef.current = true; + } else if (!isActive && wasActiveRef.current) { + // Activity ended - hide panel with small delay + const hideTimeout = setTimeout(() => { + window.electronAPI?.hideWindow?.(); + wasActiveRef.current = false; + }, 300); + return () => clearTimeout(hideTimeout); + } + // Note: If not active and wasActiveRef is false, do nothing - panel is already hidden + } + }, [panelVisibilityMode, isRecording, isProcessing]); + const handleClose = () => { window.electronAPI.hideWindow(); }; @@ -351,15 +417,44 @@ export default function App() { {isRecording ? "Stop listening" : "Start listening"}
+                Control when the dictation panel is visible on your screen.
+                {panelVisibilityMode === "always" &&
+                  "The dictation panel is always shown on your screen."}
+                {panelVisibilityMode === "transcribing" &&
+                  "Panel appears when recording starts and hides when transcription completes."}
+                {panelVisibilityMode === "hidden" &&
+                  "Panel is never shown. Use the hotkey to record in the background."}
+