addygeek/STT-Modular-Wake-Sleep-Assistant-App

🗣️ Real-Time Wake/Sleep Speech-to-Text (STT) Module

This project implements a real-time speech-to-text (STT) module with a Python backend built on Vosk and a React frontend. Transcription starts when the wake word ("hi") is spoken, pauses when the sleep word ("bye") is detected, and resumes when the wake word is spoken again. The backend streams text to the frontend over a WebSocket, so the module can be integrated into mobile or web applications.


🎥 Watch the Demo on YouTube

Features

  • 🔊 Real-time speech transcription using Vosk.
  • 🟢 Wake word detection to start transcription (hi by default).
  • 🔴 Sleep word detection to pause transcription (bye by default).
  • 🖥️ Frontend display via WebSocket.
  • ♻️ Continuous listening even after sleep/pause cycles.
  • Lightweight and modular backend, easy to integrate into apps.
  • 💻 Works locally for demo and development.

Project Structure

vosk-stt-project/
│
├─ backend/
│   ├─ stt_vosk.py          # STT module handling Vosk recognition
│   └─ main_vosk.py         # FastAPI + WebSocket server
│
├─ frontend/
│   ├─ public/
│   ├─ src/
│   │   ├─ hooks/
│   │   │   └─ useWebSocket.ts   # WebSocket hook to connect backend
│   │   ├─ components/
│   │   │   └─ StatusIndicator.tsx
│   │   └─ pages/
│   │       └─ Index.tsx
│   ├─ package.json
│   └─ vite.config.ts
│
├─ vosk-model-en-in-0.5/    # Pretrained Vosk model
├─ README.md
└─ requirements.txt

Requirements

Backend (Python)

  • Python ≥ 3.10
  • Vosk (pip install vosk)
  • Sounddevice (pip install sounddevice)
  • FastAPI (pip install fastapi uvicorn websockets)
  • Optional: PyAudio (pip install pyaudio) if you encounter audio input issues

Frontend (React/TypeScript)

  • Node.js ≥ 18
  • npm or bun
  • Vite (bundler)
  • TailwindCSS (UI)

Setup Instructions

1️⃣ Backend

  1. Clone this repo and navigate to the backend folder:
cd backend
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the backend server:
python -m uvicorn main_vosk:app --host 0.0.0.0 --port 8000

Notes:

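  • Make sure model_path in stt_vosk.py points to the Vosk model directory. A relative path is resolved against the directory you start the server from, and in the layout above the model sits in the project root.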
  • The backend will continuously listen to your microphone.
  • Wake word "hi" starts sending transcription to the frontend.
  • Sleep word "bye" pauses sending transcription but continues listening.

2️⃣ Frontend

  1. Navigate to the frontend folder:
cd frontend
  2. Install packages:
npm install
# or
bun install
  3. Run the development server:
npm run dev
# or
bun run dev
  4. Open your browser:
http://localhost:8080
  5. Click Connect to Backend to start receiving live transcription.

Usage Flow

  1. Start backend (uvicorn main_vosk:app --host 0.0.0.0 --port 8000).
  2. Start frontend (npm run dev).
  3. Click Connect to Backend on frontend UI.
  4. Speak “hi” → transcription starts.
  5. Speak normally → transcription appears in frontend & logs in backend.
  6. Speak “bye” → transcription pauses, but listening continues.
  7. Speak “hi” again → transcription resumes and new text is appended to the existing transcript.
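The flow above amounts to a two-state gate: asleep (drop everything) and awake (forward everything). The sketch below is one way to express it. The class name WakeSleepGate and its exact behavior are assumptions for illustration; in particular, forwarding the words that follow the wake word in the same utterance is a design choice of this sketch, not something the project documents.

```python
class WakeSleepGate:
    """Decide which utterances get forwarded to the frontend (hypothetical)."""

    def __init__(self, wake_word: str = "hi", sleep_word: str = "bye"):
        self.wake_word = wake_word
        self.sleep_word = sleep_word
        self.awake = False

    def feed(self, utterance: str):
        """Return the text to forward, or None if nothing should be sent."""
        # Match whole words only, so "hi" does not fire inside "this".
        words = utterance.lower().split()
        if not self.awake:
            if self.wake_word not in words:
                return None  # still asleep: keep listening, send nothing
            self.awake = True
            # Keep only what was said after the wake word.
            words = words[words.index(self.wake_word) + 1:]
        if self.sleep_word in words:
            self.awake = False
            # Keep only what was said before the sleep word.
            words = words[:words.index(self.sleep_word)]
        return " ".join(words) or None
```

A short session with the gate, mirroring steps 4 to 7:

```python
gate = WakeSleepGate()
for phrase in ["hello world", "hi", "how are you", "bye", "still talking", "hi again"]:
    print(repr(phrase), "->", repr(gate.feed(phrase)))
```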

Configuration

  • stt_vosk.py:

    wake_word = "hi"
    sleep_word = "bye"
    model_path = "vosk-model-en-in-0.5"
  • main_vosk.py:

    WEBSOCKET_PORT = 8000
  • useWebSocket.ts:

    const ws = new WebSocket("ws://localhost:8000/ws");
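These settings live in three files, and the port in main_vosk.py has to match the URL in useWebSocket.ts. On the Python side, one possible way to keep the values together is a small config object with environment overrides. This is only a sketch: SttConfig, load_config and the STT_* variable names are hypothetical and do not exist in the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SttConfig:
    # Defaults mirror the values listed in the Configuration section.
    wake_word: str = "hi"
    sleep_word: str = "bye"
    model_path: str = "vosk-model-en-in-0.5"
    websocket_port: int = 8000

    @property
    def websocket_url(self) -> str:
        """URL the frontend would need to connect to for this port."""
        return f"ws://localhost:{self.websocket_port}/ws"


def load_config(env=os.environ) -> SttConfig:
    """Build a config from defaults, letting STT_* variables override them."""
    defaults = SttConfig()
    return SttConfig(
        wake_word=env.get("STT_WAKE_WORD", defaults.wake_word).lower(),
        sleep_word=env.get("STT_SLEEP_WORD", defaults.sleep_word).lower(),
        model_path=env.get("STT_MODEL_PATH", defaults.model_path),
        websocket_port=int(env.get("STT_WEBSOCKET_PORT", defaults.websocket_port)),
    )
```

Wake and sleep words are lowercased on load so they can be compared directly against lowercased recognizer output.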

Troubleshooting

  • Microphone not working: Ensure your system microphone is accessible. On Windows, check privacy settings.
  • Vosk model errors: Verify vosk-model-en-in-0.5 exists and matches the path in stt_vosk.py.
  • Frontend WebSocket errors: Check that the backend is running and reachable at ws://localhost:8000/ws.
  • Continuous listening: This is expected behavior. The module keeps listening after the sleep word and sends transcription only after the wake word is spoken again.
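Two of the checks above can be scripted with the standard library alone. The helpers below are a hypothetical diagnostic sketch, not part of the project: one confirms the model directory exists relative to the current working directory, the other confirms that something accepts TCP connections on the backend port. A successful connection only shows the port is open, not that the /ws route works.

```python
import socket
from pathlib import Path


def model_dir_exists(model_path: str = "vosk-model-en-in-0.5") -> bool:
    """True if the model path resolves to a directory from the current cwd."""
    return Path(model_path).is_dir()


def backend_is_listening(host: str = "localhost", port: int = 8000,
                         timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Run both from the directory where you start uvicorn, since that is the directory a relative model path is resolved against.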

Future Improvements

  • Add multiple wake/sleep words.
  • Language switching for multilingual transcription.
  • Integrate with React Native for mobile apps.
  • Use Whisper or Deepgram for higher accuracy.

License

MIT License — free to use, modify, and distribute.
