🗣️ Real-Time Wake/Sleep Speech-to-Text (STT) Module

This project implements a real-time speech-to-text (STT) module using Vosk for Python and a React frontend. The module starts transcription when a wake word ("hi") is spoken, pauses when a sleep word ("bye") is detected, and can resume continuously after repeating the wake word. It is backend-frontend ready and can be integrated into mobile or web applications.

🎥 Watch the Demo on YouTube

➡️ Watch the Demo Here

Features

🔊 Real-time speech transcription using Vosk.
🟢 Wake word detection to start transcription (hi by default).
🔴 Sleep word detection to pause transcription (bye by default).
🖥️ Frontend display via WebSocket.
♻️ Continuous listening even after sleep/pause cycles.
⚡ Lightweight and modular backend, easy to integrate into apps.
💻 Works locally for demo and development.

Project Structure

vosk-stt-project/
│
├─ backend/
│   ├─ stt_vosk.py          # STT module handling Vosk recognition
│   └─ main_vosk.py         # FastAPI + WebSocket server
│
├─ frontend/
│   ├─ public/
│   ├─ src/
│   │   ├─ hooks/
│   │   │   └─ useWebSocket.ts   # WebSocket hook to connect backend
│   │   ├─ components/
│   │   │   └─ StatusIndicator.tsx
│   │   └─ pages/
│   │       └─ Index.tsx
│   ├─ package.json
│   └─ vite.config.ts
│
├─ vosk-model-en-in-0.5/    # Pretrained Vosk model
├─ README.md
└─ requirements.txt

Requirements

Backend (Python)

Python ≥ 3.10
Vosk (pip install vosk)
Sounddevice (pip install sounddevice)
FastAPI (pip install fastapi uvicorn websockets)
Optional: pyaudio if you encounter issues

Frontend (React/TypeScript)

Node.js ≥ 18
npm or bun
Vite (bundler)
TailwindCSS (UI)

Setup Instructions

1️⃣ Backend

Clone this repo and navigate to the backend folder:

cd backend

Install dependencies:

pip install -r requirements.txt

Run the backend server:

python -m uvicorn main_vosk:app --host 0.0.0.0 --port 8000

Notes:

Make sure the Vosk model path is correct in stt_vosk.py.
The backend will continuously listen to your microphone.
Wake word "hi" starts sending transcription to the frontend.
Sleep word "bye" pauses sending transcription but continues listening.

2️⃣ Frontend

Navigate to the frontend folder:

cd frontend

Install packages:

npm install
# or
bun install

Run the development server:

npm run dev
# or
bun run dev

Open your browser:

http://localhost:8080

Click Connect to Backend to start receiving live transcription.

Usage Flow

Start backend (uvicorn main_vosk:app --host 0.0.0.0 --port 8000).
Start frontend (npm run dev).
Click Connect to Backend on frontend UI.
Speak “hi” → transcription starts.
Speak normally → transcription appears in frontend & logs in backend.
Speak “bye” → transcription pauses, but listening continues.
Speak “hi” again → transcription resumes, appends new text.

Configuration

stt_vosk.py:

wake_word = "hi"
sleep_word = "bye"
model_path = "vosk-model-en-in-0.5"

main_vosk.py:
```
WEBSOCKET_PORT = 8000
```

useWebSocket.ts:

const ws = new WebSocket("ws://localhost:8000/ws");

Troubleshooting

Microphone not working: Ensure your system microphone is accessible. On Windows, check privacy settings.
Vosk model errors: Verify vosk-model-en-in-0.5 exists and matches the path in stt_vosk.py.
Frontend WebSocket errors: Check that backend is running on ws://localhost:8000/ws.
Continuous listening: Module keeps listening even after sleep. Only sends transcription after wake word.

Future Improvements

Add multiple wake/sleep words.
Language switching for multilingual transcription.
Integrate with React Native for mobile apps.
Use Whisper or Deepgram for higher accuracy.

License

MIT License — free to use, modify, and distribute.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
BACKEND		BACKEND
FRONTEND		FRONTEND
README.md		README.md
temp_auto_push.bat		temp_auto_push.bat

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🗣️ Real-Time Wake/Sleep Speech-to-Text (STT) Module

🎥 Watch the Demo on YouTube

Features

Project Structure

Requirements

Backend (Python)

Frontend (React/TypeScript)

Setup Instructions

1️⃣ Backend

2️⃣ Frontend

Usage Flow

Configuration

Troubleshooting

Future Improvements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🗣️ Real-Time Wake/Sleep Speech-to-Text (STT) Module

🎥 Watch the Demo on YouTube

Features

Project Structure

Requirements

Backend (Python)

Frontend (React/TypeScript)

Setup Instructions

1️⃣ Backend

2️⃣ Frontend

Usage Flow

Configuration

Troubleshooting

Future Improvements

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages