This is a local Python application that evaluates spoken German sentences using OpenAI's Whisper model. Users upload audio (.webm, typically sent by a frontend making requests to this backend), and the app returns a pronunciation score and feedback against a set of expected sentences.
```
├── main.py           # Backend logic (or whisper_utils.py for modular usage)
├── whisper_utils.py  # (Optional) Modular Whisper processing
├── requirements.txt  # Python dependencies
├── render.yaml       # (Ignore if running locally)
├── temp_audio.webm   # Temporary uploaded audio file (auto-created/deleted)
└── __pycache__/      # Python cache
```
- Accepts an audio recording (WebM format).
- Transcribes the audio using OpenAI Whisper.
- Compares the transcription against a predefined sentence (selected by `phrase_id`).
- Scores the pronunciation using character-level similarity (see the sketch after this list).
- Outputs:
  - Expected vs. spoken sentence
  - Accuracy score
  - Mispronounced letters
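
A minimal sketch of how character-level scoring might work, using Python's `difflib`; the function name `score_pronunciation` and the exact normalization are assumptions, not the app's actual code in main.py:

```python
# Illustrative sketch of character-level similarity scoring.
# Names and normalization here are hypothetical; see main.py for the real logic.
import difflib

def score_pronunciation(expected: str, spoken: str) -> tuple[float, list[str]]:
    expected_norm = expected.lower()
    spoken_norm = spoken.lower()
    # Ratio of matching characters between the two strings, scaled to 0-100.
    score = difflib.SequenceMatcher(None, expected_norm, spoken_norm).ratio() * 100
    # Expected letters that never appear in the spoken transcription.
    missing = sorted({c for c in expected_norm if c.isalpha()} - set(spoken_norm))
    return round(score, 1), missing

print(score_pronunciation("Ich bin müde", "und bin mut"))
```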
```bash
git clone https://github.com/krish-1010/whisper-backend
cd whisper-backend
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Dependencies include:

- `openai-whisper`
- `ffmpeg-python` (ensure system ffmpeg is installed)
- `uvicorn`, `fastapi` (for API usage)
Download ffmpeg:
- Windows: https://www.gyan.dev/ffmpeg/builds/
- Linux/macOS: via `brew` or `apt`
Ensure ffmpeg is in your system PATH.
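
As a quick sanity check from Python (a convenience snippet, not part of the app):

```python
# Convenience check that ffmpeg is resolvable on PATH.
import shutil

if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH; install it before running the app")
print("ffmpeg found:", shutil.which("ffmpeg"))
```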
Start the server, then send a test request:

```bash
uvicorn main:app --reload
```

```bash
curl -X POST "http://localhost:8000/evaluate/1-1" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_audio_file.webm"
```

You can find all supported sentence IDs and phrases inside main.py or whisper_utils.py under EXPECTED_PHRASES.
Example:

```
"1-1": "Ich bin müde"
"2-3": "Kannst du helfen?"
```
"expected": "Ich bin müde",
"spoken": "und bin mut",
"score": 50.0,
"feedback": "🗣️ Mispronounced letters: i, c, h, ü, d, e"
}- WebM audio is expected (convert if needed).
- You may extend this with a frontend or voice recording interface.
- For offline usage, the `tiny` or `small` Whisper models are ideal (see the sketch below).
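
A minimal offline transcription sketch, assuming the `openai-whisper` package is installed and ffmpeg is on PATH:

```python
# Load a compact Whisper model and transcribe a German recording locally.
import whisper

model = whisper.load_model("tiny")  # "small" trades speed for accuracy
result = model.transcribe("your_audio_file.webm", language="de")
print(result["text"])
```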
License: MIT