PocketSphinx is a service which performs continuous speech recognition, CSR (also referred to as automatic speech recognition, ASR, or voice-to-text).
It was ported to Android by the cmusphinx team at SourceForge.
This project aims to use PocketSphinx to meet the following requirements:
- Offline speech recognition
- Process pre-recorded audio files
  - Preferred format: mp3
- Eyes-free ASR
  - Transcribe dictation / create transcripts for podcasts / create subtitles for video
  - Audio file to text (a map of time periods to an array of hypothesized text)
  - Boolean to run it on the device, or to send it to a Sphinx server elsewhere
- General eyes-free speech recognition
  - Register PocketSphinx as a service which responds to android.speech.RecognizerIntent so that users can make it the default in the preferences (i.e. if they have no data connection on their Android, or are generally not online); a minimal caller sketch follows this list
  - Create an Open Intent for other developers to call PocketSphinx
  - Function very similarly to com.google.android.voicesearch, except allow a boolean to control whether it stops "listening" on silence or on user action (back button, screen tap, gesture, top-to-bottom swipe, etc.)
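For reference, here is a minimal sketch of how a client would invoke whatever recognizer is registered for the standard speech intent (which would include PocketSphinx once it responds to android.speech.RecognizerIntent). The activity name and request code are hypothetical; the intent extras are the stock android.speech API.

```java
import java.util.ArrayList;

import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;

public class RecognizeDemo extends Activity {
    private static final int VOICE_RECOGNITION_REQUEST = 1234; // arbitrary request code

    /** Call from e.g. a button handler to start the registered recognizer. */
    private void startRecognition() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
        startActivityForResult(intent, VOICE_RECOGNITION_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == VOICE_RECOGNITION_REQUEST && resultCode == RESULT_OK) {
            // Each entry is one hypothesized transcription, best first.
            ArrayList<String> hypotheses =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        }
        super.onActivityResult(requestCode, resultCode, data);
    }
}
```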
Audio file processing should allow for a boolean splitOnSilence:
- Pre-process by creating an annotation file for the audio (a sketch follows this list)
  - Format: .srt or WebVTT
  - Reasoning: if the time annotation is provided in .srt format it will allow the code to be re-used for other developers' purposes, including:
    - Displaying subtitles, and re-syncing subtitles in a video player if they are out of sync
    - Displaying transcripts of podcasts while playing in a music player
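To make the annotation format concrete, the "map of time periods to hypothesized text" could be serialized as a minimal SubRip (.srt) file along these lines. SrtWriter and its parallel-list representation of the segments are hypothetical, not part of PocketSphinx:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

class SrtWriter {
    /** Format milliseconds as the SubRip timestamp HH:MM:SS,mmm. */
    static String srtTime(long millis) {
        return String.format("%02d:%02d:%02d,%03d",
                millis / 3600000, (millis / 60000) % 60,
                (millis / 1000) % 60, millis % 1000);
    }

    /** times holds {startMillis, endMillis} pairs; texts holds the best hypothesis per segment. */
    static void write(File out, List<long[]> times, List<String> texts) throws IOException {
        PrintWriter w = new PrintWriter(new FileWriter(out));
        for (int i = 0; i < times.size(); i++) {
            w.println(i + 1);                                               // cue number
            w.println(srtTime(times.get(i)[0]) + " --> " + srtTime(times.get(i)[1]));
            w.println(texts.get(i));                                        // hypothesized text
            w.println();                                                    // blank line ends the cue
        }
        w.close();
    }
}
```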
- Dependencies:
  - Encoding from MP3 to another format
  - Detecting silence
    - Sphinx considerations: this sort of preprocessing step is probably already implemented in PocketSphinx; it's just a question of finding it.
    - This step is also implemented somewhere in the LIUM tools.
    - The MARF project has some libraries for audio analysis (how complete it is and which goals have been realized is unclear). MARF is an open-source research platform and a collection of voice/sound/speech/text and natural language processing (NLP) algorithms written in Java, arranged into a modular and extensible framework that facilitates the addition of new algorithms.
    - If the silence detection is not easy to find, consider implementing a lightweight solution using the Java Sound API (a naive version is sketched below).
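For illustration, a naive energy-based detector over 16-bit mono PCM frames might look like the following. The frame size and RMS threshold are assumptions that would need tuning, and on Android the samples would come from AudioRecord or a decoded file rather than the desktop Java Sound API:

```java
class SilenceDetector {
    static final int FRAME_SAMPLES = 1600;   // 100 ms of 16 kHz mono audio
    static final double SILENCE_RMS = 500.0; // assumed threshold; tune empirically

    /** Marks each frame of 16-bit PCM as silent when its RMS energy is below the threshold. */
    static boolean[] detect(short[] pcm) {
        int frames = pcm.length / FRAME_SAMPLES;
        boolean[] silent = new boolean[frames];
        for (int f = 0; f < frames; f++) {
            long sumSquares = 0;
            for (int i = 0; i < FRAME_SAMPLES; i++) {
                long s = pcm[f * FRAME_SAMPLES + i];
                sumSquares += s * s;
            }
            silent[f] = Math.sqrt((double) sumSquares / FRAME_SAMPLES) < SILENCE_RMS;
        }
        return silent;
    }
}
```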
- The developer can check if WiFi is active, and only transfer data when on WiFi (a sketch follows this list)
- The developer can choose to schedule the service overnight, for example
- The developer can provide the logic to record via the handset microphone, a paired headset microphone, or Bluetooth
- Audio is saved to the device so that the user doesn't "lose" their thoughts and can re-listen to their audio to correct the transcription
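A minimal sketch of the WiFi check mentioned above, using the stock ConnectivityManager API (the app needs the ACCESS_NETWORK_STATE permission; the NetworkCheck class name is hypothetical):

```java
import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;

class NetworkCheck {
    /** Returns true only when WiFi is connected, so uploads can be deferred otherwise. */
    static boolean onWifi(Context context) {
        ConnectivityManager cm = (ConnectivityManager)
                context.getSystemService(Context.CONNECTIVITY_SERVICE);
        NetworkInfo wifi = cm.getNetworkInfo(ConnectivityManager.TYPE_WIFI);
        return wifi != null && wifi.isConnected();
    }
}
```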
- http://stackoverflow.com/questions/5690850/extract-and-analyse-sound-from-mp3-files
- Requests for eyes-free ASR on Google Code
- Another feature request on Android's issue tracker
- http://stackoverflow.com/questions/5613167/source-code-for-the-googles-voice-search-activity
- http://stackoverflow.com/questions/2319735/voice-recognition-on-android-with-recorded-sound-clip