Commit d6d92b8

Minor README changes
1 parent 4a58f0f commit d6d92b8

1 file changed (+21 -3 lines changed)


README.md

+21-3
@@ -14,6 +14,7 @@
AutoSub is a CLI application that generates a subtitle file (.srt) for any video file using [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech). I use the DeepSpeech Python API to run inference on audio segments, and [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) to split the initial audio at silent regions, producing multiple small files.
## Motivation
In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook, or even torrents, rather than stream. I am one of them, and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. That's when the idea for AutoSub struck me, and since I had worked with DeepSpeech before, I decided to use it.
@@ -43,6 +44,12 @@ In the age of OTT platforms, there are still some who prefer to download movies/
```bash
$ mkdir audio output
```
* Install FFMPEG. If you're running Ubuntu, this should work fine.

```bash
$ sudo apt-get install ffmpeg
$ ffmpeg -version # I'm running 4.1.4
```
## How-to example

@@ -52,19 +59,30 @@ In the age of OTT platforms, there are still some who prefer to download movies/
* After the script finishes, the SRT file is saved in `output/`
* Open the video file and add this SRT file as a subtitle, or just drag and drop it into VLC.
## How it works
Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning on custom datasets, external language models, exporting memory-mapped models, and a lot more. You should definitely check it out for STT tasks. When you first run the script, I use FFMPEG to **extract the audio** from the video and save it in `audio/`. By default, DeepSpeech expects 16kHz audio samples for inference, so while extracting I have FFMPEG use a 16kHz sampling rate.
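The extraction step described above can be sketched roughly like this. The helper name and exact FFMPEG flags are illustrative assumptions, not the project's actual code:

```python
# Hypothetical helper: assemble the FFMPEG command the way the text describes.
# The specific flags are assumptions; the real script may use different options.
def build_extract_cmd(video_path, audio_path):
    return [
        "ffmpeg",
        "-i", video_path,        # input video file
        "-vn",                   # drop the video stream, keep audio only
        "-ac", "1",              # mono channel
        "-ar", "16000",          # 16kHz sampling rate, as DeepSpeech expects
        "-acodec", "pcm_s16le",  # 16-bit PCM WAV
        audio_path,              # e.g. a file inside audio/
    ]

# To actually run it (requires FFMPEG installed):
# import subprocess
# subprocess.run(build_extract_cmd("movie.mp4", "audio/movie.wav"), check=True)
```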
Then, I use [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) for silence removal: it takes the large audio file extracted earlier and splits it wherever silent regions are encountered, producing smaller audio segments that are much easier to process. I haven't used the whole library; instead, I've integrated parts of it in `autosub/featureExtraction.py` and `autosub/trainAudio.py`. All these audio files are stored in `audio/`. Then, for each audio segment, I perform DeepSpeech inference and write the inferred text to an SRT file. After all segments are processed, the final SRT file is stored in `output/`.
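The per-segment SRT writing can be sketched as below. The helper names are hypothetical, and the DeepSpeech inference call is omitted; the example strings stand in for inferred text:

```python
import io

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt_entry(f, index, start, end, text):
    # One SRT cue: sequence number, time range, text, blank separator line.
    f.write(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n")

# Example: two segments whose text would normally come from DeepSpeech inference.
buf = io.StringIO()
write_srt_entry(buf, 1, 0.0, 2.5, "hello world")
write_srt_entry(buf, 2, 2.5, 5.0, "this is autosub")
```

Each segment's start/end offsets come from the silence-based split, so the cue timings line up with the original video.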
When I tested the script on my laptop, it took about **40 minutes to generate the SRT file for a 70-minute video file**. My config is an i5 dual-core @ 2.5 GHz with 8 GB of RAM. Ideally, the whole process shouldn't take more than 60% of the duration of the original video file.
## TO-DO
* Pre-process inferred text before writing to file (prettify)
* Add progress bar to extract_audio()
* GUI support (?)
## Contributing
I would love to follow up on any suggestions/issues you find :)
## References
1. https://github.com/mozilla/DeepSpeech/
2. https://github.com/tyiannak/pyAudioAnalysis
3. https://deepspeech.readthedocs.io/
