AutoSub is a CLI application that generates a subtitle file (.srt) for any video file using [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech). I use the DeepSpeech Python API to run inference on audio segments and [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) to split the initial audio on silent segments, producing multiple small files.
## Motivation
In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them, and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. That's when the idea for AutoSub struck me, and since I had worked with DeepSpeech previously, I decided to use it.
```bash
$ mkdir audio output
```
* Install FFmpeg. If you're running Ubuntu, this should work fine.
```bash
$ sudo apt-get install ffmpeg
$ ffmpeg -version # I'm running 4.1.4
```
## How-to example
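The run command itself is cut off in this excerpt; assuming the entry point is `autosub/main.py` and it takes the video path via a `--file` flag, a typical run would look like:

```bash
$ python3 autosub/main.py --file ~/movies/movie.mp4
```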
* After the script finishes, the SRT file is saved in `output/`.
* Open the video file and add this SRT file as a subtitle, or just drag and drop it into VLC.
## How it works
Mozilla DeepSpeech is an amazing open-source speech-to-text engine with support for fine-tuning using custom datasets, external language models, exporting memory-mapped models, and a lot more. You should definitely check it out for STT tasks. When you first run the script, I use FFmpeg to **extract the audio** from the video and save it in `audio/`. By default, DeepSpeech is configured to accept 16 kHz audio samples for inference, hence while extracting I make FFmpeg use a 16 kHz sampling rate.
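For illustration, the equivalent extraction step from the shell would look something like this (the exact flags the script passes to FFmpeg may differ; `-vn` drops the video stream, `-ac 1` forces mono, and `-ar 16000` sets the 16 kHz sampling rate DeepSpeech expects):

```bash
$ ffmpeg -i movie.mp4 -vn -ac 1 -ar 16000 audio/movie.wav
```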
Then, I use [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) for silence removal: it takes the large audio file extracted initially and splits it wherever silent regions are encountered, resulting in smaller audio segments which are much easier to process. I haven't used the whole library; instead, I've integrated parts of it in `autosub/featureExtraction.py` and `autosub/trainAudio.py`. All these audio files are stored in `audio/`. Then, for each audio segment, I perform DeepSpeech inference on it and write the inferred text into an SRT file. After all files are processed, the final SRT file is stored in `output/`.
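As a rough sketch of that per-segment loop (the segment paths, timestamps, and model filename below are made up for illustration, and the `Model` constructor shown follows the DeepSpeech 0.7.x Python API):

```python
import wave
import numpy as np
from deepspeech import Model

# Load the pre-trained acoustic model (0.7.x API: just the model path)
model = Model("deepspeech-0.7.4-models.pbmm")

def transcribe(segment_path):
    """Run DeepSpeech inference on one 16 kHz mono WAV segment."""
    with wave.open(segment_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return model.stt(audio)

def srt_time(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Hypothetical (path, start, end) tuples from the silence-based split
segments = [("audio/seg_000.wav", 0.0, 4.2), ("audio/seg_001.wav", 5.1, 9.8)]

with open("output/movie.srt", "w") as srt:
    for i, (path, start, end) in enumerate(segments, start=1):
        srt.write(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n")
        srt.write(transcribe(path) + "\n\n")
```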
When I tested the script on my laptop, it took about **40 minutes to generate the SRT file for a 70-minute video file**. My config is an i5 dual-core @ 2.5 GHz and 8 GB of RAM. Ideally, the whole process shouldn't take more than 60% of the duration of the original video file.
## TO-DO
* Pre-process inferred text before writing to file (prettify)
* Add progress bar to `extract_audio()`
* GUI support (?)
## Contributing
I would love to follow up on any suggestions/issues you find :)