Commit 542eda0

Refactor main.py, add utils.py, and update README
1 parent 33fe2bc commit 542eda0

7 files changed: +190 -231 lines changed

README.md (+36 -53)

# AutoSub

- [AutoSub](#autosub)
  - [About](#about)
  - [Installation](#installation)
  - [Docker](#docker)
  - [How-to example](#how-to-example)
  - [How it works](#how-it-works)
  - [Motivation](#motivation)
  - [Contributing](#contributing)
  - [References](#references)

1313
## About
1414

1515
AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech). I use the DeepSpeech Python API to run inference on audio segments and [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) to split the initial audio on silent segments, producing multiple small files.
1616

1717
⭐ Featured in [DeepSpeech Examples](https://github.com/mozilla/DeepSpeech-examples) by Mozilla
1818

## Installation

* Clone the repo
```bash
$ git clone https://github.com/abhirooptalasila/AutoSub
$ cd AutoSub
```
* Create a virtual environment to install the required packages. All further steps should be performed while in the `AutoSub/` directory
```bash
$ python3 -m pip install --user virtualenv
$ virtualenv sub
$ source sub/bin/activate
```
* Use the corresponding requirements file depending on whether you have a GPU or not. Make sure you have the appropriate [CUDA](https://deepspeech.readthedocs.io/en/v0.9.3/USING.html#cuda-dependency-inference) version
```bash
$ pip3 install -r requirements.txt
OR
$ pip3 install -r requirements-gpu.txt
```
* Use `getmodels.sh` to download the model and scorer files, passing the version number as an argument
```bash
$ ./getmodels.sh 0.9.3
```
* Install FFMPEG. If you're on Ubuntu, this should work fine
```bash
$ sudo apt-get install ffmpeg
$ ffmpeg -version # I'm running 4.1.4
```
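
As a quick sanity check after these steps, the CLI should print its options. This assumes `main.py` exposes the standard argparse-style `--help` flag, which the README itself doesn't state:

```bash
# Hypothetical sanity check: an argparse-based CLI responds to --help
$ python3 autosub/main.py --help
```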

## Docker

* If you don't have the model files, get them
```bash
$ ./getmodels.sh 0.9.3
```
* For a CPU build
```bash
$ docker build -t autosub .
$ docker run --volume=`pwd`/input:/input --name autosub autosub --file /input/video.mp4
$ docker cp autosub:/output/ .
```
* For a GPU build that is reusable (saving time on instantiating the program)
```bash
$ docker build --build-arg BASEIMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --build-arg DEPSLIST=requirements-gpu.txt -t autosub-base . && \
  docker run --gpus all --name autosub-base autosub-base --dry-run || \
  docker commit --change 'CMD []' autosub-base autosub-instance
```
* Finally
```bash
$ docker run --volume=`pwd`/input:/input --name autosub autosub-instance --file ~/video.mp4
$ docker cp autosub:/output/ .
```
## How-to example

* The model files should be in the repo root directory and will be loaded automatically. In case you have multiple versions, use the `--model` and `--scorer` arguments when executing
* After following the installation instructions, you can run `autosub/main.py` as given below. The `--file` argument is the video file for which subtitles are to be generated
```bash
$ python3 autosub/main.py --file ~/movie.mp4
```
* After the script finishes, the SRT file is saved in `output/`
* The optional `--split-duration` argument sets the maximum number of seconds any given subtitle is displayed for. The default is 5 seconds
```bash
$ python3 autosub/main.py --file ~/movie.mp4 --split-duration 8
```
* By default, AutoSub outputs SRT, VTT and TXT files. To produce only the file formats you want, use the `--format` argument
```bash
$ python3 autosub/main.py --file ~/movie.mp4 --format srt txt
```
* Open the video file and add this SRT file as a subtitle. You can just drag and drop it into VLC, or load it from the command line as sketched below
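
A minimal sketch of the command-line route, assuming the hypothetical names `movie.mp4` and `output/movie.srt`:

```bash
# --sub-file points VLC at an external subtitle track;
# the file names here are placeholders for your own
$ vlc movie.mp4 --sub-file output/movie.srt
```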
## How it works

Mozilla DeepSpeech is an open-source speech-to-text engine with support for fine-tuning using custom datasets, external language models, exporting memory-mapped models and a lot more. You should definitely check it out for STT tasks. So, when you run the script, I use FFMPEG to **extract the audio** from the video and save it in `audio/`. By default, DeepSpeech is configured to accept 16kHz audio samples for inference, hence while extracting I make FFMPEG use a 16kHz sampling rate.
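
The extraction boils down to a single FFMPEG call along these lines. This is a sketch rather than the exact invocation in `main.py`, and the output name is a placeholder:

```bash
# -vn drops the video stream, -ac 1 downmixes to mono,
# -ar 16000 resamples to the 16kHz rate DeepSpeech expects
$ ffmpeg -i ~/movie.mp4 -vn -ac 1 -ar 16000 audio/movie.wav
```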

Then, I use [pyAudioAnalysis](https://github.com/tyiannak/pyAudioAnalysis) for silence removal, which takes the large audio file initially extracted and splits it wherever silent regions are encountered, resulting in smaller audio segments which are much easier to process. I haven't used the whole library; instead, I've integrated parts of it in `autosub/featureExtraction.py` and `autosub/trainAudio.py`. All these audio files are stored in `audio/`. Then, for each audio segment, I perform DeepSpeech inference on it and write the inferred text to an SRT file. After all files are processed, the final SRT file is stored in `output/`.
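
To get a feel for the inference step, the stock `deepspeech` CLI that ships with the Python package runs the same model on a single file; AutoSub makes the equivalent Python API call in a loop over the segments. The segment name below is hypothetical:

```bash
# One-off inference on a single extracted segment
$ deepspeech --model deepspeech-0.9.3-models.pbmm \
             --scorer deepspeech-0.9.3-models.scorer \
             --audio audio/segment_0001.wav
```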

When I tested the script on my laptop, it took about **40 minutes to generate the SRT file for a 70-minute video file**. My config is a dual-core i5 @ 2.5 GHz with 8GB of RAM. Ideally, the whole process shouldn't take more than 60% of the duration of the original video file.

## Motivation

In the age of OTT platforms, there are still some who prefer to download movies/videos from YouTube/Facebook or even torrents rather than stream. I am one of them, and on one such occasion, I couldn't find the subtitle file for a particular movie I had downloaded. Then the idea for AutoSub struck me, and since I had worked with DeepSpeech previously, I decided to use it.

## Contributing
