You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+1-2
Original file line number
Diff line number
Diff line change
@@ -28,7 +28,6 @@
28
28
- 2020/11/24 Add HiFi-GAN vocoder. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/hifigan)
29
29
- 2020/11/19 Add Multi-GPU gradient accumulator. See [here](https://github.com/TensorSpeech/TensorFlowTTS/pull/377)
30
30
- 2020/08/23 Add Parallel WaveGAN tensorflow implementation. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)
31
-
- 2020/08/23 Add MBMelGAN G + ParallelWaveGAN G example. See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/multiband_pwgan)
32
31
- 2020/08/20 Add C++ inference code. Thank [@ZDisket](https://github.com/ZDisket). See [here](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin)
33
32
- 2020/08/18 Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json file
34
33
- 2020/08/14 Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thank [@azraelkuan](https://github.com/azraelkuan)
@@ -227,7 +226,7 @@ To know how to train model from scratch or fine-tune with other datasets/languag
227
226
- For MelGAN + STFT Loss tutorial, pls see [examples/melgan.stft](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/melgan.stft)
228
227
- For Multiband-MelGAN tutorial, pls see [examples/multiband_melgan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan)
229
228
- For Parallel WaveGAN tutorial, pls see [examples/parallel_wavegan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/parallel_wavegan)
230
-
- For Multiband-MelGAN Generator + Parallel WaveGAN Discriminator tutorial, pls see [examples/multiband_pwgan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_pwgan)
229
+
- For Multiband-MelGAN Generator + HiFi-GAN tutorial, pls see [examples/multiband_melgan_hf](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan_hf)
231
230
- For HiFi-GAN tutorial, pls see [examples/hifigan](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/hifigan)
# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
3
+
Based on the script [`train_multiband_melgan_hf.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/train_multiband_melgan_hf.py).
4
+
5
+
## Training Multi-band MelGAN from scratch with LJSpeech dataset.
6
+
This example code show you how to train MelGAN from scratch with Tensorflow 2 based on custom training loop and tf.function. The data used for this example is LJSpeech Ultimate, you can download the dataset at [link](https://machineexperiments.tumblr.com/post/662408083204685824/ljspeech-ultimate).
7
+
8
+
### Step 1: Create Tensorflow based Dataloader (tf.dataset)
9
+
Please see detail at [examples/melgan/](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan#step-1-create-tensorflow-based-dataloader-tfdataset)
10
+
11
+
### Step 2: Training from scratch
12
+
After you re-define your dataloader, pls modify an input arguments, train_dataset and valid_dataset from [`train_multiband_melgan_hf.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/train_multiband_melgan_hf.py). Here is an example command line to training melgan-stft from scratch:
13
+
14
+
First, you need training generator with only stft loss:
IF you want to use MultiGPU to training you can replace `CUDA_VISIBLE_DEVICES=0` by `CUDA_VISIBLE_DEVICES=0,1,2,3` for example. You also need to tune the `batch_size` for each GPU (in config file) by yourself to maximize the performance. Note that MultiGPU now support for Training but not yet support for Decode.
40
+
41
+
In case you want to resume the training progress, please following below example command line:
If you want to finetune a model, use `--pretrained` like this with the filename of the generator and discriminator, separated by comma.
48
+
```bash
49
+
--pretrained ptgenerator.h5,ptdiscriminator.h5
50
+
```
51
+
It is recommended that you first train text2mel model then extract postnets so that vocoder learns to compensate for flaws, if you do so, append `--postnets 1` to arguments
52
+
53
+
54
+
55
+
**IMPORTANT NOTES**:
56
+
57
+
- If Your Dataset is 16K, upsample_scales = [2, 4, 8] worked.
58
+
- If Your Dataset is > 16K (22K, 24K, ...), upsample_scales = [2, 4, 8] didn't worked, used [8, 4, 2] instead.
59
+
- Mixed precision make Group Convolution training slower on Discriminator, both pytorch (apex) and tensorflow also has this problems. So, **DO NOT USE** mixed precision when discriminator enable.
60
+
61
+
### Step 3: Decode audio from folder mel-spectrogram
62
+
To running inference on folder mel-spectrogram (eg valid folder), run below command line:
## Finetune MelGAN STFT with ljspeech pretrained on other languages
75
+
Just load pretrained model and training from scratch with other languages. **DO NOT FORGET** re-preprocessing on your dataset if needed. A hop_size should be 512 if you want to use our pretrained.
76
+
77
+
## Learning Curves
78
+
Here is a learning curves of melgan based on this config [`multiband_melgan_hf.v1.yaml`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/conf/multiband_melgan_hf.v1.yaml)
79
+
80
+
<imgsrc="fig/eval.png"height="300"width="850">
81
+
82
+
<imgsrc="fig/train.png"height="300"width="850">
83
+
84
+
## Pretrained Models and Audio samples
85
+
| Model | Conf | Lang | Fs [Hz]| Mel range [Hz]| FFT / Hop / Win [pt]| # iters | Notes |
2.[Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
94
+
3.[Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106)
0 commit comments