Description
I have a few questions regarding the training setup in the repository.
1. Stage 5 in `run.sh` and pretrained checkpoints
In Stage 5 of the `run.sh` script, I noticed that `train.py` is called with pretrained checkpoints for the LLM, Flow, and HiFi-GAN models using the argument:
--checkpoint $pretrained_model_dir/$model.pt
Does this mean that the current setup is intended for fine-tuning / post-training rather than training the models from scratch?
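To make the distinction concrete, here is a minimal sketch of what I understand the two modes to look like. The variable names mirror the ones in `run.sh`, but the exact `train.py` flags and the `train_from_scratch` switch are my assumptions, not something confirmed by the repository:

```shell
# Hedged sketch: how stage 5 might be invoked with vs. without a pretrained
# checkpoint. Flag names and the train_from_scratch toggle are assumptions.
pretrained_model_dir=pretrained_models/CosyVoice-300M  # assumed path
model=llm                                              # or flow / hifigan
train_from_scratch=true  # set to false to fine-tune from a released checkpoint

args="--model $model"
if [ "$train_from_scratch" = false ]; then
  # Fine-tuning / post-training: initialize from the released checkpoint,
  # as the current stage 5 does.
  args="$args --checkpoint $pretrained_model_dir/$model.pt"
fi
echo "python train.py $args"
```

With `train_from_scratch=true` the `--checkpoint` argument is simply omitted, which is what I assume "training from scratch" would look like.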
2. Training the Zero-shot LM & CFM
In the paper, Figure 2(b) mentions Large-scale Pretraining to obtain the Zero-shot LM and CFM. If I want to do the same, i.e. train them from scratch:
- Should I remove the `--checkpoint` argument from `train.py` (as mentioned in "cosyvoice base model training from scratch #1060") when `model = llm`, `model = flow`, or `model = hifigan`?
- Are there any additional modifications required in the training scripts or configuration files to properly train the models from scratch, apart from changing the LLM and Flow training scheduler to `warmuplr` as mentioned in "How to obtain Zero-shot LM & CFM? #1820"?
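For reference, this is the kind of config change I have in mind for the scheduler switch. The config filename, the YAML key names, and the default scheduler value below are all illustrative assumptions on my part, written against a throwaway demo file rather than the repository's actual config:

```shell
# Hedged sketch: flip the scheduler key in a training config to warmuplr.
# The key name "scheduler" and the original value are assumptions; for
# illustration this writes and edits a demo file instead of the real config.
cat > /tmp/cosyvoice_demo.yaml <<'EOF'
scheduler: constantlr
scheduler_conf:
  warmup_steps: 25000
EOF

# Replace whatever scheduler is set with warmuplr, keeping the key intact.
sed -i 's/^\(scheduler:\).*/\1 warmuplr/' /tmp/cosyvoice_demo.yaml
grep '^scheduler:' /tmp/cosyvoice_demo.yaml
```

If the scheduler change from #1820 is more involved than a single key edit (e.g. different `scheduler_conf` entries per model), I would appreciate a pointer to the exact settings.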
3. How can CosyVoice 3 be trained with Differentiable Reward Optimization (DiffRO)?
4. Vocoder used in CosyVoice 3
For the results reported in the paper:
- Which vocoder was used for waveform generation in CosyVoice 3?
- Was the vocoder trained from scratch, or was a pretrained model used?
- What is the `hift.pt` checkpoint given here?