Clarification on run.sh script (Stage 5), DiffRO training & Vocoder in CosyVoice 3

I have a few questions regarding the training setup in the repository.

**1. Stage 5 in `run.sh` and pretrained checkpoints**

In [Stage 5](https://github.com/FunAudioLLM/CosyVoice/blob/04bcadc6340e266b4ba09f4474b4668c444aa063/examples/libritts/cosyvoice3/run.sh#L64) of the [`run.sh`](https://github.com/FunAudioLLM/CosyVoice/blob/main/examples/libritts/cosyvoice3/run.sh) script, I noticed that `train.py` is called with pretrained checkpoints for the LLM, Flow, and HiFi-GAN models using the argument:

[`--checkpoint $pretrained_model_dir/$model.pt`](https://github.com/FunAudioLLM/CosyVoice/blob/04bcadc6340e266b4ba09f4474b4668c444aa063/examples/libritts/cosyvoice3/run.sh#L82)

Does this mean that the current setup is intended for fine-tuning / post-training rather than training the models from scratch?

**2. Training Zero-shot LM & CFM**

In the [paper](https://arxiv.org/pdf/2505.17589), Figure 2(b) shows a large-scale pretraining stage that produces the zero-shot LM and CFM. If I want to reproduce that stage, that is, train the models from scratch:

  - Should I remove the `--checkpoint` argument from the `train.py` call, as mentioned in #1060, when:

     - `model = llm`
     - `model = flow`
     - `model = hifigan`

  - Apart from changing the LLM and flow training scheduler to `warmuplr`, as mentioned in #1820, are there any additional modifications required in the training scripts or configuration files to train the models from scratch?
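To make the question concrete, here is a minimal sketch of the change I have in mind. It is not taken from the repository: the `from_scratch` toggle and the `ckpt_arg` variable are hypothetical names of my own, the directory value is a placeholder, and the `echo` only stands in for the real `train.py` invocation in Stage 5. I am assuming that leaving out `--checkpoint` is enough to start from randomly initialized weights, which is exactly what I would like confirmed.

```shell
#!/bin/sh
# Hypothetical toggle (not in the repository's run.sh):
# "true" means do not load any pretrained weights.
from_scratch=true
pretrained_model_dir=./pretrained_models   # placeholder path

for model in llm flow hifigan; do
  # Only build the --checkpoint argument when fine-tuning.
  ckpt_arg=""
  if [ "$from_scratch" != "true" ]; then
    ckpt_arg="--checkpoint $pretrained_model_dir/$model.pt"
  fi
  # Stand-in for the actual train.py call; prints what would be passed.
  echo "model=$model extra args: ${ckpt_arg:-<none>}"
done
```

With `from_scratch=true`, no `--checkpoint` argument is built for any of the three models; with any other value, each model would receive its own `$pretrained_model_dir/$model.pt`, as in the current Stage 5.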

**3. How to train CosyVoice 3 with Differentiable Reward Optimization (DiffRO)?**

**4. Vocoder used in CosyVoice 3**

For the results reported in the [paper](https://arxiv.org/pdf/2505.17589):

- Which vocoder was used for waveform generation in CosyVoice 3?
- Was the vocoder trained from scratch, or was a pretrained model used?
- What is the `hift.pt` file provided [here](https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512/blob/main/hift.pt)?

Clarification on run.sh script (Stage 5), DiffRO training & Vocoder in CosyVoice 3 #1845
