Trained model can generate correct text but incorrect speech

I tried to reproduce the training of the fr-en simultaneous model. I followed the instructions to prepare the dataset and ran the script train.simul-s2st.sh.
Training seems to go fine, but during evaluation of the trained model (using ./simuleval.simul-s2st.sh), I see strange behavior.
Here is the training log:
<img width="1185" alt="Screenshot 2024-07-27 at 2 39 39 AM" src="https://github.com/user-attachments/assets/3e6ee51b-1603-4068-9ad6-6ba4bc2d36bb">
During inference, when I run the eval script on the example you provided, the model outputs the correct text translation, but the output speech is incorrect (it is almost silent). I printed the text output and the speech unit output, as follows:
<img width="1133" alt="image" src="https://github.com/user-attachments/assets/b1ae3e70-0a18-4c1c-adec-9521f9e29dc3">
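
To make "almost silent" and "incorrect units" more concrete, a check along the following lines could be run on the outputs. This is only a minimal sketch and not part of the repository: the function names `rms`, `is_near_silent`, and `unit_summary` are hypothetical, the silence threshold is an arbitrary assumption, and it assumes the waveform is available as a list of float samples in [-1, 1] and the units as a list of integers.

```python
import math
from collections import Counter


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_near_silent(samples, threshold=1e-3):
    """True if the RMS amplitude is below `threshold`.

    The default threshold is an assumed value chosen for illustration,
    not one taken from the project.
    """
    return rms(samples) < threshold


def unit_summary(units):
    """Summarize a predicted unit sequence.

    A sequence dominated by a single unit can be one sign that the
    unit output has collapsed even though the text output is correct.
    """
    if not units:
        return {"total": 0, "distinct": 0, "top_unit": None, "top_share": 0.0}
    counts = Counter(units)
    top_unit, top_count = counts.most_common(1)[0]
    return {
        "total": len(units),
        "distinct": len(counts),
        "top_unit": top_unit,
        "top_share": top_count / len(units),
    }
```

If the waveform is near-silent while the unit sequence still looks varied, the vocoder stage may be the place to look; if the units themselves are dominated by one value, the problem is more likely in the unit prediction of the trained model.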

Do you know what the problem might be?

Thank you
