Issues: NVIDIA/NeMo
Issues list
Off By One Error When Checkpointing and Old Checkpoints Getting Deleted During Run
bug
Something isn't working
#12284
opened Feb 20, 2025 by
aflah02
Exported Llama Models Trained Using NeMo Generate The Same Token Repeatedly
bug
Something isn't working
#12212
opened Feb 17, 2025 by
aflah02
loss divergence when CP>1 and MBS>1
bug
Something isn't working
#12210
opened Feb 17, 2025 by
hawkoli1987
Pre-Training Neva under pipeline parallel set to 2.
bug
Something isn't working
#12205
opened Feb 16, 2025 by
takuya576
Support configuration of num_workers and max_samples_per_sequence in llava_next_pretrain
#12195
opened Feb 14, 2025 by
bernardhan33
HiFiGAN Finetune "Cannot re-initialize CUDA in forked subprocess."
bug
Something isn't working
#12178
opened Feb 13, 2025 by
Fournogo
Update TE version for support of pad_between_seqs=True
bug
Something isn't working
#12174
opened Feb 13, 2025 by
cyanguwa
I am trying to train the FastConformer 120M model from scratch, but it is not converging?
#12167
opened Feb 13, 2025 by
PhamDangNguyen
Error in saving nemo checkpoint with Llama-3.1-70B SFT. /opt/NeMo/nemo/utils/callbacks/nemo_model_checkpoint.py
bug
Something isn't working
#12157
opened Feb 12, 2025 by
songwang41
[HELP] Run into the NaN grad problem while going through the example of official document with fp16
bug
Something isn't working
#12134
opened Feb 11, 2025 by
twotwoiscute
Fail to convert trained checkpoint to HF format
bug
Something isn't working
#12124
opened Feb 10, 2025 by
Zhihan1996
Loss Fails to Converge in Nemo2-sft.ipynb with Precision 16
#12102
opened Feb 8, 2025 by
twotwoiscute
ASR Lhotse dataloader : TypeError: object of type 'IterableDatasetWrapper' has no len()
bug
Something isn't working
#12093
opened Feb 7, 2025 by
AudranBert
AttributeError: 'HFDatasetDataModule' object has no attribute 'tokenizer'
bug
Something isn't working
#12080
opened Feb 6, 2025 by
j40903272
extra_loggers is not used to log metrics or hyperparameters
bug
Something isn't working
#12046
opened Feb 4, 2025 by
chajath
llava-like dataset implementation "LazySupervisedDataset" likely fails to handle large dataset
#12034
opened Feb 3, 2025 by
bernardhan33