You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
The code provided below is a simplified example of training a small model using the Hugging Face Trainer. The setup includes creating a dataset, initializing a model and tokenizer, and configuring the Trainer with different settings for gradient accumulation and DeepSpeed.
The training loss should remain consistent for different gradient accumulation steps, both with and without DeepSpeed enabled. However, the figure shows a divergence when DeepSpeed is enabled:
The text was updated successfully, but these errors were encountered:
System Info
Accelerate version: 1.1.0
transformers version: 4.46.2
DeepSpeed version: 0.14.4
Platform: Linux 5.15.0-101-generic #111-Ubuntu SMP x86_64 GNU/Linux
Python version: 3.10.14
PyTorch version (GPU?): 2.1.2+cu118 True
GPU type: NVIDIA A100
Who can help?
@muellerzr
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
The code provided below is a simplified example of training a small model using the Hugging Face Trainer. The setup includes creating a dataset, initializing a model and tokenizer, and configuring the Trainer with different settings for gradient accumulation and DeepSpeed.
Expected behavior
The training loss should remain consistent for different gradient accumulation steps, both with and without DeepSpeed enabled. However, the figure shows a divergence when DeepSpeed is enabled:
The text was updated successfully, but these errors were encountered: