
Conversation

@fluoryyn-art

  1. Added a gradient accumulation parameter to argument.py with a default value of 1 (maintains backward compatibility).

  2. Updated the training function in utils.py to handle gradient accumulation (see the loop sketch after this list):

    • Scales the loss by 1/gradient_accumulation_steps to keep the effective learning rate unchanged.
    • Performs an optimizer step only after gradients have been accumulated for the specified number of steps.
    • Handles the final steps so gradients are still applied when the total step count doesn't divide evenly.

  3. Modified run.py to update the DeepSpeed configuration with the gradient accumulation steps (see the config sketch after this list).
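
A minimal sketch of the accumulation logic described in items 1 and 2. The names `gradient_accumulation_steps`, `train_epoch`, `model`, `optimizer`, `train_loader`, and `loss_fn` are illustrative assumptions, not the PR's actual identifiers:

```python
import argparse

parser = argparse.ArgumentParser()
# Default of 1 keeps the previous single-step behaviour (backward compatible).
parser.add_argument("--gradient_accumulation_steps", type=int, default=1)


def train_epoch(model, optimizer, train_loader, loss_fn, accum_steps=1):
    model.train()
    optimizer.zero_grad()
    num_batches = len(train_loader)
    for step, (inputs, targets) in enumerate(train_loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Scale so the accumulated gradient matches a single large-batch step,
        # keeping the effective learning rate unchanged.
        (loss / accum_steps).backward()
        # Step every `accum_steps` micro-batches, and also on the final batch
        # so leftover gradients are applied when num_batches % accum_steps != 0.
        if step % accum_steps == 0 or step == num_batches:
            optimizer.step()
            optimizer.zero_grad()
```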
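And a sketch of how item 3 might propagate the new argument into the DeepSpeed config; `args` and `ds_config` are assumed names, and the batch-size field is shown only to illustrate the consistency DeepSpeed expects:

```python
# Illustrative only: push the CLI value into the config before deepspeed.initialize.
ds_config["gradient_accumulation_steps"] = args.gradient_accumulation_steps
# DeepSpeed treats the effective batch size as
#   train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size,
# so related batch-size fields in the config should stay consistent with this value.
```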

@fluoryyn-art
Author

#9

