
Conversation

@fluoryyn-art

  1. Added a gradient accumulation parameter to argument.py with a default value of 1 (maintains backward compatibility).

  2. Updated the training function in utils.py to handle gradient accumulation (see the loop sketch after this list):

    • Scales the loss by 1/gradient_accumulation_steps to keep the effective learning rate unchanged.
    • Performs an optimizer step only after gradients have been accumulated for the specified number of steps.
    • Handles the final steps so gradients are still applied when the total step count doesn't divide evenly.

  3. Modified run.py to update the DeepSpeed configuration with the gradient accumulation steps (see the config sketch after this list).
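
A minimal sketch of the accumulation logic described in items 1 and 2. The names `gradient_accumulation_steps`, `train_epoch`, `model`, `optimizer`, `train_loader`, and `loss_fn` are illustrative assumptions, not the PR's actual identifiers:

```python
import argparse

parser = argparse.ArgumentParser()
# Default of 1 keeps the previous single-step behaviour (backward compatible).
parser.add_argument("--gradient_accumulation_steps", type=int, default=1)


def train_epoch(model, optimizer, train_loader, loss_fn, accum_steps=1):
    model.train()
    optimizer.zero_grad()
    num_batches = len(train_loader)
    for step, (inputs, targets) in enumerate(train_loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Scale so the accumulated gradient matches a single large-batch step,
        # keeping the effective learning rate unchanged.
        (loss / accum_steps).backward()
        # Step every `accum_steps` micro-batches, and also on the final batch
        # so leftover gradients are applied when num_batches % accum_steps != 0.
        if step % accum_steps == 0 or step == num_batches:
            optimizer.step()
            optimizer.zero_grad()
```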
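And a sketch of how item 3 might propagate the new argument into the DeepSpeed config; `args` and `ds_config` are assumed names, and the batch-size field is shown only to illustrate the consistency DeepSpeed expects:

```python
# Illustrative only: push the CLI value into the config before deepspeed.initialize.
ds_config["gradient_accumulation_steps"] = args.gradient_accumulation_steps
# DeepSpeed treats the effective batch size as
#   train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size,
# so related batch-size fields in the config should stay consistent with this value.
```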

@fluoryyn-art
Author

#9

