
Makes logic in optimizers use 1-based steps. #1118


Draft · wants to merge 13 commits into main

Conversation

ruomingp (Contributor)

Resolves inconsistent definitions of "step" between SpmdTrainer and optimizers.

Background

SpmdTrainer.step is used for summaries and checkpoints and defined as:

  • step=0: the state after initialization. A checkpoint and summaries may be saved at step 0 to reflect the initial state before the first training step;
  • step=n: summaries recorded during the n'th training step and the checkpoint (if any) saved after n steps.

In optimizers, the use of steps is inconsistent. Most use 0-based steps, where the first training step uses the schedule value computed from count=0:

  • scale_by_schedule (used for learning rate schedules)
  • add_decayed_weights
  • param_ema (different from ema)
  • skip_and_clip_by_global_norm uses 0-based steps to compute the drop_norm threshold
  • decay_bias_correction assumes a 0-based step
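To illustrate why the convention matters, here is a minimal sketch of Adam-style bias correction. The helper below is hypothetical (it is not the actual AXLearn `decay_bias_correction`); it only shows how the same formula behaves under 0-based vs. 1-based steps:

```python
# Illustrative only: a hypothetical helper, not the AXLearn implementation.

def bias_correction(decay: float, step: int) -> float:
    # The standard Adam bias-correction factor 1 - decay^t assumes a
    # 1-based step: at step=1 it is 1 - decay (small but nonzero).
    return 1.0 - decay**step

# Under a 0-based convention, the first training step would compute
# 1 - decay^0 == 0, so code written for 0-based steps must offset the
# count internally; mixing the two conventions over- or under-corrects
# the moment estimates on early steps.
first_step_factor = bias_correction(0.9, 1)  # ≈ 0.1
zero_step_factor = bias_correction(0.9, 0)   # 0.0 -- degenerate
```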

Resolution

After discussion, we decided to change the optimizer logic to be consistent with SpmdTrainer.step. Specifically, we change decay_bias_correction, adafactor_decay_rate, scale_by_schedule, add_decayed_weights, param_ema, and skip_and_clip_by_global_norm to assume that the first step is 1.

This makes optimizer steps consistent with summary steps; e.g., the summary at step N reflects the learning rate schedule computed for step N.
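The alignment can be sketched with a toy warmup schedule (the function name and constants below are hypothetical, not the AXLearn API):

```python
# Illustrative sketch: with 1-based steps, the learning rate logged in
# the summary at step N equals schedule(N), the value actually used
# during the N'th training step.

def linear_warmup(step: int, peak_lr: float = 1e-3, warmup_steps: int = 100) -> float:
    # A toy linear-warmup schedule; step is 1-based, so the first
    # training step uses a nonzero learning rate.
    return peak_lr * min(step, warmup_steps) / warmup_steps

# 0-based convention: the first step would use schedule(0) == 0.0,
# effectively wasting the first update.
lr_first_step_0_based = linear_warmup(0)  # 0.0
# 1-based convention (this PR): the first step uses schedule(1).
lr_first_step_1_based = linear_warmup(1)  # 1e-5
```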

@ruomingp ruomingp requested review from markblee and a team as code owners April 19, 2025 16:12
@ruomingp ruomingp marked this pull request as draft April 19, 2025 16:12