Describe the bug
Hi, I am observing that the learning rate suddenly drops to model.optim.sched.min_lr after model.optim.sched.warmup_steps. I am using CosineAnnealing, where I expect the learning rate to decrease gradually to min_lr after the warmup steps rather than dropping abruptly.
Expected behavior
The learning rate should decay gradually to min_lr after the warmup steps instead of dropping suddenly. If I am configuring this the wrong way, what is the correct way to achieve this?
Environment overview (please complete the following information)
PyTorch version: 2.3
Python version: 3.10
Looking at https://github.com/NVIDIA/NeMo/blob/main/nemo/core/optim/lr_scheduler.py#L353, you may need to set decay_steps (if I'm looking in the correct place). It looks like during warmup_steps the learning rate ramps linearly up to max_lr, then decays to min_lr over decay_steps. Curious if that works for you!
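For anyone who wants to eyeball the intended shape, here is a minimal standalone sketch in plain PyTorch (not NeMo's scheduler) of the warmup-then-cosine-decay behavior described above; the warmup_steps, decay_steps, max_lr, and min_lr values are placeholders, so substitute your own config values:

```python
import math

import torch

# Placeholder values for illustration only; use your actual config values.
warmup_steps = 100
decay_steps = 900
max_lr = 1e-3
min_lr = 1e-5

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=max_lr)


def lr_lambda(step: int) -> float:
    # Linear warmup: 0 -> max_lr over warmup_steps.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    # Cosine decay: max_lr -> min_lr over decay_steps, then hold at min_lr.
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (min_lr + (max_lr - min_lr) * cosine) / max_lr


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(warmup_steps + decay_steps):
    optimizer.step()
    scheduler.step()
```

If NeMo's CosineAnnealing behaves as described in the comment above, the symptom in this issue would match a decay window that is effectively zero, so the fix would be making sure the decay window (decay_steps, per the comment above) is set under model.optim.sched.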