fix division by zero in anneal_beta when total_timesteps < batch_size by valtterivalo · Pull Request #489 · PufferAI/PufferLib

valtterivalo · 2026-02-17T20:16:47Z

total_epochs = total_timesteps / batch_size is integer division, so when total_timesteps < batch_size (e.g. short smoke tests with 50k steps), it evaluates to 0. this causes anneal_beta to compute 0/0 = NaN, which propagates through mb_prio into the PPO loss.

cosine_annealing() already handles this case (if (T == 0) return lr_base) but anneal_beta didn't have the same guard.

fix: clamp total_epochs to at least 1.

When total_timesteps is smaller than one batch (total_agents * horizon), total_epochs = total_timesteps / batch_size evaluates to 0. The anneal_beta formula then computes current_epoch / total_epochs = 0 / 0, which is NaN in IEEE 754. This NaN propagates through the priority replay weights (mb_prio) into the PPO loss, so every loss term silently becomes NaN. cosine_annealing() already guards against this (line 714: if (T == 0) return lr_base), but the anneal_beta computation on the next line does not. Clamping total_epochs to at least 1 keeps both computations well defined.
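A minimal sketch of the failure and the clamp, for illustration only. The helper names, the `beta0 + (1 - beta0) * progress` schedule, and the batch size of 131072 are assumptions made for this example and are not taken from the PufferLib source. Only the `current_epoch / total_epochs` ratio and the clamp to 1 come from the description above.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: epoch count via integer division, as described in the PR.
 * Truncates to 0 whenever total_timesteps < batch_size. */
static int64_t epochs_unclamped(int64_t total_timesteps, int64_t batch_size) {
    assert(batch_size > 0);
    return total_timesteps / batch_size;
}

/* Same computation with the fix applied: never return fewer than 1 epoch. */
static int64_t epochs_clamped(int64_t total_timesteps, int64_t batch_size) {
    int64_t epochs = epochs_unclamped(total_timesteps, batch_size);
    return epochs < 1 ? 1 : epochs;
}

/* Illustrative linear beta schedule (an assumption, not PufferLib's exact formula).
 * With total_epochs == 0 and current_epoch == 0 the float division is 0/0,
 * which is NaN under IEEE 754, and the NaN carries into the result. */
static float anneal_beta_sketch(float beta0, int64_t current_epoch, int64_t total_epochs) {
    float progress = (float)current_epoch / (float)total_epochs;
    return beta0 + (1.0f - beta0) * progress;
}
```

With 50k steps and the assumed batch size, `epochs_unclamped(50000, 131072)` is 0, so `anneal_beta_sketch(0.2f, 0, 0)` yields NaN. Passing `epochs_clamped(50000, 131072)` instead gives a denominator of 1 and the schedule starts at `beta0` as expected.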

valtterivalo wants to merge 1 commit into PufferAI:4.0 from valtterivalo:fix/anneal-beta-div-zero
