fix division by zero in anneal_beta when total_timesteps < batch_size #489

Open

valtterivalo wants to merge 1 commit into PufferAI:4.0 from valtterivalo:fix/anneal-beta-div-zero

Conversation


@valtterivalo valtterivalo commented Feb 17, 2026

`total_epochs = total_timesteps / batch_size` is integer division, so when `total_timesteps < batch_size` (e.g. short smoke tests with 50k steps), it evaluates to 0. this causes `anneal_beta` to compute 0/0 = NaN, which propagates through `mb_prio` into the PPO loss.

`cosine_annealing()` already handles this case (`if (T == 0) return lr_base`) but `anneal_beta` didn't have the same guard.

fix: clamp `total_epochs` to at least 1.

When total_timesteps is smaller than one batch (total_agents * horizon),
total_epochs = total_timesteps / batch_size evaluates to 0. The
anneal_beta formula then computes current_epoch / total_epochs = 0 / 0,
which is NaN in IEEE 754. This NaN propagates through priority replay
weights (mb_prio) into the loss, silently producing NaN for all losses.

cosine_annealing() already guards against this (line 714: if T == 0
return lr_base), but the anneal_beta computation on the next line does
not. Clamping total_epochs to at least 1 fixes both.
