
RuntimeError: normal expects all elements of std >= 0.0 due to unknown cause #765

@NaCl-1374

Description


Training crashes during PPO update with RuntimeError: normal expects all elements of std >= 0.0 in the policy's distribution sampling. Despite enabling nan_guard, no NaN dump is generated (no /tmp/mjlab/nan_dumps directory created), suggesting the invalid values are negative or zero std rather than NaN/Inf.

Robot & Training Configuration

| Parameter | Value |
| --- | --- |
| Robot type | Quadruped (wheeled-legged) |
| Max velocity | linear 2 m/s, angular 3 rad/s |
| Terrain | Flat ground |
| Algorithm | PPO |

Error Trace

Traceback (most recent call last):
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 256, in <module>
    main()
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 252, in main
    launch_training(task_id=chosen_task, args=args)
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 203, in launch_training
    run_train(task_id, args, log_dir)
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 173, in run_train
    runner.learn(
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/runners/on_policy_runner.py", line 108, in learn
    loss_dict = self.alg.update()
                ^^^^^^^^^^^^^^^^^
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/algorithms/ppo.py", line 256, in update
    self.actor(
  File ".../torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/models/mlp_model.py", line 106, in forward
    return self.distribution.sample()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/modules/distribution.py", line 180, in sample
    return self._distribution.sample()  # type: ignore
  File ".../torch/distributions/normal.py", line 81, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: normal expects all elements of std >= 0.0

Key Observations

  1. Crash location: Normal.sample() receives a std that fails the >= 0.0 check (std is presumably derived from the policy log_std via exp or softplus)
  2. nan_guard silent: no NaN dump is produced (no /tmp/mjlab/nan_dumps directory), suggesting the guarded values were not NaN/Inf; note, however, that a NaN std also fails the >= 0.0 comparison, so NaN in the distribution parameters cannot be conclusively ruled out
  3. Potential trigger: action_clip is set to None, removing the bounds protection that might otherwise prevent extreme policy outputs
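
The check behind the crash can be reproduced in isolation. One caveat on observation 2: since NaN fails any >= comparison, a NaN std raises the exact same message, so the absence of a nan_guard dump is not proof that the values were finite (minimal sketch, not the rsl_rl code path):

```python
import torch

# Minimal reproduction of the failing check: torch.normal requires every
# element of std to satisfy std >= 0.0.
loc = torch.zeros(3)

def sampling_ok(std):
    try:
        torch.normal(loc, std)
        return True
    except RuntimeError:
        return False

print(sampling_ok(torch.full((3,), 0.1)))           # positive std: True
print(sampling_ok(torch.tensor([0.1, -0.1, 0.1])))  # negative std: False
# NaN fails the >= 0.0 comparison too, so a NaN std raises the SAME error.
print(sampling_ok(torch.tensor([0.1, float("nan"), 0.1])))  # False
```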

Environment

| Item | Version |
| --- | --- |
| OS | Ubuntu 20.04 |
| Python | 3.11.14 |
| PyTorch | (conda env: rl_mjlab) |
| Library | mjlab + rsl_rl |

Steps to Reproduce

  1. Configure quadruped robot with max velocity 2 m/s on flat terrain
  2. Set action_clip: None (disable action clipping)
  3. Enable nan_guard in config
  4. Start training with PPO
  5. Wait for crash during runner.learn() → alg.update()

Actual Behavior

  • Training crashes with std <= 0
  • nan_guard silent (no dump directory created)
  • Crash occurs mid-training (not at initialization; the crash point varies between runs)

Suggested Investigation

  1. Root cause: Why does log_std collapse to -inf or very negative values?

    • Missing action_clip and obs_clip allow unbounded actions and observations → unstable gradients?
  2. Defensive programming: Clamp std to min=1e-6 before sampling as safeguard

