# Bug Report: `RuntimeError: normal expects all elements of std >= 0.0` during PPO update on quadruped robot

## Description

Training crashes during the PPO update with `RuntimeError: normal expects all elements of std >= 0.0` in the policy's distribution sampling. Despite `nan_guard` being enabled, no NaN dump is generated (no `/tmp/mjlab/nan_dumps` directory is created), suggesting the invalid values are negative or zero std rather than NaN/Inf.
## Robot & Training Configuration

| Parameter    | Value                          |
|--------------|--------------------------------|
| Robot Type   | Quadruped (wheeled-legged)     |
| Max Velocity | linear 2 m/s, angular 3 rad/s  |
| Terrain      | Flat ground                    |
| Algorithm    | PPO                            |
## Error Trace

```
Traceback (most recent call last):
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 256, in <module>
    main()
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 252, in main
    launch_training(task_id=chosen_task, args=args)
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 203, in launch_training
    run_train(task_id, args, log_dir)
  File "/home/me/unitree_rl/mjlab/src/mjlab/scripts/train.py", line 173, in run_train
    runner.learn(
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/runners/on_policy_runner.py", line 108, in learn
    loss_dict = self.alg.update()
                ^^^^^^^^^^^^^^^^^
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/algorithms/ppo.py", line 256, in update
    self.actor(
  File ".../torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/models/mlp_model.py", line 106, in forward
    return self.distribution.sample()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/unitree_rl/rsl_rl/rsl_rl/modules/distribution.py", line 180, in sample
    return self._distribution.sample()  # type: ignore
  File ".../torch/distributions/normal.py", line 81, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: normal expects all elements of std >= 0.0
```
## Key Observations

- **Crash location:** `Normal.sample()` receives `std <= 0` (likely from the policy's `log_std` exp or softplus)
- **nan_guard ineffective:** no NaN dump produced → the invalid values are not NaN/Inf, but negative or zero std
- **Potential trigger:** `action_clip` is set to `None`, removing the bounds protection that might prevent extreme policy outputs
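For reference, the crash can be reproduced in isolation: `torch.normal` validates its `std` argument element-wise, so a single negative entry (e.g. from a drifting `log_std` head) raises the exact error in the trace above. This minimal sketch is independent of mjlab/rsl_rl:

```python
import torch

# A batch of stds where one element has gone slightly negative,
# mimicking a collapsed log_std parameter after exp()/softplus underflow
# or a mis-parameterized std head.
loc = torch.zeros(3)
std = torch.tensor([0.1, 0.2, -1e-6])

try:
    torch.normal(loc, std)
except RuntimeError as e:
    print(e)  # normal expects all elements of std >= 0.0
```

Note that `std == 0` passes this check (the message says `>= 0.0`), so the crashing tensor must contain strictly negative values.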
## Environment

| Item    | Version               |
|---------|-----------------------|
| OS      | Ubuntu 20.04          |
| Python  | 3.11.14               |
| PyTorch | (conda env: rl_mjlab) |
| Library | mjlab + rsl_rl        |
## Steps to Reproduce

1. Configure the quadruped robot with max velocity 2 m/s on flat terrain
2. Set `action_clip: None` (disable action clipping)
3. Enable `nan_guard` in the config
4. Start training with PPO
5. Wait for the crash during `runner.learn()` → `alg.update()`
## Actual Behavior

- Training crashes with `std <= 0`
- `nan_guard` stays silent (no dump directory created)
- Crash occurs mid-training at a seemingly random iteration, not at initialization
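Since `nan_guard` only flags NaN/Inf, a nonpositive std slips past it and the first visible symptom is the crash inside `torch.normal`. A hypothetical pre-sampling check (the function name and diagnostics are illustrative, not mjlab API) could fail earlier and with more context:

```python
import torch

def check_std(std: torch.Tensor, step: int) -> None:
    """Hypothetical guard run just before distribution sampling.

    nan_guard-style checks test isnan/isinf and therefore miss
    negative or zero std; this check fails loudly with the offending
    statistics instead of crashing deep inside torch.normal.
    """
    bad = std <= 0
    if bad.any():
        raise RuntimeError(
            f"step {step}: {int(bad.sum())} std element(s) <= 0, "
            f"min={std.min().item():.3e}"
        )

# Example: one std element has gone negative mid-training.
std = torch.tensor([0.15, -2e-7, 0.3])
try:
    check_std(std, step=1234)
except RuntimeError as e:
    print(e)
```

Logging `step` and the minimum std at failure time would also help confirm whether `log_std` drifts gradually or jumps in a single update.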
## Suggested Investigation

- **Root cause:** why does `log_std` collapse to `-inf` or very negative values?
- **Missing clipping:** does disabling `action_clip` and `obs_clip` allow unbounded actions → unstable gradients?
- **Defensive programming:** clamp std to `min=1e-6` before sampling as a safeguard
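The clamping safeguard suggested above could look like the following sketch. It is not the actual rsl_rl code; the function and attribute names are illustrative, and it assumes the policy parameterizes std as `exp(log_std)`:

```python
import torch
from torch.distributions import Normal

STD_MIN = 1e-6  # floor suggested above

def make_action_distribution(mean: torch.Tensor, log_std: torch.Tensor) -> Normal:
    """Build the policy's Gaussian with a clamped std.

    Clamping after exp() guarantees std >= STD_MIN even when log_std
    has drifted to a large negative value (where exp() underflows to
    0.0), so Normal.sample() cannot hit
    "normal expects all elements of std >= 0.0".
    """
    std = torch.exp(log_std).clamp_min(STD_MIN)
    return Normal(mean, std)

# Even with extreme log_std values, sampling now succeeds.
dist = make_action_distribution(torch.zeros(3), torch.tensor([-100.0, 0.0, -1e9]))
action = dist.sample()
```

An alternative is clamping `log_std` itself (e.g. to `[-20, 2]`) before the exp, which also keeps the entropy and log-prob terms finite; the clamp treats the symptom either way, so the root-cause questions above still apply.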