Training of environment Mjlab-Velocity-Rough-Unitree-Go1 crashes because of abnormal reward values

Hello,

Training in the built-in environment Mjlab-Velocity-Rough-Unitree-Go1 crashes with the following error:
```
Traceback (most recent call last):
  File "/home/chengrui/Workspaces/mjlab/src/mjlab/scripts/train.py", line 255, in <module>
    main()
    ~~~~^^
  File "/home/chengrui/Workspaces/mjlab/src/mjlab/scripts/train.py", line 251, in main
    launch_training(task_id=chosen_task, args=args)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chengrui/Workspaces/mjlab/src/mjlab/scripts/train.py", line 202, in launch_training
    run_train(task_id, args, log_dir)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chengrui/Workspaces/mjlab/src/mjlab/scripts/train.py", line 172, in run_train
    runner.learn(
    ~~~~~~~~~~~~^
      num_learning_iterations=cfg.agent.max_iterations, init_at_random_ep_len=True
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/rsl_rl/runners/on_policy_runner.py", line 102, in learn
    loss_dict = self.alg.update()
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/rsl_rl/algorithms/ppo.py", line 259, in update
    self.actor(obs_batch, masks=masks_batch, hidden_state=hidden_states_batch[0], stochastic_output=True)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/rsl_rl/models/mlp_model.py", line 122, in forward
    return self.distribution.sample()
           ~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/chengrui/Libraries/miniforge3/envs/mjlab/lib/python3.13/site-packages/torch/distributions/normal.py", line 81, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: normal expects all elements of std >= 0.0
```

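I suspect that abnormal reward values destabilize training until the policy's action standard deviation becomes invalid (negative or NaN), which is what `torch.normal` rejects in the traceback above. The reward plots: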

<img width="620" height="311" alt="Image" src="https://github.com/user-attachments/assets/ca99202e-f8fd-4bae-90ba-26b55b817e6b" />

<img width="522" height="314" alt="Image" src="https://github.com/user-attachments/assets/40cc27db-2cc1-4fde-a785-0dfc22d9e500" />
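
To narrow down which reward term misbehaves first, a small check over logged per-term reward values may help. The following is only a minimal sketch, not part of mjlab or rsl_rl: the helper `find_abnormal_rewards`, the `limit` threshold, the term names, and the sample values are all hypothetical, and it assumes the per-step reward values have already been exported as plain Python floats.

```python
import math


def find_abnormal_rewards(reward_terms, limit=1e3):
    """Return (term, step, value) for every non-finite or oversized reward.

    reward_terms: dict mapping a reward-term name to a list of per-step values.
    limit: hypothetical magnitude threshold; tune it to the task's reward scale.
    """
    flagged = []
    for name, values in reward_terms.items():
        for step, value in enumerate(values):
            # NaN and +/-inf fail isfinite(); large finite values fail the limit.
            if not math.isfinite(value) or abs(value) > limit:
                flagged.append((name, step, value))
    return flagged


# Made-up data for illustration only; these are not values from the failing run.
logged = {
    "track_lin_vel": [0.8, 0.9, 0.7],
    "action_rate": [-0.01, -5.0e6, float("nan")],
}

for name, step, value in find_abnormal_rewards(logged):
    print(f"{name} step {step}: {value}")
```

If one term is flagged well before the crash, its weight or formulation would be the first thing I would look at.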
