Extrinsic reward clipping

The [RND paper](https://arxiv.org/pdf/1810.12894.pdf) states on page 15 that extrinsic rewards are clipped to [-1, 1].
However, `atari_wrappers.py` in the [official RND code](https://github.dev/openai/random-network-distillation) clips extrinsic rewards with the `ClipRewardEnv` wrapper, whose `reward` method does:
```python
def reward(self, reward):
    """Bin reward to {+1, 0, -1} by its sign."""
    return float(np.sign(reward))
```

I believe the official implementation and the description in the paper differ slightly.
In your implementation (jcwleo), you clip with:
```python
        total_reward = total_reward.reshape([num_step, num_env_workers]).transpose().clip(-1, 1)
```

I believe this differs from the official implementation. Does anyone have an explanation for this discrepancy, and which approach should be used?
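
To make the difference concrete, here is a minimal sketch comparing the two approaches. The reward values are made up for illustration and do not come from either codebase; only `np.sign` and `clip(-1, 1)` are taken from the snippets above.

```python
import numpy as np

# Hypothetical extrinsic rewards, chosen only to illustrate the difference.
rewards = np.array([-5.0, -0.5, 0.0, 0.3, 1.0, 7.0])

# Clipping to [-1, 1], as in the jcwleo line above.
clipped = rewards.clip(-1, 1)

# Binning by sign, as in ClipRewardEnv.
signed = np.sign(rewards)

# True wherever the two approaches give different values.
differs = clipped != signed

print(clipped)
print(signed)
print(differs)
```

In this sketch the two results differ only for fractional rewards with magnitude below 1. For integer-valued rewards, clipping to [-1, 1] and taking the sign give the same result, so whether the discrepancy matters in practice seems to depend on whether the environment ever emits fractional rewards.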
