Hi!
I have a question related to how the intrinsic rewards are calculated.
Why do you use sum(1) instead of mean(1)?
intrinsic_reward = (target_next_feature - predict_next_feature).pow(2).sum(1) / 2
That sums the squared errors over the 512 output neurons, which is not the same as taking the mean over those outputs: the two results differ by a constant factor.
The original TensorFlow release uses reduce_mean, so I'm a little confused.
https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241
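To make the difference concrete, here is a minimal sketch in plain Python rather than PyTorch. The helper names and the random data are made up for illustration, and I am assuming the TensorFlow version averages over the feature axis.

```python
import random

FEATURE_DIM = 512  # number of output neurons mentioned above


def reward_sum(target, predict):
    """Mirrors (target - predict).pow(2).sum(1) / 2 for a batch of rows."""
    return [
        sum((t - p) ** 2 for t, p in zip(t_row, p_row)) / 2
        for t_row, p_row in zip(target, predict)
    ]


def reward_mean(target, predict):
    """Mirrors a mean of the squared error over the feature axis (assumed reduce_mean behaviour)."""
    return [
        sum((t - p) ** 2 for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(target, predict)
    ]


# Made-up batch of 4 samples standing in for the two networks' outputs.
random.seed(0)
target = [[random.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(4)]
predict = [[random.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(4)]

r_sum = reward_sum(target, predict)
r_mean = reward_mean(target, predict)

# Per-sample ratio between the two variants; it should equal FEATURE_DIM / 2.
ratios = [s / m for s, m in zip(r_sum, r_mean)]
print(ratios)
```

If this sketch is right, the two variants differ only by a constant scale per sample, so what I would like to understand is whether that scale matters for training here.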
I hope you can clear this up for me.
Thank you in advance,
cangozpi