Bounded Action Space #81
Conversation
… naturally bounded action space.
Hi Antoine,
If you'd like to help add this feature, I think it would make sense to have a general distribution class through which Beta or Gaussian essentially become different options users can configure. Eventually we want to add support for the categorical distribution and other types as well.
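A rough sketch of what such an abstraction could look like (class names and structure are my own illustration under that suggestion, not the repo's actual API):

```python
# Illustrative sketch only: a pluggable distribution "head" the agent could use,
# so that "normal" vs. "beta" (and later "categorical") become config options.
# Names and structure are hypothetical, not the repo's actual API.
import math
import torch
import torch.nn as nn
from torch.distributions import Beta, Normal


class DistributionHead(nn.Module):
    """Common interface: turn raw network outputs into a torch distribution."""

    def forward(self, logits: torch.Tensor) -> torch.distributions.Distribution:
        raise NotImplementedError


class NormalHead(DistributionHead):
    def __init__(self, num_actions: int, init_std: float = 1.0):
        super().__init__()
        self.log_std = nn.Parameter(torch.full((num_actions,), math.log(init_std)))

    def forward(self, mean):
        return Normal(mean, self.log_std.exp())


class BetaHead(DistributionHead):
    """Samples in (0, 1); rescaled to the env's action bounds outside the head."""

    def forward(self, logits):
        alpha, beta = logits.chunk(2, dim=-1)
        # softplus(.) + 1 keeps alpha, beta > 1 so the density stays unimodal
        return Beta(nn.functional.softplus(alpha) + 1.0,
                    nn.functional.softplus(beta) + 1.0)


# The user-facing switch: pick the head by name in the training config.
DISTRIBUTION_HEADS = {"normal": NormalHead, "beta": BetaHead}
```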
Hello, have you tested this modification? I tried to train it with my env, but it fails.
Hey! Yes we use it to train all our robots at the University of Luxembourg! Is it crashing? Or just not training? |
And that's the Beta or the Squashed Gaussian? |
It's the Squashed Gaussian. I have also tried Beta, but sadly neither of them works for my case 😭
Any chance it's just not outputting values in a range that makes sense for you? I would recommend looking at the std_dev values.
Okay, thank you for your advice, Antoine. I will try to debug and find the problem.
Hi Antoine, I was wondering how the performance of Beta and the Squashed Gaussian is on your task. I increased
Hi there!
This PR adds support for bounded action spaces directly into the agent.
The main difference from clipping is that this ensures actions are sampled within a fixed range, and rewards are not computed on clipped actions.
To accommodate this, two options are provided: a Beta distribution and a Squashed Gaussian.
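To make the contrast with clipping concrete, here is a minimal illustration (the action bounds, parameters, and the tanh squashing are assumptions for the example, not code from this PR):

```python
import torch
from torch.distributions import Beta, Normal

low, high = -1.0, 1.0  # example action bounds

# Clipping: the sample can land outside the bounds and gets truncated, so the
# log-prob used in the PPO update no longer matches the executed action.
gaussian = Normal(torch.zeros(2), torch.ones(2))
clipped_action = gaussian.sample().clamp(low, high)

# Beta: samples live in (0, 1) by construction and are only rescaled, so the
# executed action is exactly the sampled one.
beta = Beta(torch.full((2,), 2.0), torch.full((2,), 2.0))
beta_action = low + (high - low) * beta.sample()

# Squashed Gaussian: a tanh transform maps the raw sample into (low, high).
squashed = torch.tanh(gaussian.sample())
squashed_action = low + 0.5 * (high - low) * (squashed + 1.0)
```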
To allow for a smooth calculation of the KL divergence between two Beta distributions, I had to slightly rework the transition to store the distribution parameters rather than just the mean and std. Hence, in the case of the normal distribution I save the mean and std_dev, while for the Beta distribution I save alpha and beta.
Then, instead of manually computing the KL divergence, I let torch do the heavy lifting.
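As an illustration of that last point (the function and dict layout below are mine, not the PR's actual code):

```python
from torch.distributions import Beta, Normal, kl_divergence


def kl_from_stored_params(dist_type, old, new):
    """Rebuild the old/new policy distributions from the stored parameters and
    let torch.distributions.kl_divergence do the work."""
    if dist_type == "normal":
        p, q = Normal(old["mean"], old["std"]), Normal(new["mean"], new["std"])
    elif dist_type == "beta":
        p, q = Beta(old["alpha"], old["beta"]), Beta(new["alpha"], new["beta"])
    else:
        raise ValueError(f"unknown distribution type: {dist_type}")
    # Sum over action dimensions, average over the batch, as in the usual
    # adaptive-learning-rate KL check in PPO.
    return kl_divergence(p, q).sum(dim=-1).mean()
```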
Configuration-wise, it could look like this:
Beta
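(The snippet below is a hypothetical illustration; the actual field names in the PR may differ.)

```python
policy = {
    "distribution": "beta",
    # Beta samples lie in (0, 1); these bounds are used to rescale them
    "action_min": -1.0,
    "action_max": 1.0,
}
```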
Normal
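(Again a hypothetical illustration for the Normal / Squashed Gaussian variant, with made-up field names.)

```python
policy = {
    "distribution": "normal",
    "init_noise_std": 1.0,
    # squash the Gaussian sample into the action bounds instead of clipping
    "squash_actions": True,
    "action_min": -1.0,
    "action_max": 1.0,
}
```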
I know this significantly changes the way PPO updates are done, and it's a BREAKING CHANGE, so I totally understand if the Beta policy doesn't make it into the main repo! Though having a reliable action clipping mechanism would be nice :).
LMK if you want me to change anything, I'd be happy to!
Best,
Antoine