@alexander-telepov

There are two commits, each fixing one of the following issues:

  • Gumbel noise should be added to the log-probabilities of actions, not to the probabilities, inside the Gumbel-softmax trick;
  • Non-existent attachment sites should have zero probability of being chosen by the network, i.e. log-probabilities of minus infinity.
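
The two fixes above can be illustrated with a minimal sketch. It is not the code from these commits: it uses only the Python standard library, it takes a hard argmax (the Gumbel-max trick) instead of the relaxed Gumbel-softmax, and the names `masked_log_probs`, `gumbel_max_sample` and the `valid` mask are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    # Standard Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform[0, 1).
    # Clamp U away from 0 so that log(0) is never evaluated.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def masked_log_probs(logits, valid):
    # Log-softmax over the valid entries only (hypothetical helper).
    # Invalid entries, e.g. non-existent attachment sites, get -inf,
    # which corresponds to exactly zero probability.
    m = max(l for l, v in zip(logits, valid) if v)
    z = sum(math.exp(l - m) for l, v in zip(logits, valid) if v)
    log_z = m + math.log(z)
    return [l - log_z if v else -math.inf for l, v in zip(logits, valid)]


def gumbel_max_sample(log_probs, rng):
    # Add Gumbel noise to LOG-probabilities (not probabilities), then argmax.
    # -inf plus finite noise stays -inf, so masked actions are never chosen.
    perturbed = [lp + sample_gumbel(rng) for lp in log_probs]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


rng = random.Random(0)
log_probs = masked_log_probs([0.0, 1.0, 2.0], valid=[True, False, True])
counts = [0, 0, 0]
for _ in range(20000):
    counts[gumbel_max_sample(log_probs, rng)] += 1
```

With noise added to log-probabilities, the empirical frequencies in `counts` follow the masked softmax distribution, and the masked index is never sampled.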

@alexander-telepov
Author

g_ratio was changed from 1e-3 to 1.0 after probabilities were replaced with log-probabilities as the input to the gumbel_softmax function. I found that in the "freed pe" experiment, after the random exploration phase, the actor got stuck in a state in which it sampled only one action. In such a state the magnitude of the Gumbel noise multiplied by g_ratio was much smaller than that of the log-probabilities, so the actor was effectively deterministic. After I changed g_ratio to 1.0 (at which the function takes the same form as the original Gumbel-softmax formula), the actor no longer gets stuck in such states.
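
The effect described above can be reproduced with a small sketch. It assumes that g_ratio simply scales the Gumbel noise before it is added to the log-probabilities, and it uses a hard argmax in place of the relaxed Gumbel-softmax. The function names and the example distribution (0.7, 0.2, 0.1) are made up for illustration and are not taken from the experiment.

```python
import math
import random


def sample_gumbel(rng):
    # Standard Gumbel(0, 1) noise; clamp U away from 0 to avoid log(0).
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def sample_action(log_probs, g_ratio, rng):
    # Assumed form: log-probabilities plus Gumbel noise scaled by g_ratio,
    # followed by a hard argmax.
    scores = [lp + g_ratio * sample_gumbel(rng) for lp in log_probs]
    return max(range(len(scores)), key=scores.__getitem__)


def action_frequencies(log_probs, g_ratio, n_samples=20000, seed=0):
    # Empirical frequency of each action over n_samples draws.
    rng = random.Random(seed)
    counts = [0] * len(log_probs)
    for _ in range(n_samples):
        counts[sample_action(log_probs, g_ratio, rng)] += 1
    return [c / n_samples for c in counts]


# Illustrative distribution, not taken from the experiment.
log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]

# Noise is about 1000x smaller than the gaps between log-probabilities,
# so the argmax never changes: the policy is effectively deterministic.
freq_small = action_frequencies(log_probs, g_ratio=1e-3)

# Unscaled noise: sampling follows the underlying distribution.
freq_full = action_frequencies(log_probs, g_ratio=1.0)
```

Under these assumptions, `freq_small` puts all of its mass on the most likely action, while `freq_full` stays close to the underlying probabilities.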
