Fix gumbel #4

alexander-telepov · 2022-04-18T13:03:53Z

There are two commits, each fixing one of the following issues:

Gumbel noise should be added to the log-probabilities of actions, not to the probabilities, inside the Gumbel-softmax trick;
Non-existing attachment sites should have zero probability of being chosen by the network, i.e. log-probabilities of minus infinity.
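
A minimal sketch of both fixes in plain Python, not the project's actual code. The name `gumbel_softmax` is borrowed from the discussion, but this signature, the `mask` argument (False marks a non-existing attachment site), the temperature `tau` and the `g_ratio` noise scale are assumptions. The Gumbel-max property says that argmax(log p_i + g_i), with g_i ~ Gumbel(0, 1), is distributed exactly as p. That is why the noise has to be added to log-probabilities. Masked sites receive -inf, so their softmax weight comes out as exactly zero.

```python
import math
import random


def gumbel_softmax(log_probs, mask=None, tau=1.0, g_ratio=1.0, rng=random):
    """Soft sample from a categorical distribution given log-probabilities.

    Hypothetical helper: log_probs is a list of log-probabilities, mask is an
    optional list of bools (False = non-existing attachment site), g_ratio
    scales the Gumbel noise (1.0 gives the standard Gumbel-softmax).
    """
    if mask is not None:
        # Non-existing sites get log-probability -inf, i.e. probability zero.
        log_probs = [lp if ok else -math.inf for lp, ok in zip(log_probs, mask)]
    if all(lp == -math.inf for lp in log_probs):
        raise ValueError("at least one action must be valid")

    perturbed = []
    for lp in log_probs:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) sample
        # Noise is added to the LOG-probability, not to the probability.
        perturbed.append(lp + g_ratio * gumbel)

    # Numerically stable softmax with temperature tau; exp(-inf) == 0.0.
    m = max(perturbed)
    exps = [math.exp((p - m) / tau) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]
```

Taking the argmax of the returned vector gives a hard sample whose frequencies match the input probabilities. Any index masked out is never selected.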

alexander-telepov · 2022-04-18T14:16:01Z

g_ratio was changed from 1e-3 to 1.0 after replacing probabilities with log-probabilities as the input to the gumbel_softmax function. I found that in the "freed pe" experiment, after the random exploration phase, the actor got stuck in a state in which it sampled only one action. In that state the magnitude of the Gumbel noise multiplied by g_ratio was much smaller than the differences between log-probabilities, so the actor was effectively deterministic. After I changed g_ratio to 1.0 (which makes the function identical to the original Gumbel-softmax formula), the actor no longer gets stuck in such states.
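
A small sketch of why g_ratio = 1e-3 makes the actor deterministic. It assumes, as described above, that g_ratio multiplies the Gumbel noise before it is added to the log-probabilities. The helpers `sample_action` and `action_frequencies` are hypothetical and not part of the repository. With probabilities 0.6 and 0.4 the log-probabilities differ by about 0.405. Noise scaled by 1e-3 can never overcome that gap, so the first action is always picked. With g_ratio = 1.0, the sampled frequencies follow the probabilities.

```python
import math
import random


def sample_action(log_probs, g_ratio, rng):
    """Hypothetical hard sample: argmax of log_probs + g_ratio * Gumbel noise."""
    best_i, best_v = 0, -math.inf
    for i, lp in enumerate(log_probs):
        # Clamped uniform sample keeps the Gumbel noise finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        v = lp + g_ratio * -math.log(-math.log(u))
        if v > best_v:
            best_i, best_v = i, v
    return best_i


def action_frequencies(log_probs, g_ratio, n, rng):
    """Empirical frequency of each action over n samples."""
    counts = [0] * len(log_probs)
    for _ in range(n):
        counts[sample_action(log_probs, g_ratio, rng)] += 1
    return [c / n for c in counts]


rng = random.Random(1)
lp = [math.log(0.6), math.log(0.4)]
print(action_frequencies(lp, 1e-3, 10000, rng))  # always action 0
print(action_frequencies(lp, 1.0, 10000, rng))   # roughly [0.6, 0.4]
```

The clamping bounds each Gumbel sample to roughly [-3.3, 27.6]. Scaled by 1e-3, the noise difference between two actions stays below 0.031, far less than the 0.405 gap in log-probability.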

alexander-telepov added 2 commits April 18, 2022 15:36

fix gumbel-softmax sampling

ee3ca19

fix log_prob padding

80f1edb
