This should be `1 - f`, according to the paper. Confusion arose around the direction of the "forget" gate: in the LSTM and GRU papers, information is passed through when `f` is high, but in the MGU paper it is the opposite. The variable `f` from the MGU paper is effectively `1 - f` in Flax (it is the portion that contributes to the short-term response, or `n` in Flax-speak). From the paper:
In MGU, the forget gate `f_t` is first generated, and the element-wise product between `1 - f_t` and `h_{t-1}` becomes part of the new hidden state `h_t`. The portion of `h_{t-1}` that is "forgotten" (`f_t * h_{t-1}`) is combined with `x_t` to produce `h_bar_t`, the short-term response. A portion of `h_bar_t` (determined again by `f_t`) forms the second part of `h_t`.
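To make the gate direction concrete, here is a minimal NumPy sketch of the paper's MGU update step (weight names `W_f`, `U_f`, `W_h`, `U_h` are illustrative, not Flax's parameter names). Note the `(1 - f_t)` coefficient on `h_prev` in the final line, which is the quantity under discussion:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, W_f, U_f, b_f, W_h, U_h, b_h):
    # Forget gate, as defined in the MGU paper.
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)
    # Short-term response h_bar_t: f_t gates the part of h_prev
    # that feeds into the candidate state.
    h_bar_t = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)
    # New hidden state: (1 - f_t) retains h_prev, f_t mixes in h_bar_t.
    return (1.0 - f_t) * h_prev + f_t * h_bar_t
```

With a strongly negative gate bias (`f_t -> 0`) the state is retained essentially unchanged; with a strongly positive bias (`f_t -> 1`) the state is fully replaced by `h_bar_t`, which matches the paper's convention and is the opposite of the LSTM/GRU forget-gate direction.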
flax/flax/linen/recurrent.py, line 725 at commit d59132d