About truncated normal distribution based weight initialization #1284
Asked by developer0hye in Q&A
Hi @rwightman, thanks for sharing your work! I have a question: why do you truncate the initialized weight values? Is it for numerical stability?
Answered by rwightman on May 28, 2022
@developer0hye Moving this to discussions. It was an attempt to match the initialization of some networks implemented in JAX and TensorFlow, where a truncated normal is a more common default layer init.
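For context, a minimal sketch of this kind of initialization in PyTorch, using the built-in `torch.nn.init.trunc_normal_` (the helper name `init_trunc_normal` and the `std=0.02` value here are illustrative choices, not necessarily what timm uses for any given model):

```python
import torch
import torch.nn as nn


def init_trunc_normal(module: nn.Module, std: float = 0.02) -> None:
    """Initialize Linear/Conv weights with a truncated normal.

    Samples are drawn from N(0, std^2) but kept within [-2*std, 2*std],
    discarding extreme tail values. This mirrors the truncated-normal
    layer init that is a common default in JAX/TensorFlow stacks.
    """
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.trunc_normal_(
            module.weight, mean=0.0, std=std, a=-2 * std, b=2 * std
        )
        if module.bias is not None:
            nn.init.zeros_(module.bias)


layer = nn.Linear(768, 768)
init_trunc_normal(layer)
# Every weight now lies inside the truncation bounds [-2*std, 2*std].
print(layer.weight.abs().max().item() <= 2 * 0.02)
```

Applied via `model.apply(init_trunc_normal)`, this walks every submodule, which is the usual pattern for custom inits in PyTorch.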
Answer selected by developer0hye