About truncated normal distribution based weight initialization #1284
Asked by developer0hye in Q&A
Hi @rwightman, thanks for sharing your work! I have a question: why do you truncate the initialized weight values? Is it for numerical stability?
Answered by rwightman on May 28, 2022
@developer0hye Moving this to discussions. It was an attempt to match the initialization of some networks implemented in JAX and TensorFlow, where a truncated normal is a more common default layer init.
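For context, a minimal sketch of this kind of initialization in PyTorch, using the built-in `torch.nn.init.trunc_normal_` (the helper name `init_trunc_normal` and the `std=0.02` value here are illustrative choices, not necessarily what timm uses for any given model):

```python
import torch
import torch.nn as nn


def init_trunc_normal(module: nn.Module, std: float = 0.02) -> None:
    """Initialize Linear/Conv weights with a truncated normal.

    Samples are drawn from N(0, std^2) but kept within [-2*std, 2*std],
    discarding extreme tail values. This mirrors the truncated-normal
    layer init that is a common default in JAX/TensorFlow stacks.
    """
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.trunc_normal_(
            module.weight, mean=0.0, std=std, a=-2 * std, b=2 * std
        )
        if module.bias is not None:
            nn.init.zeros_(module.bias)


layer = nn.Linear(768, 768)
init_trunc_normal(layer)
# Every weight now lies inside the truncation bounds [-2*std, 2*std].
print(layer.weight.abs().max().item() <= 2 * 0.02)
```

Applied via `model.apply(init_trunc_normal)`, this walks every submodule, which is the usual pattern for custom inits in PyTorch.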
Answer selected by developer0hye