Open
Description
I noticed that you don't cancel the gradient for large values when using the straight-through estimator here.
The QNN paper claims that "Not cancelling the gradient when r is too large significantly worsens performance".
Does this only matter for low-precision quantization (e.g. binary)?
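
To make the question concrete, here is a minimal pure-Python sketch of the two backward rules I am comparing. It is not this repository's code: the function names `quantize_sign` and `ste_backward` and the threshold `clip=1.0` are assumptions chosen for illustration, with the threshold taken from the binary case, where the gradient is passed through only when |r| <= 1.

```python
def quantize_sign(r):
    """Forward pass: binarize each real-valued input to +1 or -1."""
    return [1.0 if x >= 0 else -1.0 for x in r]


def ste_backward(r, grad_q, clip=None):
    """Backward pass of a straight-through estimator.

    r      -- real-valued inputs seen in the forward pass
    grad_q -- gradient of the loss w.r.t. the quantized outputs
    clip   -- if None, pass the gradient through unchanged (plain STE);
              otherwise zero the gradient wherever |r| > clip
              (the "cancelling" variant; the threshold is an assumption).
    """
    if clip is None:
        return list(grad_q)
    return [g if abs(x) <= clip else 0.0 for x, g in zip(r, grad_q)]


r = [-2.5, -0.3, 0.4, 1.7]
grad_q = [0.1, 0.2, -0.3, 0.4]

q = quantize_sign(r)
plain = ste_backward(r, grad_q)               # gradient kept for every input
clipped = ste_backward(r, grad_q, clip=1.0)   # gradient dropped where |r| > 1
```

With these inputs, the plain variant keeps all four gradient entries, while the cancelling variant zeroes the entries for `r = -2.5` and `r = 1.7`. That difference is what I am asking about for higher bit widths.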