Why are 4-bit activations not supported in quanto?

As mentioned in the title.
Activations are quantized dynamically, based on scales recorded during calibration (unlike weights, which are quantized statically). This adds an extra cost at inference.
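For reference, here is a minimal sketch of that calibration flow using the public optimum-quanto API (the exact import path may differ across versions, e.g. `quanto` vs `optimum.quanto`; the model and shapes are placeholders):

```python
import torch
from optimum.quanto import Calibration, freeze, qint4, qint8, quantize

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 16),
)

# Weights are quantized statically; activations (qint8 here, since qint4
# activations are not supported) rely on scales recorded during calibration.
quantize(model, weights=qint4, activations=qint8)

# Forward passes inside the Calibration context record activation scales,
# which are then applied dynamically at inference time.
with Calibration():
    for _ in range(8):
        model(torch.randn(4, 64))

freeze(model)  # make the quantized weights permanent
```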
To make that cost worthwhile, we need to benefit from an accelerated matmul with the quantized weights.
Unfortunately, at this stage the only accelerated operations available are for scalar (per-tensor) quantization scales, which give terrible results with 4-bit weights: you need group-wise scales to preserve accuracy.
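A toy illustration (plain PyTorch, not quanto code) of why a single scalar scale is too coarse at 4 bits: with one scale for the whole tensor, the step size is set by the global maximum, while group-wise scales adapt to each group and cut the rounding error roughly in half.

```python
import torch

def quantize_int4(w, group_size=None):
    """Symmetric 4-bit fake-quantization with one scale per group
    (group_size=None means a single per-tensor scale)."""
    groups = w.reshape(1, -1) if group_size is None else w.reshape(-1, group_size)
    scales = groups.abs().amax(dim=1, keepdim=True) / 7  # int4 symmetric range [-7, 7]
    q = torch.clamp(torch.round(groups / scales), -7, 7)
    return (q * scales).reshape(w.shape)

torch.manual_seed(0)
w = torch.randn(256, 256)
for gs in (None, 128, 32):
    err = (w - quantize_int4(w, gs)).pow(2).mean().sqrt()
    label = "per-tensor" if gs is None else f"group={gs}"
    print(f"{label:>10}: RMS error {err:.4f}")
```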
How could you still use 4-bit activations?
You would need to modify some code here and there.