Support for FP8 Matmuls #275
Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights are quantized to float8; the matmul is performed in full precision in that case, if I am not mistaken. What is the current situation and roadmap for using float8 matrix multiplications, for instance through _scaled_mm?
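For reference, here is a rough sketch of how an FP8 matmul can be invoked through torch._scaled_mm with per-tensor scales. This is illustrative only: the argument list has changed across PyTorch releases, and FP8 matmuls require an SM89+ (Ada/Hopper-class) CUDA GPU.

```python
import torch

# Rough sketch of an FP8 matmul via torch._scaled_mm (signature as in recent
# PyTorch releases; older versions differ). Dimensions are kept multiples of 16,
# which _scaled_mm requires.
device = "cuda"
a = torch.randn(16, 32, device=device)
b = torch.randn(32, 64, device=device)

fp8_max = torch.finfo(torch.float8_e4m3fn).max

# Per-tensor scales so values fit the float8 e4m3 range;
# _scaled_mm treats them as dequantization scales.
scale_a = (a.abs().max() / fp8_max).float()
scale_b = (b.abs().max() / fp8_max).float()

a_fp8 = (a / scale_a).to(torch.float8_e4m3fn)
# The second operand is expected in column-major layout.
b_fp8 = (b / scale_b).t().contiguous().t().to(torch.float8_e4m3fn)

# out ≈ (a_fp8 * scale_a) @ (b_fp8 * scale_b), accumulated in higher precision.
out = torch._scaled_mm(
    a_fp8,
    b_fp8,
    scale_a=scale_a,
    scale_b=scale_b,
    out_dtype=torch.bfloat16,
)
```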