Support for FP8 Matmuls #275
Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights are quantized to float8; the matmul is performed in full precision in that case, if I am not mistaken. What is the current situation and roadmap for using float8 matrix multiplications, for instance through _scaled_mm?
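For reference, here is a rough sketch of how an FP8 matmul can be invoked through torch._scaled_mm with per-tensor scales. This is illustrative only: the argument list has changed across PyTorch releases, and FP8 matmuls require an SM89+ (Ada/Hopper-class) CUDA GPU.

```python
import torch

# Rough sketch of an FP8 matmul via torch._scaled_mm (signature as in recent
# PyTorch releases; older versions differ). Dimensions are kept multiples of 16,
# which _scaled_mm requires.
device = "cuda"
a = torch.randn(16, 32, device=device)
b = torch.randn(32, 64, device=device)

fp8_max = torch.finfo(torch.float8_e4m3fn).max

# Per-tensor scales so values fit the float8 e4m3 range;
# _scaled_mm treats them as dequantization scales.
scale_a = (a.abs().max() / fp8_max).float()
scale_b = (b.abs().max() / fp8_max).float()

a_fp8 = (a / scale_a).to(torch.float8_e4m3fn)
# The second operand is expected in column-major layout.
b_fp8 = (b / scale_b).t().contiguous().t().to(torch.float8_e4m3fn)

# out ≈ (a_fp8 * scale_a) @ (b_fp8 * scale_b), accumulated in higher precision.
out = torch._scaled_mm(
    a_fp8,
    b_fp8,
    scale_a=scale_a,
    scale_b=scale_b,
    out_dtype=torch.bfloat16,
)
```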