-
FA should work with all quants. There haven't been any major updates to the Metal and CPU kernels recently. I'm not sure how relevant the information in the Feature Matrix is today.
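If you want to verify that on your setup, flash attention is just a runtime flag. A minimal sketch (assuming a recent build where `-fa` is a boolean switch, and using the IQ4_XS file from your test; the prompt is just a placeholder):

```sh
# run the IQ4_XS quant with a 32K context and flash attention enabled
./llama-cli -m Qwen2.5-Coder-7B-Instruct.i1-IQ4_XS.gguf -c 32768 -fa \
  -p "Write a hello world program in Rust."
```

If it loads and generates normally, FA is being used with that quant.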
-
In a quant testing post the author linked the Feature Matrix wiki page.
It says that the I-quants are slow on Metal and CPU, but I've just done a bit of testing between
mradermacher/Qwen2.5-Coder-7B-Instruct-GGUF/Qwen2.5-Coder-7B-Instruct.Q4_K_S.gguf
and mradermacher/Qwen2.5-Coder-7B-Instruct-i1-GGUF/Qwen2.5-Coder-7B-Instruct.i1-IQ4_XS.gguf
and the speed difference seemed statistically insignificant. I used mmap and the full 32K context, but didn't use Flash Attention as I wasn't sure whether it's properly supported. Does anyone know if the slowness was fixed, or whether the difference was never significant in the first place?
The wiki doesn't link any sources, so I'm not really sure what to expect.
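For reference, the sort of side-by-side run I was doing (and would repeat with FA on) looks roughly like the sketch below. Paths are shortened to just the filenames, and I'm assuming llama-bench still accepts comma-separated values to sweep a parameter and 0/1 for `-fa`:

```sh
# compare both quants, with and without flash attention, 5 repetitions each
./llama-bench \
  -m Qwen2.5-Coder-7B-Instruct.Q4_K_S.gguf,Qwen2.5-Coder-7B-Instruct.i1-IQ4_XS.gguf \
  -p 512 -n 128 \
  -fa 0,1 \
  -r 5
```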