
FlexAttention? #1685

Open
johnnynunez opened this issue Feb 10, 2025 · 8 comments

@johnnynunez

Is it compatible with FlexAttention from PyTorch 2.6.0?

@jainapurva
Contributor

@drisspg

@drisspg
Contributor

drisspg commented Feb 10, 2025

Can you add some more context here, @johnnynunez?

@johnnynunez
Author

johnnynunez commented Feb 10, 2025

> Can you add some more context here, @johnnynunez?

I want to quantize the LeRobot pi0 model, which uses FlexAttention. @drisspg
https://huggingface.co/blog/pi0

Context, from the FlexAttention blog post (https://pytorch.org/blog/flexattention/):

> In the future, we plan on extending this support to allow for quantized versions of attention or things like RadixAttention as well.

@drisspg
Contributor

drisspg commented Feb 10, 2025

Currently all of our quantization APIs target linear layers and are orthogonal to FlexAttention, so yes, FlexAttention should work. FlexAttention doesn't currently support low-precision inputs; that is planned, but there's no ETA yet.
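
For reference, here is a minimal sketch of what "orthogonal" means in practice, assuming torchao's `quantize_` / `int8_weight_only` API and a toy attention module (not the actual pi0 code): quantization rewrites only the `nn.Linear` projections, while the `flex_attention` call itself keeps running in bf16.

```python
import torch
import torch.nn as nn
from torch.nn.attention.flex_attention import flex_attention
from torchao.quantization import quantize_, int8_weight_only


class FlexSelfAttention(nn.Module):
    """Toy self-attention block: linear projections + flex_attention."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.heads = heads
        self.head_dim = dim // heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)   # rewritten by quantize_
        self.proj = nn.Linear(dim, dim, bias=False)      # rewritten by quantize_

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # flex_attention expects (batch, heads, seq, head_dim)
        q, k, v = (t.view(b, s, self.heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        out = flex_attention(q, k, v)                    # attention stays in bf16
        out = out.transpose(1, 2).reshape(b, s, d)
        return self.proj(out)


model = FlexSelfAttention().to(device="cuda", dtype=torch.bfloat16)
quantize_(model, int8_weight_only())   # only touches the nn.Linear weights

x = torch.randn(2, 128, 512, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    y = torch.compile(model)(x)        # compile to get the fused flex kernel
print(y.shape)  # torch.Size([2, 128, 512])
```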

@johnnynunez
Author

> Currently all of our quantization APIs target linear layers and are orthogonal to FlexAttention, so yes, FlexAttention should work. FlexAttention doesn't currently support low-precision inputs; that is planned, but there's no ETA yet.

thanks! I'm going to try

@drisspg
Contributor

drisspg commented Feb 10, 2025

Let me know if anything comes up!

@moinnadeem

> FlexAttention doesn't currently support low-precision inputs; that is planned, but there's no ETA yet.

@drisspg this is a good point -- what will happen with a low-precision input? Will it get upcast to bf16 for the actual matmul? If so, are you basically seeing VRAM savings but no time savings?

@drisspg
Contributor

drisspg commented Feb 18, 2025

I have an example of doing this, and @danielvegamyhre is starting to investigate and ultimately make this a well-supported path.

For an fp8 mm, the matmul itself runs in low precision and can in theory utilize the fp8 tensor cores on H100, accumulating into higher precision.
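
A hedged sketch of that fp8-matmul idea using `torch._scaled_mm` (a private PyTorch op, so the exact signature may shift between releases; the simple per-tensor scaling recipe below is an assumption, not necessarily what torchao/FlexAttention will ship): the matmul operands are fp8 e4m3, and the result comes back in bf16.

```python
import torch

# Illustrative shapes; _scaled_mm wants dimensions that are multiples of 16.
a = torch.randn(1024, 512, device="cuda", dtype=torch.bfloat16)
b = torch.randn(512, 2048, device="cuda", dtype=torch.bfloat16)

# Per-tensor scales so the values fit the fp8 e4m3 range.
fp8 = torch.float8_e4m3fn
scale_a = a.abs().max() / torch.finfo(fp8).max
scale_b = b.abs().max() / torch.finfo(fp8).max

a_fp8 = (a / scale_a).to(fp8)
# The second operand must be column-major for the fp8 kernel.
b_fp8 = (b / scale_b).to(fp8).t().contiguous().t()

out = torch._scaled_mm(
    a_fp8, b_fp8,
    scale_a=scale_a.float(),
    scale_b=scale_b.float(),
    out_dtype=torch.bfloat16,  # accumulate/return in higher precision
)
print(out.shape)  # torch.Size([1024, 2048])
```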
