[ONNX] Support fixed-capacity GroupQueryAttention cache #4534

Draft
IanWood1 wants to merge 1 commit into llvm:main from IanWood1:fix-constant-seq-length-onnx-gqa


Conversation

@IanWood1
Member

@IanWood1 IanWood1 commented Apr 14, 2026

This updates the GroupQueryAttention lowering to support the `past_present_share_buffer` runtime behavior, where `present_key`/`present_value` have the same fixed-capacity cache type as `past_key`/`past_value`. This intentionally drops support for the non-`past_present_share_buffer` behavior in this lowering.

Supporting both modes (choosing at runtime) would require comparing the input cache size to the output cache size to determine which mode is in use. With a dynamic sequence length this is not possible, because there is no way to determine the output KV cache size.

This also adds explicit support for rank-2 `seqlens_k`, which is off-spec but emitted by ONNX exporters.
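For context, in `past_present_share_buffer` mode the cache update is conceptually an in-place scatter of the new tokens into the preallocated fixed-capacity buffer at the per-sequence offset given by `seqlens_k`, with `present_key`/`present_value` aliasing the same buffer. A minimal NumPy sketch of that semantics (all names here are illustrative; this is not the lowering's actual op sequence):

```python
import numpy as np

def update_fixed_capacity_cache(past_kv, new_kv, seqlens_k):
    """Scatter new K (or V) entries into a fixed-capacity cache in place.

    past_kv:   (batch, num_kv_heads, max_seq_len, head_dim) preallocated cache
    new_kv:    (batch, num_kv_heads, new_len, head_dim) tokens for this step
    seqlens_k: (batch,) number of valid past tokens per sequence

    Returns the same buffer, which then also serves as present_kv.
    """
    batch, _, new_len, _ = new_kv.shape
    for b in range(batch):
        start = int(seqlens_k[b])
        # Write this step's tokens right after the valid past tokens.
        past_kv[b, :, start:start + new_len, :] = new_kv[b]
    return past_kv  # present_kv aliases past_kv (shared buffer)
```

The key point is that the output cache has the same static `max_seq_len` capacity as the input cache, so no shape comparison between input and output is needed at runtime.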

IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch 2 times, most recently from b3e6c84 to 1c69b44 on April 14, 2026 19:28
IanWood1 marked this pull request as ready for review on April 14, 2026 19:34
IanWood1 marked this pull request as draft on April 27, 2026 15:21
IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch 2 times, most recently from cf09e66 to 00ecc03 on May 6, 2026 17:53
IanWood1 changed the title from [ONNX] Fix constant KV cache size to [ONNX] Support fixed-capacity GroupQueryAttention cache on May 6, 2026
Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch from 00ecc03 to 18d6386 on May 6, 2026 22:01
