[ONNX] Support fixed-capacity GroupQueryAttention cache by IanWood1 · Pull Request #4534 · llvm/torch-mlir

IanWood1 · 2026-04-14T15:47:43Z

This updates the GroupQueryAttention lowering to support `past_present_share_buffer`-style runtime behavior, where present_key/present_value have the same fixed-capacity cache type as past_key/past_value. It intentionally drops support for the non-shared-buffer behavior in this lowering.
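For illustration only, here is a rough Python sketch of what "same fixed-capacity cache type" means for a single batch entry and KV head. The function name `update_fixed_capacity_cache` and the list-based cache are hypothetical and are not taken from the lowering itself; the sketch assumes new rows are written at offset `past_len` into a buffer whose length never changes.

```python
def update_fixed_capacity_cache(cache, new_rows, past_len):
    """Return a present cache with the same length as the past cache.

    cache:    list of length `capacity` (the past_key or past_value rows)
    new_rows: rows for the tokens processed in this step
    past_len: number of valid rows already stored in `cache`
    """
    capacity = len(cache)
    if past_len < 0 or past_len + len(new_rows) > capacity:
        raise ValueError("new rows do not fit in the fixed-capacity cache")
    present = list(cache)  # copy so the input stays untouched
    # Same-length slice assignment: the buffer size is preserved.
    present[past_len:past_len + len(new_rows)] = new_rows
    return present


past = ["k0", "k1", None, None]  # capacity 4, two valid rows
present = update_fixed_capacity_cache(past, ["k2"], past_len=2)
print(present, len(present) == len(past))
```

The point of the sketch is only that the output keeps the input's capacity; a non-shared-buffer cache would instead grow by `len(new_rows)`.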

Supporting both modes (choosing at runtime) would require comparing the input cache size to the output cache size to determine which mode is in use. With a dynamic sequence length this is not possible, because there is no way to get the output KV cache size.
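A minimal sketch of why the mode cannot be chosen when the sizes are dynamic. The helper `infer_cache_mode` is hypothetical, and `None` stands in for a dynamic (unknown) sequence dimension; it is not how the lowering represents shapes.

```python
from typing import Optional


def infer_cache_mode(past_len: Optional[int],
                     present_len: Optional[int]) -> Optional[str]:
    """Guess the cache mode from the sequence dims of past and present.

    Returns "shared_buffer" when both sizes are known and equal,
    "concatenated" when both are known and differ, and None when
    either size is dynamic, since no comparison is possible.
    """
    if past_len is None or present_len is None:
        return None
    return "shared_buffer" if past_len == present_len else "concatenated"


print(infer_cache_mode(2048, 2048))
print(infer_cache_mode(16, 17))
print(infer_cache_mode(None, None))
```

With static shapes the comparison is trivial; with dynamic shapes the helper has nothing to compare, which is the situation described above.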

This also adds explicit support for rank-2 seqlens_k, which is off-spec but emitted by ONNX exporters.
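As a sketch of the idea, the snippet below normalizes a seqlens_k value to the rank-1 form. It assumes the rank-2 variant has shape `(batch_size, 1)`, which is an assumption and is not stated in this PR; `normalize_seqlens_k` is a hypothetical helper working on plain Python lists.

```python
def normalize_seqlens_k(seqlens_k):
    """Flatten seqlens_k to a rank-1 list of ints.

    Accepts either the rank-1 form [n0, n1, ...] or an assumed
    rank-2 form [[n0], [n1], ...] with a trailing dimension of 1.
    """
    flat = []
    for entry in seqlens_k:
        if isinstance(entry, (list, tuple)):
            if len(entry) != 1:
                raise ValueError("expected a trailing dimension of size 1")
            flat.append(int(entry[0]))
        else:
            flat.append(int(entry))
    return flat


print(normalize_seqlens_k([3, 7]))
print(normalize_seqlens_k([[3], [7]]))
```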

Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>

IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch 2 times, most recently from b3e6c84 to 1c69b44 Compare April 14, 2026 19:28

IanWood1 marked this pull request as ready for review April 14, 2026 19:34

IanWood1 marked this pull request as draft April 27, 2026 15:21

IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch 2 times, most recently from cf09e66 to 00ecc03 Compare May 6, 2026 17:53

IanWood1 changed the title ~~[ONNX] Fix constant KV cache size~~ [ONNX] Support fixed-capacity GroupQueryAttention cache May 6, 2026

[ONNX] Support fixed-capacity GroupQueryAttention cache

18d6386

Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>

IanWood1 force-pushed the fix-constant-seq-length-onnx-gqa branch from 00ecc03 to 18d6386 Compare May 6, 2026 22:01

IanWood1 wants to merge 1 commit into llvm:main from IanWood1:fix-constant-seq-length-onnx-gqa
