Enable Triple Buffered GEMM #670

adedespirlet · 2026-01-02T16:13:04Z

This PR:

Adds a working example of a triple-buffered GEMM in Wave (tests/kernel/wave_gemm_test.py) using manual scheduling.
Updates the documentation in wave_schedule_2.py to explain the pipelining mechanics.
Fixes a bug in loop_reconstruction.py: the buffer rotation logic for the kernel-to-epilogue transition worked for 2-stage pipelines but was incorrect for deeper pipelines.

Note: I currently do not have access to the target hardware to verify correctness, so I am testing it here with CI.

Implementation Details
Instead of introducing an explicit multi_buffer_count argument, this PR uses the existing scheduling infrastructure to achieve triple buffering.
Inserting an empty pipeline stage between the global loads and the compute stage increases the pipeline depth without increasing the initiation interval. This extends the lifetime of the loaded data across one extra iteration, which forces the compiler's buffer analysis to allocate a third buffer to prevent hazards. This approach was preferred because it fits the current stage-centric loop construction logic and avoids the larger changes an explicit buffer count parameter would require.
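The effect of the extra stage on buffer count can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: it is not Wave's buffer analysis, the function name buffers_needed is hypothetical, and it assumes one tile is loaded per iteration, that a tile occupies its buffer from its load stage through its compute stage, and that one stage advances per initiation interval.

```python
def buffers_needed(load_stage: int, compute_stage: int, num_iterations: int) -> int:
    """Peak number of tiles alive at once in a simplified pipeline model.

    Iteration `it` writes its tile at step `it + load_stage` and consumes it
    at step `it + compute_stage`. Each live tile needs its own buffer.
    """
    peak = 0
    last_step = (num_iterations - 1) + compute_stage
    for step in range(last_step + 1):
        # Count iterations whose tile has been loaded but not yet fully consumed.
        live = sum(
            1
            for it in range(num_iterations)
            if it + load_stage <= step <= it + compute_stage
        )
        peak = max(peak, live)
    return peak


# Load in stage 0, compute in stage 1: classic double buffering.
double = buffers_needed(load_stage=0, compute_stage=1, num_iterations=8)

# Same schedule with one empty stage inserted between load and compute.
triple = buffers_needed(load_stage=0, compute_stage=2, num_iterations=8)
```

In this model the peak is the stage distance plus one, so pushing the compute stage back by one empty stage raises the requirement from two buffers to three while each iteration still starts one step after the previous one.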

loop_reconstruction.py BUG and FIX:
After the kernel yields its final iteration, the buffer list has been rotated, so the epilogue extracts the buffers in rotated order. The original code did not compensate for this rotation correctly (it rotated by +1), which caused a mismatch between the buffers the epilogue expected and the ones it read. Solution: apply a -1 rotation to compensate for how the buffers were ordered when yielded, so the epilogue reads them in the correct order.
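Why the old rotation went unnoticed with two buffers can be seen in a standalone sketch. It does not reproduce the code in loop_reconstruction.py: the rotate helper and the buffer names are hypothetical, and the assumption that the kernel yields the list rotated by +1 is made only for illustration.

```python
def rotate(buffers: list, shift: int) -> list:
    """Return a copy of `buffers` rotated right by `shift` (negative rotates left)."""
    n = len(buffers)
    shift %= n
    if shift == 0:
        return list(buffers)
    return buffers[-shift:] + buffers[:-shift]


two = ["buf0", "buf1"]
three = ["buf0", "buf1", "buf2"]

# Assumption for this sketch: the kernel hands the list over rotated by +1.
yielded_two = rotate(two, 1)
yielded_three = rotate(three, 1)

# Old behaviour in this model: rotate by +1 again.
old_two = rotate(yielded_two, 1)      # back to the original order, by coincidence
old_three = rotate(yielded_three, 1)  # still rotated, so buffers are misaligned

# Fixed behaviour in this model: rotate by -1 to undo the kernel's rotation.
new_two = rotate(yielded_two, -1)
new_three = rotate(yielded_three, -1)
```

With two elements, +1 and -1 are the same permutation, so either choice restores the order. With three or more they differ, and only the inverse rotation restores it.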

Next
Add logic to verify that the pipeline stages defined in the schedule do not exceed the device's shared memory capacity.
Add a lit test.
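The shared memory check listed above could look roughly like the following. This is a sketch of the idea only, not an existing Wave API: the function names, the tile-size parameters, and the 64 KiB capacity used in the example are all assumptions, and it assumes each buffer holds one A tile and one B tile with no padding.

```python
def shared_memory_bytes(
    tile_m: int, tile_n: int, tile_k: int, elem_bytes: int, num_buffers: int
) -> int:
    """Shared memory used by `num_buffers` copies of one A tile plus one B tile."""
    a_tile = tile_m * tile_k * elem_bytes
    b_tile = tile_n * tile_k * elem_bytes
    return num_buffers * (a_tile + b_tile)


def fits_in_shared_memory(required_bytes: int, capacity_bytes: int) -> bool:
    """True if the requested allocation does not exceed the given capacity."""
    return required_bytes <= capacity_bytes


# Hypothetical capacity for illustration; query the real device limit in practice.
CAPACITY = 64 * 1024

# 128x128x64 tiles of 2-byte elements, double versus triple buffered.
double_bytes = shared_memory_bytes(128, 128, 64, elem_bytes=2, num_buffers=2)
triple_bytes = shared_memory_bytes(128, 128, 64, elem_bytes=2, num_buffers=3)

double_ok = fits_in_shared_memory(double_bytes, CAPACITY)
triple_ok = fits_in_shared_memory(triple_bytes, CAPACITY)
```

Under these assumed numbers the double-buffered schedule exactly fills the capacity while the triple-buffered one exceeds it, which is the kind of case such a check would need to reject before compilation.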

Changed buffer rotation from +1 to -1 in populate_epilogue_outer_vars to correctly align the usage of the allocations when transitioning from kernel to epilogue. The previous rotation only worked by coincidence for 2 buffer pipelines but failed for 3+ buffers.

Signed-off-by: Aurore De Spirlet <[email protected]>

panditsa

Do you mind also adding a standalone example, either under 6.2_schedule.py or 6.3_schedule.py?

wave_lang/kernel/wave/schedules/gemm_two_pp_cluster.py

Signed-off-by: Aurore De Spirlet <[email protected]>

panditsa

LGTM

adedespirlet requested review from ftynse and tgymnich and removed request for ftynse and tgymnich January 2, 2026 16:14

adedespirlet force-pushed the triple_buffer_new branch 2 times, most recently from a818c28 to 3bba230 Compare January 2, 2026 16:50

Add test for a triple buffered gemm

1bf1cde

Signed-off-by: Aurore De Spirlet <[email protected]>

adedespirlet force-pushed the triple_buffer_new branch from 3bba230 to 1bf1cde Compare January 2, 2026 16:57

panditsa reviewed Jan 4, 2026


wave_lang/kernel/wave/schedules/gemm_two_pp_cluster.py (outdated, resolved)

adedespirlet added 2 commits January 5, 2026 10:50

add triple buffer test to 6.2_schedule.py

c259dde

Signed-off-by: Aurore De Spirlet <[email protected]>

refactor: move triple buffer schedule into separate file

0cb4aa7

Signed-off-by: Aurore De Spirlet <[email protected]>

adedespirlet force-pushed the triple_buffer_new branch from bcc15bc to 0cb4aa7 Compare January 5, 2026 18:33

fix: adding a missing synchronization barrier

0405422

Signed-off-by: Aurore De Spirlet <[email protected]>

adedespirlet force-pushed the triple_buffer_new branch from dd0a59f to 0405422 Compare January 6, 2026 13:50

panditsa approved these changes Jan 6, 2026


adedespirlet merged commit 10af117 into iree-org:main Jan 7, 2026
15 checks passed

adedespirlet deleted the triple_buffer_new branch January 7, 2026 08:42
