@adedespirlet adedespirlet commented Jan 2, 2026

This PR:

  • adds a working example of a triple-buffered GEMM in Wave (tests/kernel/wave_gemm_test.py) using manual scheduling.
  • updates the documentation in wave_schedule_2.py to explain the pipelining mechanics.
  • fixes a bug in loop_reconstruction.py: the buffer rotation logic at the kernel-to-epilogue transition worked for 2-stage pipelines but was incorrect for pipelines deeper than 2.

Note: I currently do not have access to the target hardware to verify correctness, so I am relying on CI to test it here.

Implementation Details
Instead of introducing an explicit multi_buffer_count argument, this PR leverages the existing scheduling infrastructure to achieve triple buffering natively.
By inserting an empty pipeline stage between the global loads and the compute stage, we increase the pipeline depth without increasing the initiation interval. This extends the lifetime of the loaded data across an extra iteration, which forces the compiler's buffer analysis to allocate a third buffer to prevent hazards. This approach was preferred because it aligns with the current stage-centric loop construction logic and avoids the larger changes that an explicit buffer count parameter would require.
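To illustrate why the extra stage leads to a third buffer, here is a minimal stage-granular sketch. It is not Wave's buffer analysis: `buffers_needed` and `live_iterations` are hypothetical helpers, and the model assumes one new iteration starts per stage step, with data produced at one stage and consumed at a later one.

```python
def buffers_needed(producer_stage: int, consumer_stage: int) -> int:
    """Estimate buffers as the number of iterations whose data is live at once.

    Simplified model (assumption): data written in `producer_stage` stays live
    until it is read in `consumer_stage`, and a new iteration starts every step.
    """
    if consumer_stage < producer_stage:
        raise ValueError("consumer stage must not precede producer stage")
    return consumer_stage - producer_stage + 1


def live_iterations(step: int, producer_stage: int, consumer_stage: int) -> list[int]:
    """Iterations whose data is produced but not yet fully consumed at `step`.

    Iteration i executes stage s at step i + s.
    """
    return [
        i
        for i in range(step + 1)
        if i + producer_stage <= step <= i + consumer_stage
    ]


# Load in stage 0, compute in stage 1: classic double buffering.
double = buffers_needed(producer_stage=0, consumer_stage=1)

# Empty stage inserted between load (stage 0) and compute (now stage 2).
triple = buffers_needed(producer_stage=0, consumer_stage=2)

# In steady state, three iterations hold live data at the same step.
in_flight = live_iterations(step=5, producer_stage=0, consumer_stage=2)
```

In this model the initiation interval is unchanged; only the distance between the producing and consuming stages grows, and that distance is what drives the buffer count.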

loop_reconstruction.py BUG and FIX:
After the kernel yields its final iteration, the buffer list has been rotated, so the epilogue extracts the buffers in rotated order. The original code did not compensate for this rotation correctly, causing a mismatch. Solution: apply a -1 rotation to compensate for how the buffers were organized when yielded, so that the epilogue reads them in the correct order.
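A small standalone sketch of why the wrong rotation direction can go unnoticed with two buffers. This does not reproduce the code in loop_reconstruction.py: the `rotate` helper, the buffer names, and the assumption that the kernel leaves the list rotated by +1 are illustrative only.

```python
from collections import deque


def rotate(buffers: list[str], n: int) -> list[str]:
    """Return a copy of `buffers` rotated right by n (left if n is negative)."""
    d = deque(buffers)
    d.rotate(n)
    return list(d)


two = ["buf0", "buf1"]
three = ["buf0", "buf1", "buf2"]

# Assumption for illustration: the kernel yields the list rotated by +1.
yielded_two = rotate(two, 1)
yielded_three = rotate(three, 1)

# With two buffers, +1 and -1 are the same permutation, so either "fix" works.
two_with_plus = rotate(yielded_two, 1)
two_with_minus = rotate(yielded_two, -1)

# With three buffers, only the inverse rotation restores the original order.
three_with_plus = rotate(yielded_three, 1)
three_with_minus = rotate(yielded_three, -1)
```

For a list of length 2, rotating by +1 and by -1 produce the same result, which is consistent with the bug only showing up for pipelines with 3 or more buffers.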

Next
Add logic to verify that the pipeline stages defined in the schedule do not exceed the device's shared memory capacity
Add lit_test
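As a rough idea of what the shared memory check could look like, here is a hypothetical sketch. The function names, the tile shapes, and the 64 KiB capacity are all assumptions for illustration; they are not taken from Wave or from any specific device.

```python
def shared_memory_bytes(
    tile_shapes: list[tuple[int, int]],
    bytes_per_element: int,
    num_buffers: int,
) -> int:
    """Total shared memory for `num_buffers` copies of every tile."""
    per_buffer_set = sum(rows * cols * bytes_per_element for rows, cols in tile_shapes)
    return per_buffer_set * num_buffers


def fits_in_shared_memory(required_bytes: int, capacity_bytes: int) -> bool:
    """True if the schedule's buffers fit in the given capacity."""
    return required_bytes <= capacity_bytes


# Illustrative values: A tile 128x64 and B tile 128x64, 2 bytes per element (f16).
tiles = [(128, 64), (128, 64)]
capacity = 64 * 1024  # assumed limit, not a statement about any real device

double_buffered = shared_memory_bytes(tiles, bytes_per_element=2, num_buffers=2)
triple_buffered = shared_memory_bytes(tiles, bytes_per_element=2, num_buffers=3)

double_ok = fits_in_shared_memory(double_buffered, capacity)
triple_ok = fits_in_shared_memory(triple_buffered, capacity)
```

With these made-up numbers, double buffering fits exactly while triple buffering does not, which is the kind of case the check is meant to catch before compilation.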

Changed buffer rotation from +1 to -1 in populate_epilogue_outer_vars
to correctly align the usage of the allocations when transitioning from
kernel to epilogue. The previous rotation only worked by coincidence
for 2 buffer pipelines but failed for 3+ buffers.

Signed-off-by: Aurore De Spirlet <[email protected]>
@adedespirlet adedespirlet requested review from ftynse and tgymnich and removed request for ftynse and tgymnich January 2, 2026 16:14
@adedespirlet adedespirlet force-pushed the triple_buffer_new branch 2 times, most recently from a818c28 to 3bba230 Compare January 2, 2026 16:50
Signed-off-by: Aurore De Spirlet <[email protected]>
@panditsa panditsa left a comment

Do you mind also adding a standalone example, either under 6.2_schedule.py or 6.3_schedule.py?

@panditsa panditsa left a comment

LGTM

@adedespirlet adedespirlet merged commit 10af117 into iree-org:main Jan 7, 2026
15 checks passed
@adedespirlet adedespirlet deleted the triple_buffer_new branch January 7, 2026 08:42