use ck mha instead of triton unified_attention for sink and window #118
base: main
Conversation
Pull request overview
This PR transitions from Triton's unified_attention to CK (Composable Kernel) MHA for handling sink and sliding window attention patterns. The changes remove temporary workarounds that were previously needed for GPT-OSS model compatibility.
Key changes:
- Switches prefill attention to always use CK MHA (flash_attn_varlen_func) with native sink and sliding window support
- Removes the ATOM_GPT_OSS_MODEL environment variable and related temporary workarounds
- Eliminates fake block table generation logic that was needed for Triton's unified_attention
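For context, here is a minimal eager-mode sketch of the attention pattern in question: causal attention with an optional sliding window and GPT-OSS-style per-head sink logits. This is an illustrative reference in plain PyTorch, not the CK kernel or this repository's code, and the kernel's exact window and sink conventions may differ.

```python
import torch

def sdpa_with_sink_and_window(q, k, v, sinks=None, sliding_window=-1):
    """Reference semantics of the pattern the CK MHA kernel handles natively:
    causal attention with an optional sliding window and an optional per-head
    "sink" logit that joins the softmax but contributes no value.
    Shapes: q, k, v are [heads, seq, dim]; sinks is [heads] or None."""
    h, s, d = q.shape
    scores = (q @ k.transpose(-1, -2)) / d**0.5        # [h, s, s]
    i = torch.arange(s).unsqueeze(-1)                   # query positions
    j = torch.arange(s).unsqueeze(0)                    # key positions
    mask = j > i                                        # causal: no future keys
    if sliding_window != -1:                            # window: only the last W keys
        mask |= j < i - sliding_window + 1
    scores = scores.masked_fill(mask, float("-inf"))
    if sinks is not None:
        # Append one sink logit per head; it absorbs probability mass but
        # contributes nothing to the output (its value is implicitly zero).
        sink_col = sinks.view(h, 1, 1).expand(h, s, 1)
        probs = torch.softmax(torch.cat([scores, sink_col], dim=-1), dim=-1)[..., :-1]
    else:
        probs = torch.softmax(scores, dim=-1)
    return probs @ v
```

With sinks set, each row of the attention weights sums to less than one because the sink column absorbs part of the softmax mass, which is the intended effect of sink attention.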
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| atom/utils/forward_context.py | Removes GPT-OSS model environment variable usage and fake block table generation logic for prefill attention |
| atom/utils/envs.py | Removes ATOM_GPT_OSS_MODEL environment variable definition |
| atom/model_ops/attentions/backends.py | Removes block table preparation for sliding window during prefill phase |
| atom/model_ops/attention_mha.py | Adds window_size and sink_ptr parameters to CK MHA call; comments out Triton prefill path |
```python
# variable lenth attention use key value as input
attn_metadata = fwd_args.attn_metadata
sliding_window = (self.sliding_window, 0, 0) if self.sliding_window is not None else (-1, -1, 0)
```
Copilot AI commented on Jan 8, 2026:
The sliding_window logic here is inconsistent with the prefill_attention_triton implementation. In prefill_attention_triton (line 373), window_size is defined as (self.sliding_window - 1, 0), but here it is defined as (self.sliding_window, 0, 0), which could lead to different attention behavior between the two paths. Additionally, the condition checks whether self.sliding_window is not None, but at line 64 in __init__ self.sliding_window is set to -1 when None is passed, so the is-not-None check is always true and the (-1, -1, 0) fallback is never taken. The condition should check self.sliding_window != -1 instead.
Suggested change:

```diff
-sliding_window = (self.sliding_window, 0, 0) if self.sliding_window is not None else (-1, -1, 0)
+sliding_window = (self.sliding_window - 1, 0) if self.sliding_window != -1 else (-1, 0)
```
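To make the point concrete, here is a small self-contained sketch; the class and method names are hypothetical stand-ins that only mirror the normalization described above (None stored as -1 in __init__):

```python
class MHAExample:
    def __init__(self, sliding_window=None):
        # Assumed normalization: a missing window is stored as -1, never as None.
        self.sliding_window = sliding_window if sliding_window is not None else -1

    def window_tuple_current(self):
        # Current check: self.sliding_window is never None here, so the
        # (-1, -1, 0) fallback branch is unreachable.
        return (self.sliding_window, 0, 0) if self.sliding_window is not None else (-1, -1, 0)

    def window_tuple_suggested(self):
        # Suggested check: compare against the -1 sentinel instead, and use the
        # (left, right) convention that prefill_attention_triton uses.
        return (self.sliding_window - 1, 0) if self.sliding_window != -1 else (-1, 0)


attn = MHAExample(sliding_window=None)
print(attn.window_tuple_current())    # (-1, 0, 0): the fallback is never taken
print(attn.window_tuple_suggested())  # (-1, 0): window disabled as intended
```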
```python
# if self.use_triton_attn:
#     return self.prefill_attention_triton
# else:
```
Copilot AI commented on Jan 8, 2026:
The commented-out code for prefill_attention_triton creates maintainability issues. If this backend is truly no longer needed, it should be removed entirely rather than commented out. If it might be needed in the future, add a clear comment explaining why it's being kept and under what conditions it might be re-enabled. This helps prevent confusion and code rot.
Suggested change:

```diff
-# if self.use_triton_attn:
-#     return self.prefill_attention_triton
-# else:
+# Prefill currently always uses the ASM backend. The Triton prefill path
+# (`prefill_attention_triton`) is kept defined but not dispatched until
+# it is fully validated and aligned with the required kv-cache layout.
```
```python
window_size=sliding_window,
sink_ptr=self.sinks,
```
Copilot AI commented on Jan 8, 2026:
The new parameters window_size and sink_ptr added to flash_attn_varlen_func lack inline comments explaining their purpose and expected values. Adding brief comments would help future maintainers understand how these parameters relate to the sink and sliding window attention features mentioned in the PR title.
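For illustration, a sketch of how such comments might read at the call site. The stub function below is a hypothetical stand-in for the CK MHA call (only the two new keyword arguments are modeled), and the comment text reflects the semantics described in this PR rather than verified kernel documentation:

```python
def ck_mha_varlen_stub(*, window_size, sink_ptr, **other_kwargs):
    """Hypothetical stand-in for the CK MHA varlen attention call; it exists
    only to show where the explanatory inline comments would go."""
    return window_size, sink_ptr


sliding_window = (128, 0, 0)  # assumed (left, right, extra) convention; (-1, -1, 0) disables the window
sinks = None                  # per-head attention-sink logits; None disables sink attention

ck_mha_varlen_stub(
    # Sliding-window extent forwarded to the CK MHA kernel; this is what lets
    # windowed attention run natively instead of via the Triton workaround.
    window_size=sliding_window,
    # Per-head attention-sink tensor (GPT-OSS-style sinks); None means no sinks.
    sink_ptr=sinks,
)
```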
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist