Conversation

@junhaha666
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings January 8, 2026 08:59

Copilot AI left a comment


Pull request overview

This PR transitions from Triton's unified_attention to CK (Composable Kernel) MHA for handling sink and sliding window attention patterns. The changes remove temporary workarounds that were previously needed for GPT-OSS model compatibility.

Key changes:

  • Switches prefill attention to always use CK MHA (flash_attn_varlen_func) with native sink and sliding window support
  • Removes the ATOM_GPT_OSS_MODEL environment variable and related temporary workarounds
  • Eliminates fake block table generation logic that was needed for Triton's unified_attention

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| atom/utils/forward_context.py | Removes GPT-OSS model environment variable usage and fake block table generation logic for prefill attention |
| atom/utils/envs.py | Removes the ATOM_GPT_OSS_MODEL environment variable definition |
| atom/model_ops/attentions/backends.py | Removes block table preparation for sliding window during the prefill phase |
| atom/model_ops/attention_mha.py | Adds window_size and sink_ptr parameters to the CK MHA call; comments out the Triton prefill path |



```python
# variable length attention uses key/value as input
attn_metadata = fwd_args.attn_metadata
sliding_window = (self.sliding_window, 0, 0) if self.sliding_window is not None else (-1, -1, 0)
```

Copilot AI Jan 8, 2026

The sliding_window logic is inconsistent with the prefill_attention_triton implementation. In prefill_attention_triton (line 373), window_size is defined as (self.sliding_window - 1, 0), but here it is (self.sliding_window, 0, 0): the left bound differs by one and the tuple has a different arity. Additionally, the condition checks self.sliding_window is not None, but __init__ (line 64) sets self.sliding_window to -1 when None is passed, so this check always succeeds and the disabled-window branch is never taken. The condition should be self.sliding_window != -1 instead.

Suggested change
```diff
- sliding_window = (self.sliding_window, 0, 0) if self.sliding_window is not None else (-1, -1, 0)
+ sliding_window = (self.sliding_window - 1, 0) if self.sliding_window != -1 else (-1, 0)
```
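The off-by-one in comment 1 is easier to see with the window semantics spelled out. A minimal pure-Python sketch, assuming the flash-attn convention that window_size = (left, right) lets query i attend keys in [i - left, i + right], with -1 disabling that side (the attends/visible_keys helpers are hypothetical, for illustration only):

```python
def attends(q_idx: int, k_idx: int, left: int, right: int) -> bool:
    """True if query q_idx may attend key k_idx under window (left, right).

    A value of -1 disables that side of the window (flash-attn sentinel).
    Causality is assumed to be enforced separately.
    """
    if left != -1 and k_idx < q_idx - left:
        return False
    if right != -1 and k_idx > q_idx + right:
        return False
    return True


def visible_keys(q_idx: int, left: int, right: int = 0) -> list[int]:
    """Causal key positions a query may attend under the given window."""
    return [k for k in range(q_idx + 1) if attends(q_idx, k, left, right)]


# A window meant to cover W = 4 tokens needs left = W - 1 = 3:
print(visible_keys(5, left=3))   # 4 keys: [2, 3, 4, 5]
print(visible_keys(5, left=4))   # 5 keys: [1, 2, 3, 4, 5] -- one too many
print(visible_keys(5, left=-1))  # no window: [0, 1, 2, 3, 4, 5]
```

Under this convention, passing self.sliding_window unadjusted (as the CK path does) admits one more key than the Triton path's self.sliding_window - 1, which is exactly the inconsistency the review flags.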

Comment on lines +402 to +404
```python
# if self.use_triton_attn:
# return self.prefill_attention_triton
# else:
```

Copilot AI Jan 8, 2026

The commented-out code for prefill_attention_triton creates maintainability issues. If this backend is truly no longer needed, it should be removed entirely rather than commented out. If it might be needed in the future, add a clear comment explaining why it's being kept and under what conditions it might be re-enabled. This helps prevent confusion and code rot.

Suggested change
```diff
- # if self.use_triton_attn:
- # return self.prefill_attention_triton
- # else:
+ # Prefill currently always uses the ASM backend. The Triton prefill path
+ # (`prefill_attention_triton`) is kept defined but not dispatched until
+ # it is fully validated and aligned with the required kv-cache layout.
```
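Beyond an explanatory comment, another way to avoid code rot is an explicit guard that fails loudly if the disabled path is requested. A hedged sketch of that pattern (class and method names here are hypothetical, not the repository's actual API):

```python
class PrefillDispatch:
    """Hypothetical sketch: explicit dispatch instead of commented-out code."""

    def __init__(self, use_triton_attn: bool = False):
        self.use_triton_attn = use_triton_attn

    def get_prefill_attention(self):
        if self.use_triton_attn:
            # The Triton prefill path is disabled until it is validated
            # against the CK kv-cache layout; fail loudly rather than
            # silently falling through to the wrong backend.
            raise NotImplementedError(
                "Triton prefill is disabled; use the CK MHA backend"
            )
        return self.prefill_attention_ck

    def prefill_attention_ck(self):
        # Stand-in for the real CK MHA prefill call.
        return "ck"
```

This keeps the dead branch out of the hot path while making the "not yet supported" state visible to anyone who flips the flag.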

Comment on lines +340 to +341
```python
window_size=sliding_window,
sink_ptr=self.sinks,
```

Copilot AI Jan 8, 2026

The new parameters window_size and sink_ptr added to flash_attn_varlen_func lack inline comments explaining their purpose and expected values. Adding brief comments would help future maintainers understand how these parameters relate to the sink and sliding window attention features mentioned in the PR title.
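For context on what sink_ptr feeds the kernel: in GPT-OSS-style sink attention, a learned per-head sink logit joins the softmax normalizer without contributing a value row, so it absorbs probability mass instead of forcing it onto real tokens. A pure-Python reference sketch of that mechanism (the actual CK kernel semantics for sink_ptr may differ in detail):

```python
import math


def softmax_with_sink(logits: list[float], sink_logit: float) -> list[float]:
    """Softmax over token logits with an attention-sink logit.

    The sink contributes to the denominator but produces no output
    weight, so the returned weights sum to less than 1 when the sink
    is active; sink_logit = -inf recovers the ordinary softmax.
    """
    m = max(max(logits), sink_logit)  # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]


with_sink = softmax_with_sink([1.0, 2.0, 3.0], sink_logit=0.0)
no_sink = softmax_with_sink([1.0, 2.0, 3.0], sink_logit=float("-inf"))
print(sum(with_sink) < 1.0)  # sink absorbed some attention mass
```

An inline comment of roughly this shape next to the flash_attn_varlen_func call would address the review's concern.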
