
Conversation

@yiz-liu (Collaborator) commented Oct 23, 2025

What this PR does / why we need it?

This PR refactors the Ascend attention implementation to align with vLLM's core interfaces, simplifying the code and improving maintainability.

Key Changes:

  • Align with vLLM's Attention Interface: The forward method signature in AscendAttentionBackendImpl now matches the base AttentionImpl in vLLM, removing the custom trace_flag.

  • Enable Opaque Attention Operator: By adding opaque_attention_op to AscendPlatform, we allow vLLM to wrap our attention kernel in its standard vllm.unified_attention_with_output operator. This avoids the need for a custom call path (see the sketch after this list).

  • Remove Obsolete Code:

    • The custom op vllm.unified_ascend_attention_with_output has been deleted as it is now redundant.
    • The trace_flag and its associated logic were removed, reducing code complexity.
    • An outdated quantization branch within the attention implementation was cleaned up.
  • Improve Readability: Renamed output variables (output vs. intermediate_output) and added comments to clarify the in-place nature of the attention output.
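
For orientation, here is a minimal sketch of the two alignment points above. The exact parameter list of the base `AttentionImpl.forward` and the precise form of the `opaque_attention_op` hook are assumptions inferred from the PR description, not copied from the vLLM or vllm-ascend source.

```python
# Hedged sketch only: class bodies, parameter names, and types are assumptions
# based on the PR description, not verbatim vllm-ascend code.
from typing import Optional

import torch


class AscendAttentionBackendImpl:  # in practice, subclasses vLLM's AttentionImpl
    def forward(
        self,
        layer,                      # owning attention layer (scales, kv-cache refs, ...)
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        kv_cache: torch.Tensor,
        attn_metadata,              # Ascend-specific attention metadata
        output: Optional[torch.Tensor] = None,
        output_scale: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # No custom trace_flag: because the signature matches the base
        # AttentionImpl, vLLM can wrap this kernel in its standard
        # vllm.unified_attention_with_output operator.
        # The attention result is written in place into `output` and returned.
        ...
        return output


class AscendPlatform:  # in practice, subclasses vLLM's Platform
    @classmethod
    def opaque_attention_op(cls) -> bool:
        # Opt in to vLLM's opaque attention wrapping, so the deleted custom op
        # (vllm.unified_ascend_attention_with_output) is no longer needed.
        return True
```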

Does this PR introduce any user-facing change?

None.

How was this patch tested?

No extra tests needed.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request refactors the Ascend attention implementation by removing the unified_ascend_attention_with_output custom operator and its associated graph-splitting logic, aligning the attention mechanism more closely with the upstream vLLM framework. This is a positive change for code simplification and maintainability. However, there is one critical issue: support for INT8 KV cache quantization appears to have been removed without a replacement, which could be a significant regression for quantized models.

@github-actions (bot) commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Removes the `unified_ascend_attention_with_output` custom operator and its associated graph-splitting logic. The attention implementation is now invoked through vLLM's standard call path.

Also removes unnecessary code from the attention implementation.

Signed-off-by: Yizhou Liu <[email protected]>
Removes the `trace_flag` parameter from all calls to the attention backend's `forward` method within the unit tests.

The parameter is no longer used, and this change simplifies the test code by removing the redundant argument and the now-obsolete test case that specifically validated its behavior.

Signed-off-by: Yizhou Liu <[email protected]>
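
As a hypothetical before/after for the test change described in this commit (the object names and argument list other than `trace_flag` are placeholders, not the actual test fixtures):

```python
from unittest.mock import MagicMock

impl = MagicMock()          # stand-in for the AscendAttentionBackendImpl under test
layer = query = key = value = kv_cache = attn_metadata = output = MagicMock()

# Before this commit: every test call carried the custom flag.
impl.forward(layer, query, key, value, kv_cache, attn_metadata,
             output=output, trace_flag=True)

# After this commit: the call matches the base AttentionImpl signature.
impl.forward(layer, query, key, value, kv_cache, attn_metadata,
             output=output)
```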
@yiz-liu (Collaborator, Author) commented Oct 24, 2025

Due to a CI infra update, we need to close this PR and create a new one.

@yiz-liu closed this on Oct 24, 2025

Labels

module:core, module:tests, ready (read for review), ready-for-test (start test by label for PR)
