[long_seq_Feat] support chunk prefill #4158
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the chunked prefill implementation for MLA attention to better support context parallelism (PCP/DCP). The changes simplify the logic in _compute_prefill_context by introducing a new helper function _reorg_kvcache and unifying the handling of different parallelism configurations. While the refactoring in vllm_ascend/attention/mla_v1.py is a significant improvement in code clarity and maintainability, I've identified a critical bug in vllm_ascend/worker/model_runner_v1.py related to the calculation of context length for speculative decoding when context parallelism is enabled. This could lead to incorrect attention computations.
Signed-off-by: LookAround <[email protected]>
Signed-off-by: Delphine-Nic <[email protected]>
I'll enable the full test once lint passes.
What this PR does / why we need it?
1. Qwen GQA attention_v1 optimization.
2. DeepSeek MLA refactor: all-gather q -> all-gather kv.
3. Model runner refactor for chunked prefill; removed unused code.
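The motivation for switching from all-gathering q to all-gathering kv in MLA can be made concrete with a back-of-the-envelope communication estimate. The sketch below is illustrative only: the head counts and latent dimension are assumed DeepSeek-style MLA sizes, not values taken from this PR, and the ring all-gather cost model is a simplification.

```python
def allgather_bytes(tokens_per_rank: int, per_token_elems: int,
                    world: int, elem_bytes: int = 2) -> int:
    """Rough ring all-gather cost: each rank's shard is sent to the
    other world - 1 ranks (fp16, 2 bytes per element)."""
    return tokens_per_rank * per_token_elems * (world - 1) * elem_bytes

# Assumed MLA-style sizes (hypothetical, for illustration only):
num_heads, head_dim = 128, 192   # q is stored per head, uncompressed
kv_latent = 512 + 64             # compressed kv latent + rope dims

q_cost = allgather_bytes(1024, num_heads * head_dim, world=8)
kv_cost = allgather_bytes(1024, kv_latent, world=8)
print(q_cost > kv_cost)  # gathering the compressed kv moves far less data
```

Because MLA caches a compressed latent per token while q is materialized per head, gathering kv instead of q shrinks the collective's payload by more than an order of magnitude under these assumed sizes.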
Does this PR introduce any user-facing change?
How was this patch tested?