
Conversation

@hukongyi commented Nov 12, 2025

vLLM version: v0.11.0
vLLM main: vllm-project/vllm

What this PR does / why we need it?

Fix a precision issue in the LoRA feature of vllm-ascend.

Does this PR introduce any user-facing change?

How was this patch tested?

pytest tests/lora/test_llama_tp.py::test_llama_lora -s
[screenshot: lora_test results]

@github-actions bot commented
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request aims to fix a precision issue with the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.

However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. By commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites, you are making the bfloat16_t kernel calls unconditional. But the kernel declarations themselves remain inside the conditional compilation blocks. This will lead to compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this. These issues must be addressed to avoid breaking builds for other hardware targets.
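To make the failure mode concrete, here is a minimal sketch of the guard mismatch, using hypothetical kernel and dispatcher names rather than the actual bgmv_expand.cpp contents:

```cpp
// Minimal sketch of the guard mismatch; bgmv_expand_bf16 and dispatch_bf16
// are hypothetical names, not the real vllm-ascend kernels.
// __CCE_AICORE__ is the Ascend toolchain's core-version macro.

#if defined(__CCE_AICORE__) && (__CCE_AICORE__ >= 220)
// The bfloat16_t kernel only exists on AICore >= 220 targets.
void bgmv_expand_bf16() { /* bfloat16_t kernel body */ }
#endif

void dispatch_bf16() {
    // Broken pattern from this PR: the call-site guard is commented out,
    // so targets with __CCE_AICORE__ < 220 reference an undeclared symbol:
    //
    //   // #if (__CCE_AICORE__ >= 220)
    //   bgmv_expand_bf16();  // error: 'bgmv_expand_bf16' was not declared
    //   // #endif
    //
    // Fix: keep the call-site guard in sync with the declaration guard.
#if defined(__CCE_AICORE__) && (__CCE_AICORE__ >= 220)
    bgmv_expand_bf16();
#endif
}
```

An alternative that keeps the call sites unconditional would be to provide a stub or non-bf16 fallback for the kernel when __CCE_AICORE__ < 220, so the symbol is always declared.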

@paulyu12 (Collaborator) commented

LGTM. This PR fixes two bugs:

@paulyu12 added the ready (read for review) and ready-for-test (start test by label for PR) labels on Nov 13, 2025
hukongyi and others added 3 commits on November 13, 2025:

  • …n vllm-ascend.
    Co-authored-by: liuchenbing <[email protected]>
    Co-authored-by: guanyuzhu <[email protected]>
    vLLM version: v0.11.0
    vLLM main: vllm-project/vllm
    Signed-off-by: hukongyi <[email protected]>

  • …n vllm-ascend
    Co-authored-by: liuchenbing <[email protected]>
    Co-authored-by: guanyuzhu <[email protected]>
    vLLM version: v0.11.0
    vLLM main: vllm-project/vllm
    Signed-off-by: hukongyi <[email protected]>

  • …n vllm-ascend.
    Co-authored-by: liuchenbing <[email protected]>
    Co-authored-by: guanyuzhu <[email protected]>
    vLLM version: v0.11.0
    vLLM main: vllm-project/vllm
    Signed-off-by: hukongyi <[email protected]>
@paulyu12 added and then removed the ready-for-test (start test by label for PR) label on Nov 14, 2025