[BugFix] Fix precision issue for LoRA feature #4141
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request aims to fix a precision issue with the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.
However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. Commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites makes the bfloat16_t kernel calls unconditional, while the kernel declarations themselves remain inside the conditional compilation blocks. This will cause compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this; these issues must be addressed to avoid breaking builds for other hardware targets.
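For illustration, here is a minimal sketch of the pattern; the identifiers and signatures below are hypothetical, not the actual vllm-ascend kernel symbols. It shows why an unguarded bfloat16 call site fails to compile when the declaration only exists for __CCE_AICORE__ >= 220, and one way to keep the guard consistent:

```cpp
// Illustrative sketch only: names and signatures are made up for this example.

enum class DType { kFP16, kBF16 };

#if (__CCE_AICORE__ >= 220)
// The bfloat16 kernel variant is only declared for AI Core >= 220 targets.
void bgmv_expand_bf16_kernel(void* y, const void* x, const void* w);
#endif

void bgmv_expand_dispatch(DType dtype, void* y, const void* x, const void* w) {
    if (dtype == DType::kBF16) {
        // If this call is left unguarded (e.g. the #if was commented out),
        // builds with __CCE_AICORE__ < 220 fail: the symbol is undeclared.
#if (__CCE_AICORE__ >= 220)
        bgmv_expand_bf16_kernel(y, x, w);  // guard matches the declaration above
#else
        // No bfloat16 kernel on this target: report an unsupported-dtype error
        // instead of referencing an undeclared function.
#endif
    }
    // ... fp16 / fp32 paths unchanged ...
}
```

Keeping the call site behind the same guard as the declaration preserves builds for older targets while still allowing the precision fix on >= 220 hardware.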
LGTM. This PR can fix 2 bugs:
…n vllm-ascend. Co-authored-by: liuchenbing <[email protected]> Co-authored-by: guanyuzhu <[email protected]> vLLM version: v0.11.0 vLLM main: vllm-project/vllm Signed-off-by: hukongyi <[email protected]>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
What this PR does / why we need it?
Fix the precision issue of the LoRA feature in vllm-ascend.
Does this PR introduce any user-facing change?
How was this patch tested?