[BugFix] Fix precision issue for LoRA feature #4141
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request aims to fix a precision issue with the LoRA feature. The change in vllm_ascend/lora/punica_npu.py correctly casts an input tensor to float32 to match the kernel's expectation, resolving a data type mismatch.
However, the changes across the four C++ kernel files (bgmv_expand.cpp, bgmv_shrink.cpp, sgmv_expand.cpp, sgmv_shrink.cpp) introduce a critical issue. Commenting out the #if (__CCE_AICORE__ >= 220) directives at the kernel call sites makes the bfloat16_t kernel calls unconditional, while the kernel declarations themselves remain inside the conditional compilation blocks. This will cause compilation errors on any platform where __CCE_AICORE__ < 220. I have left specific comments on each file with details on how to resolve this; these issues must be addressed to avoid breaking builds for other hardware targets.
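For illustration, here is a minimal sketch of the pattern; the identifiers and signatures below are hypothetical, not the actual vllm-ascend kernel symbols. It shows why an unguarded bfloat16 call site fails to compile when the declaration only exists for __CCE_AICORE__ >= 220, and one way to keep the guard consistent:

```cpp
// Illustrative sketch only: names and signatures are made up for this example.

enum class DType { kFP16, kBF16 };

#if (__CCE_AICORE__ >= 220)
// The bfloat16 kernel variant is only declared for AI Core >= 220 targets.
void bgmv_expand_bf16_kernel(void* y, const void* x, const void* w);
#endif

void bgmv_expand_dispatch(DType dtype, void* y, const void* x, const void* w) {
    if (dtype == DType::kBF16) {
        // If this call is left unguarded (e.g. the #if was commented out),
        // builds with __CCE_AICORE__ < 220 fail: the symbol is undeclared.
#if (__CCE_AICORE__ >= 220)
        bgmv_expand_bf16_kernel(y, x, w);  // guard matches the declaration above
#else
        // No bfloat16 kernel on this target: report an unsupported-dtype error
        // instead of referencing an undeclared function.
#endif
    }
    // ... fp16 / fp32 paths unchanged ...
}
```

Keeping the call site behind the same guard as the declaration preserves builds for older targets while still allowing the precision fix on >= 220 hardware.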
LGTM. This PR can fix 2 bugs:
…n vllm-ascend. Co-authored-by: liuchenbing <[email protected]> Co-authored-by: guanyuzhu <[email protected]> vLLM version: v0.11.0 vLLM main: vllm-project/vllm Signed-off-by: hukongyi <[email protected]>
vLLM version: v0.11.0
vLLM main: vllm-project/vllm
What this PR does / why we need it?
Fix the precision issue of the LoRA feature in vllm-ascend.
Does this PR introduce any user-facing change?
How was this patch tested?