[WIP] Update newest #4142
base: main
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request updates the codebase to support a newer version of the vLLM library. The changes primarily involve adapting to moved utility functions and API modifications. While the changes appear functional, there's a significant amount of code duplication introduced for handling version-dependent imports (e.g., cdiv, init_cached_hf_modules) across numerous files. I strongly recommend refactoring this logic into a centralized compatibility utility module to enhance maintainability. Additionally, there's duplicated logic in vllm_ascend/models/qwen2_5_vl.py for processing visual inputs that could be extracted into a helper method.
    from vllm_ascend.utils import vllm_version_is

    if vllm_version_is("0.11.0"):
        from vllm.utils import cdiv
    else:
        from vllm.utils.math_utils import cdiv
This conditional import logic for cdiv is repeated in many other files (attention/mla_v1.py, core/scheduler.py, distributed/mooncake/config_data.py, patch/platform/patch_mamba_config.py, spec_decode/mtp_proposer.py, torchair/torchair_attention.py, torchair/torchair_mla.py, torchair/torchair_sfa.py, worker/block_table.py, worker/model_runner_v1.py). A similar pattern exists for init_cached_hf_modules in worker/worker_v1.py and its test file. This widespread duplication makes the code harder to maintain and prone to errors if future updates are needed.
To improve this, I suggest creating a central compatibility utility module (e.g., vllm_ascend/utils/compat.py) to house all such version-dependent imports. Then, other files can import cdiv, init_cached_hf_modules, etc., directly from this new module, centralizing the version-checking logic.
    if vllm_version_is("0.11.0"):
        image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
    else:
        with set_ascend_forward_context(None, self.vllm_config):
            image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
This if/else block to handle different vLLM versions is duplicated in _process_video_input (lines 561-567). This duplicated logic should be extracted into a private helper method to improve code clarity and maintainability. For example, you could create a _run_visual method:
    def _run_visual(self, pixel_values, grid_thw):
        if vllm_version_is("0.11.0"):
            return self.visual(pixel_values, grid_thw=grid_thw)
        else:
            with set_ascend_forward_context(None, self.vllm_config):
                return self.visual(pixel_values, grid_thw=grid_thw)

Then you can call this helper in both _process_image_input and _process_video_input.

    image_embeds = self._run_visual(pixel_values, grid_thw=grid_thw)
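To check that both call sites really share one code path, the suggested helper can be exercised in isolation. The sketch below is an assumption-laden stand-in: `vllm_version_is`, `set_ascend_forward_context`, the `visual` method, and the class `VisualModelSketch` are all stubs written for illustration, and the video parameter names are hypothetical rather than taken from `qwen2_5_vl.py`.

```python
from contextlib import contextmanager

CALLS = []           # records whether each visual() call ran inside the context
_IN_CONTEXT = False  # toggled by the stub context manager below


def vllm_version_is(version: str) -> bool:
    """Stub: pretend the installed vLLM is newer than 0.11.0."""
    return False


@contextmanager
def set_ascend_forward_context(attn_metadata, vllm_config):
    """Stub standing in for the real forward-context manager."""
    global _IN_CONTEXT
    _IN_CONTEXT = True
    try:
        yield
    finally:
        _IN_CONTEXT = False


class VisualModelSketch:
    def __init__(self, vllm_config=None):
        self.vllm_config = vllm_config

    def visual(self, pixel_values, grid_thw):
        # Stub encoder: note the context state and return a dummy "embedding".
        CALLS.append(_IN_CONTEXT)
        return [v * 2 for v in pixel_values]

    def _run_visual(self, pixel_values, grid_thw):
        # The version branch lives in exactly one place.
        if vllm_version_is("0.11.0"):
            return self.visual(pixel_values, grid_thw=grid_thw)
        with set_ascend_forward_context(None, self.vllm_config):
            return self.visual(pixel_values, grid_thw=grid_thw)

    def _process_image_input(self, pixel_values, grid_thw):
        return self._run_visual(pixel_values, grid_thw=grid_thw)

    def _process_video_input(self, pixel_values_videos, video_grid_thw):
        return self._run_visual(pixel_values_videos, grid_thw=video_grid_thw)
```

With the version stub returning `False`, both the image and the video path enter the forward context before calling `visual`, which is the behavior the two duplicated blocks were meant to guarantee.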
    from vllm.v1.kv_cache_interface import FullAttentionSpec, MambaSpec

    from vllm_ascend.utils import vllm_version_is
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: leo-pony <[email protected]>
…tructured outputs compatibility#26866 Signed-off-by: leo-pony <[email protected]>
Signed-off-by: 22dimensions <[email protected]>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?