
Conversation

@whx-sjtu (Collaborator) commented Oct 23, 2025

This PR fixes a bug in running multi-modal models with AscendScheduler. The bug was introduced by PR #2372, which reused vLLM's parameter names but gave them different default values. The error is as follows:

(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] WorkerProc failed to start.
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] Traceback (most recent call last):
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.worker.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/worker_v1.py", line 307, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/model_runner_v1.py", line 2656, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/models/qwen2_5_vl.py", line 513, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     super().__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/qwen2_5_vl.py", line 1023, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     vllm_config = vllm_config.with_hf_config(hf_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/config/__init__.py", line 300, in with_hf_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return replace(self, model_config=model_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/dataclasses.py", line 1503, in replace
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return obj.__class__(**changes)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] scheduler_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   Value error, max_long_partial_prefills (2147483647) must be greater than or equal to 1 and less than or equal to max_num_partial_prefills (1). [type=value_error, input_value=AscendSchedulerConfig(run..., decode_max_num_seqs=0), input_type=AscendSchedulerConfig]

I fix this bug by changing the default values of these two parameters to align with vLLM. Please take a look: @Csrayz @frankie-ys @xueliangyang-oeuler @wangxiyuan
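For context, the failure mode can be reproduced in isolation. The sketch below is a minimal stand-in (not the actual vLLM classes) showing how a Pydantic dataclass validator interacts with `dataclasses.replace()`, which is the call that fails in the traceback above: `replace()` re-runs `__init__`, so Pydantic re-validates every field, and a default that breaks the invariant fails the moment the config is copied.

```python
# Minimal sketch of the failure mode (hypothetical class names; the
# real validator lives in vLLM's SchedulerConfig).
from dataclasses import replace

from pydantic import model_validator
from pydantic.dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    max_num_partial_prefills: int = 1
    max_long_partial_prefills: int = 1

    @model_validator(mode="after")
    def _check_partial_prefills(self):
        # Same invariant as the error message above.
        if not (1 <= self.max_long_partial_prefills
                <= self.max_num_partial_prefills):
            raise ValueError(
                "max_long_partial_prefills must be greater than or "
                "equal to 1 and less than or equal to "
                "max_num_partial_prefills")
        return self


cfg = SchedulerConfigSketch()   # defaults satisfy the invariant
replace(cfg)                    # copying re-validates and passes

try:
    # 2147483647 is the offending default from the error message above;
    # Pydantic's ValidationError is a subclass of ValueError.
    SchedulerConfigSketch(max_long_partial_prefills=2147483647)
except ValueError:
    print("validation error raised, matching the traceback")
```

The key point is that the validator belongs to the base class, so any subclass that overrides a shared field's default can silently violate it until the first `replace()` call.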

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This PR fixes a bug when running multi-modal models with AscendScheduler. The fix involves renaming configuration parameters in AscendSchedulerConfig to avoid conflicts with the base SchedulerConfig from vLLM. This is a good approach. The implementation looks correct, but I found that one of the new parameter names is misleading, which could lead to confusion and incorrect usage. I've provided a suggestion to improve the naming for better clarity and maintainability.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 23, 2025
@wangxiyuan (Collaborator) commented

@Csrayz can you take a look at this change?

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 86263c5 to 0349604 Compare October 24, 2025 02:21
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Oct 24, 2025
@whx-sjtu (Collaborator, Author) commented Oct 24, 2025

> @Csrayz can you take a look at this change?

Today I found that the meaning of these two parameters is actually aligned with vLLM. The problem lies only in the default values, so we should simply align the default values with vLLM. cc @Csrayz @wangxiyuan
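A sketch of what "align the default values" means in practice. Field names are taken from the error message above; the classes are plain-dataclass stand-ins, not the real vLLM/vllm-ascend configs, and the actual change lives in AscendSchedulerConfig.

```python
# Sketch only: the Ascend subclass may add new fields (decode_max_num_seqs
# is visible in the error message), but fields shared with the vLLM base
# must keep the base defaults so the base-class validator still holds.
from dataclasses import dataclass, fields


@dataclass
class SchedulerConfig:  # stand-in for vLLM's base scheduler config
    max_num_partial_prefills: int = 1
    max_long_partial_prefills: int = 1


@dataclass
class AscendSchedulerConfig(SchedulerConfig):
    decode_max_num_seqs: int = 0  # Ascend-only extension


# Shared fields keep vLLM's defaults, so validation cannot diverge.
base = {f.name: f.default for f in fields(SchedulerConfig)}
sub = {f.name: f.default for f in fields(AscendSchedulerConfig)}
assert all(sub[name] == value for name, value in base.items())
```

This is why aligning defaults is a smaller, safer fix than renaming the parameters: the subclass keeps the same semantics as vLLM and only its genuinely new fields differ.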

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch 2 times, most recently from 95cd053 to ffee8b2 Compare October 24, 2025 06:19
@wangxiyuan wangxiyuan added the ready (read for review) label Oct 24, 2025
@whx-sjtu whx-sjtu added the ready-for-test (start test by label for PR) label Oct 24, 2025
@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 79ab771 to 2148652 Compare October 24, 2025 16:08
@wangxiyuan wangxiyuan merged commit e33751e into vllm-project:main Oct 25, 2025
26 of 28 checks passed

Labels

module:tests · ready (read for review) · ready-for-test (start test by label for PR)
