
Conversation

@whx-sjtu (Collaborator) commented Oct 23, 2025

This PR fixes a bug in running multi-modal models with AscendScheduler. The bug was introduced by PR #2372, which reused vLLM's parameter names but gave them different default values. The error is as follows:

(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] WorkerProc failed to start.
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] Traceback (most recent call last):
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.worker.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/worker_v1.py", line 307, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/model_runner_v1.py", line 2656, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/models/qwen2_5_vl.py", line 513, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     super().__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/qwen2_5_vl.py", line 1023, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     vllm_config = vllm_config.with_hf_config(hf_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/config/__init__.py", line 300, in with_hf_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return replace(self, model_config=model_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/dataclasses.py", line 1503, in replace
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return obj.__class__(**changes)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] scheduler_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   Value error, max_long_partial_prefills (2147483647) must be greater than or equal to 1 and less than or equal to max_num_partial_prefills (1). [type=value_error, input_value=AscendSchedulerConfig(run..., decode_max_num_seqs=0), input_type=AscendSchedulerConfig]

I fix this bug by changing the default values of these two parameters to align with vLLM. Please take a look: @Csrayz @frankie-ys @xueliangyang-oeuler @wangxiyuan
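For context, the failure mode can be reproduced in isolation. The sketch below is a minimal stand-in (not the actual vLLM classes) showing how a Pydantic dataclass validator interacts with `dataclasses.replace()`, which is the call that fails in the traceback above: `replace()` re-runs `__init__`, so Pydantic re-validates every field, and a default that breaks the invariant fails the moment the config is copied.

```python
# Minimal sketch of the failure mode (hypothetical class names; the
# real validator lives in vLLM's SchedulerConfig).
from dataclasses import replace

from pydantic import model_validator
from pydantic.dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    max_num_partial_prefills: int = 1
    max_long_partial_prefills: int = 1

    @model_validator(mode="after")
    def _check_partial_prefills(self):
        # Same invariant as the error message above.
        if not (1 <= self.max_long_partial_prefills
                <= self.max_num_partial_prefills):
            raise ValueError(
                "max_long_partial_prefills must be greater than or "
                "equal to 1 and less than or equal to "
                "max_num_partial_prefills")
        return self


cfg = SchedulerConfigSketch()   # defaults satisfy the invariant
replace(cfg)                    # copying re-validates and passes

try:
    # 2147483647 is the offending default from the error message above;
    # Pydantic's ValidationError is a subclass of ValueError.
    SchedulerConfigSketch(max_long_partial_prefills=2147483647)
except ValueError:
    print("validation error raised, matching the traceback")
```

The key point is that the validator belongs to the base class, so any subclass that overrides a shared field's default can silently violate it until the first `replace()` call.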

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This PR fixes a bug when running multi-modal models with AscendScheduler. The fix involves renaming configuration parameters in AscendSchedulerConfig to avoid conflicts with the base SchedulerConfig from vLLM. This is a good approach. The implementation looks correct, but I found that one of the new parameter names is misleading, which could lead to confusion and incorrect usage. I've provided a suggestion to improve the naming for better clarity and maintainability.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 23, 2025
@wangxiyuan (Collaborator) commented

@Csrayz can you take a look at this change?

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 86263c5 to 0349604 Compare October 24, 2025 02:21
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Oct 24, 2025
@whx-sjtu (Collaborator, Author) commented Oct 24, 2025

> @Csrayz can you take a look at this change?

Today I found that the meaning of these two parameters is actually aligned with vLLM. The problem lies only in the default values, so we should simply align the default values with vLLM. cc @Csrayz @wangxiyuan
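A sketch of what "align the default values" means in practice. Field names are taken from the error message above; the classes are plain-dataclass stand-ins, not the real vLLM/vllm-ascend configs, and the actual change lives in AscendSchedulerConfig.

```python
# Sketch only: the Ascend subclass may add new fields (decode_max_num_seqs
# is visible in the error message), but fields shared with the vLLM base
# must keep the base defaults so the base-class validator still holds.
from dataclasses import dataclass, fields


@dataclass
class SchedulerConfig:  # stand-in for vLLM's base scheduler config
    max_num_partial_prefills: int = 1
    max_long_partial_prefills: int = 1


@dataclass
class AscendSchedulerConfig(SchedulerConfig):
    decode_max_num_seqs: int = 0  # Ascend-only extension


# Shared fields keep vLLM's defaults, so validation cannot diverge.
base = {f.name: f.default for f in fields(SchedulerConfig)}
sub = {f.name: f.default for f in fields(AscendSchedulerConfig)}
assert all(sub[name] == value for name, value in base.items())
```

This is why aligning defaults is a smaller, safer fix than renaming the parameters: the subclass keeps the same semantics as vLLM and only its genuinely new fields differ.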

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch 2 times, most recently from 95cd053 to ffee8b2 Compare October 24, 2025 06:19
@wangxiyuan wangxiyuan added the ready (read for review) label Oct 24, 2025
@whx-sjtu whx-sjtu added the ready-for-test (start test by label for PR) label Oct 24, 2025
@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 79ab771 to 2148652 Compare October 24, 2025 16:08
@wangxiyuan wangxiyuan merged commit e33751e into vllm-project:main Oct 25, 2025
26 of 28 checks passed

Labels

module:tests · ready (read for review) · ready-for-test (start test by label for PR)
