Conversation

@realliujiaxu (Contributor) commented Oct 23, 2025

What this PR does / why we need it?

Remove redundant operations from model_runner and forward_context. This optimization can significantly reduce the idle time (bubble) before decoding when running models with small parameter counts (e.g., Qwen/Qwen2.5-0.5B).
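
As an illustration of the pattern (a minimal sketch with invented names — DummyModel, ModelRunner, execute_step — not the actual vllm-ascend code), the idea is to evaluate a structural check whose result cannot change between decode steps once, instead of on every forward pass:

# Hypothetical sketch: hoist an invariant check out of the per-step hot path.
class DummyInner:
    start_layer = 0  # stand-in for a model exposing layer indices

class DummyModel:
    model = DummyInner()

    def __call__(self, tokens):
        return tokens  # stand-in for the real forward pass

class ModelRunner:
    def __init__(self, model):
        self.model = model
        # Evaluated once at construction instead of on every decode step.
        self.has_layer_idx = hasattr(model, "model") and \
            hasattr(model.model, "start_layer")

    def execute_step(self, tokens):
        # Hot path: no repeated hasattr() lookups before each step.
        start_layer = self.model.model.start_layer if self.has_layer_idx else 0
        return self.model(tokens), start_layer

runner = ModelRunner(DummyModel())
print(runner.execute_step([1, 2, 3]))  # ([1, 2, 3], 0)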

Tested on an 800I A2, the bubble is reduced from 3.8 ms to 2.8 ms:

Before: [profiling trace screenshot]

After: [profiling trace screenshot]

Does this PR introduce any user-facing change?

No

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request refactors the codebase by moving some checks into utility functions and removing redundant operations. The changes in model_runner_v1.py correctly fix a bug where stale data was being used and remove redundant code, improving correctness and clarity. However, the new utility function has_layer_idx in utils.py introduces a critical bug due to a flawed caching mechanism. I've provided a comment with a suggested fix for this issue.

Comment on lines +772 to +801
def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX

Severity: critical

The current implementation of has_layer_idx uses a global variable _HAS_LAYER_IDX to cache its result. This caching mechanism is flawed because the function's outcome depends on the model_instance argument, which is not guaranteed to be the same across all calls.

For instance, set_ascend_forward_context can be invoked with model_instance=None (e.g., from kv_connector_no_forward), which would cause _HAS_LAYER_IDX to be cached as False. Any subsequent calls, even with a valid model_instance, would then incorrectly return False, preventing features that rely on layer_idx from being enabled.

Since this check is inexpensive, I recommend removing the caching mechanism to fix this bug. The global variable _HAS_LAYER_IDX should also be removed.

def has_layer_idx(model_instance: torch.nn.Module) -> bool:
    return (model_instance is not None and hasattr(model_instance, "model") and
            hasattr(model_instance.model, "start_layer"))
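
For illustration, a minimal standalone repro of the failure mode (has_layer_idx_cached mirrors the cached variant quoted above; _Inner and _Model are invented stand-ins):

_HAS_LAYER_IDX = None  # module-level cache, as in the original patch

def has_layer_idx_cached(model_instance) -> bool:
    global _HAS_LAYER_IDX
    if _HAS_LAYER_IDX is None:
        _HAS_LAYER_IDX = model_instance is not None and \
            hasattr(model_instance, "model") and \
            hasattr(model_instance.model, "start_layer")
    return _HAS_LAYER_IDX

class _Inner:
    start_layer = 0  # models with layer indices expose start_layer

class _Model:
    model = _Inner()

# First call arrives with no model (e.g. via kv_connector_no_forward):
print(has_layer_idx_cached(None))      # False -- now cached forever
print(has_layer_idx_cached(_Model()))  # False, although True is expected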

@realliujiaxu realliujiaxu changed the title [Feat] Delete redundant operations in model_runner and forward_context [Refactor] Delete redundant operations in model_runner and forward_context Oct 24, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@realliujiaxu realliujiaxu changed the title [Refactor] Delete redundant operations in model_runner and forward_context [Perf] Delete redundant operations in model_runner and forward_context Oct 24, 2025
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.
