[bugfix] Prevent overwriting drafters' lm_head and embed_tokens #4134
Conversation
Signed-off-by: 01267596 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to prevent overwriting the lm_head and embed_tokens of a drafter model if they are already initialized, which is particularly important for some EAGLE3 drafters. The changes correctly add checks for has_own_embed_tokens and has_own_lm_head flags before sharing these layers from the target model. However, I've identified a critical regression in the logic for handling lm_head for SpecDcodeType.EAGLE. The check that ensures the target model has an lm_head attribute before it's accessed has been removed, which will likely cause an AttributeError at runtime. My review includes a specific code suggestion to fix this issue.
The diff hunk under review:

    if hasattr(model, "lm_head"):
        logger.info("Loading EAGLE LM head weights from the target model.")
        if supports_multimodal(model):
            self.model.lm_head = model.get_language_model().lm_head
        else:
            self.model.lm_head = model.lm_head
The logic for sharing lm_head for SpecDcodeType.EAGLE has been changed in a way that introduces a potential AttributeError. The assignments to self.model.lm_head are no longer guarded by hasattr(model, "lm_head"). If the target model does not have an lm_head attribute, this will cause a crash. The original logic was safer and should be restored.
Suggested change (keep the hasattr guard around the assignments):

    if hasattr(model, "lm_head"):
        logger.info("Loading EAGLE LM head weights from the target model.")
        if supports_multimodal(model):
            self.model.lm_head = model.get_language_model().lm_head
        else:
            self.model.lm_head = model.lm_head
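For illustration only, here is a minimal self-contained sketch of the intended behavior, assuming the PR adds a has_own_lm_head flag to the drafter; the class names and the share_lm_head helper below are hypothetical and not the actual vLLM Ascend code:

    import torch.nn as nn

    class TargetModel(nn.Module):
        # Hypothetical stand-in for the target model; it may or may not expose lm_head.
        def __init__(self):
            super().__init__()
            self.lm_head = nn.Linear(16, 32, bias=False)

    class Drafter(nn.Module):
        # Hypothetical stand-in for an EAGLE3 drafter that may ship its own lm_head.
        def __init__(self, has_own_lm_head: bool):
            super().__init__()
            self.has_own_lm_head = has_own_lm_head
            self.lm_head = nn.Linear(16, 32, bias=False) if has_own_lm_head else None

    def share_lm_head(drafter: Drafter, target: nn.Module) -> None:
        # Borrow the target's lm_head only if the drafter did not load its own,
        # and only if the target actually exposes one (avoids AttributeError).
        if not drafter.has_own_lm_head and hasattr(target, "lm_head"):
            drafter.lm_head = target.lm_head

    drafter = Drafter(has_own_lm_head=True)
    share_lm_head(drafter, TargetModel())  # drafter keeps its own lm_head

The same guarded pattern would apply to embed_tokens via the has_own_embed_tokens flag mentioned in the review above.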
What this PR does / why we need it?
Some EAGLE3 drafters ship their own lm_head and/or embed_tokens layers, but the existing codebase ignores this and overwrites them with the target model's layers. Keeping the drafter's own layers can greatly improve acceptance rates. Refer to the corresponding PR in vLLM: vllm-project/vllm#27737
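As a hedged illustration of the idea (the helper below is hypothetical, not the code added by this PR), the ownership flags can be derived from the parameter names the drafter checkpoint actually provides, so the target model's layers are only shared in as a fallback:

    def detect_own_layers(loaded_weight_names):
        # loaded_weight_names: parameter names found in the drafter checkpoint.
        has_own_lm_head = any(name.startswith("lm_head.") for name in loaded_weight_names)
        has_own_embed_tokens = any("embed_tokens." in name for name in loaded_weight_names)
        return has_own_lm_head, has_own_embed_tokens

    # Example: an EAGLE3 drafter that ships its own lm_head but not its own embeddings.
    print(detect_own_layers(["lm_head.weight", "layers.0.self_attn.q_proj.weight"]))
    # -> (True, False)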
Does this PR introduce any user-facing change?
How was this patch tested?
export CUDA_VISIBLE_DEVICES=0
export TP=1
export MODEL_PATH=/model/Llama-3.1-8B-Instruct
export MODEL_NAME=Llama-3.1-8B-Instruct
export PORT=10133
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port ${PORT} --dtype bfloat16 --model ${MODEL_PATH} --served-model-name ${MODEL_NAME} --tensor-parallel-size ${TP} --gpu-memory-utilization 0.85 --max-model-len 32768 --trust-remote-code --seed 42 --speculative_config '{"method":"eagle3","model":"/model/EAGLE3-LLaMA3.1-Instruct-8B","num_speculative_tokens":5,"draft_tensor_parallel_size":1}'
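A quick way to confirm the server responds with the drafter enabled is a request to the standard OpenAI-compatible completions endpoint. The snippet below is a sketch using only the Python standard library; the port and served model name are taken from the commands above and are not part of the original test log:

    import json
    import urllib.request

    # Smoke test against the server started above (port and served model name
    # come from the export commands in this section).
    payload = {
        "model": "Llama-3.1-8B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 32,
    }
    req = urllib.request.Request(
        "http://localhost:10133/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["text"])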