
Commit 557aa1f

Fix vLLM break: Support LoRA with speculative decoding (#21068)
Signed-off-by: leo-pony <[email protected]>
1 parent e626367 commit 557aa1f
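
Context: the upstream vLLM change referenced as #21068 ("Support LoRA with speculative decoding") added a second argument, num_sampled_tokens, to InputBatch.make_lora_inputs. With speculative decoding a request can have more than one token sampled per step, so the sampler-side LoRA mapping can no longer assume one sampled token per request; vllm-ascend's npu_input_batch.py has to match the new signature or the worker call breaks. A minimal, self-contained sketch of the idea (array names and values here are illustrative, not taken from the diff):

import numpy as np

request_lora_mapping = np.array([1, 0, 2])  # LoRA id per request (0 = no LoRA)
num_scheduled_tokens = np.array([4, 2, 3])  # input tokens per request this step
num_sampled_tokens = np.array([3, 1, 2])    # sampled tokens per request; spec
                                            # decode can accept >1 per step

# Model-side mapping: one LoRA id per scheduled input token.
token_lora_mapping = request_lora_mapping.repeat(num_scheduled_tokens)
# Sampler-side mapping: one LoRA id per *sampled* token. Without
# num_sampled_tokens this collapses to one entry per request, which
# misaligns with the sampler's rows once spec decode accepts extra tokens.
sampler_lora_mapping = request_lora_mapping.repeat(num_sampled_tokens)

print(token_lora_mapping)    # [1 1 1 1 0 0 2 2 2]
print(sampler_lora_mapping)  # [1 1 1 0 2 2]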

File tree: 2 files changed (+2 −2 lines)


.github/workflows/_e2e_test.yaml

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ jobs:
           #s pytest -sv tests/e2e/multicard/test_external_launcher.py
           #s pytest -sv tests/e2e/multicard/test_single_request_aclgraph.py
           #s pytest -sv tests/e2e/multicard/test_fused_moe_allgather_ep.py
-          #s pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
+          pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

           # To avoid oom, we need to run the test in a single process.
           pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
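
The workflow change just re-enables tests/e2e/multicard/test_ilama_lora_tp2.py, which was evidently commented out (the "#s" prefix) while the upstream signature change kept multicard LoRA broken; with the fix below it can run in CI again.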

vllm_ascend/worker/npu_input_batch.py

Lines changed: 1 addition & 1 deletion
@@ -834,7 +834,7 @@ def _make_prompt_token_ids_tensor(self) -> torch.Tensor:
             non_blocking=True)

     def make_lora_inputs(
-        self, num_scheduled_tokens: np.ndarray
+        self, num_scheduled_tokens: np.ndarray, num_sampled_tokens: np.ndarray
     ) -> tuple[tuple[int, ...], tuple[int, ...], set[LoRARequest]]:
         """
         Given the num_scheduled_tokens for each request in the batch, return
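
Only the signature change is visible in this hunk. For orientation, here is a sketch of how the full method might consume both arrays, modeled on the pattern of vLLM's GPU InputBatch; the actual body in npu_input_batch.py is not shown in the diff, so treat everything past the signature as an assumption:

# Sketch only: the body below is an assumption modeled on vLLM's GPU
# InputBatch, not the confirmed vllm-ascend implementation. Assumes
# `import numpy as np` and `from vllm.lora.request import LoRARequest`
# at module level, plus the usual input-batch attributes.
def make_lora_inputs(
    self, num_scheduled_tokens: np.ndarray, num_sampled_tokens: np.ndarray
) -> tuple[tuple[int, ...], tuple[int, ...], set[LoRARequest]]:
    # LoRA id for each active request in the batch.
    req_lora_mapping = self.request_lora_mapping[:self.num_reqs]
    # Sampler-side mapping: one id per sampled token. Speculative decoding
    # can sample several tokens per request, which is why num_sampled_tokens
    # had to be threaded through.
    prompt_lora_mapping = tuple(req_lora_mapping.repeat(num_sampled_tokens))
    # Model-side mapping: one id per scheduled input token.
    token_lora_mapping = tuple(req_lora_mapping.repeat(num_scheduled_tokens))
    active_lora_requests: set[LoRARequest] = set(
        self.lora_id_to_lora_request.values())
    return prompt_lora_mapping, token_lora_mapping, active_lora_requests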
