【main】kvpool sync load #3653

fems14 · 2025-10-23T03:54:21Z

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: fems14 <[email protected]>

github-actions · 2025-10-23T03:54:34Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request introduces a synchronous loading path for the KV cache in the Mooncake engine, controlled by a new load_async configuration flag. The changes are logical and correctly implement the synchronous behavior. However, there are two significant instances of code duplication introduced. The synchronous loading logic is copied from the asynchronous thread handler, and the prepare_value helper method is duplicated from another class. My review includes suggestions to refactor this duplicated code to improve maintainability.

gemini-code-assist · 2025-10-23T03:56:09Z

vllm_ascend/distributed/mooncake/mooncake_engine.py

+                else:
+                    if self.m_store.config.use_ascend_direct:
+                        addr_list = []
+                        size_list = []
+                        key_list = []
+                        blockIds = []
+                        for start, end, key in self.token_database.process_tokens(
+                                tokens, token_mask):
+                            addr, size, block_id = self.prepare_value(
+                                start, end, request.block_ids)
+                            key_list.append(key.to_string())
+                            addr_list.append(addr)
+                            size_list.append(size)
+                            blockIds.append(block_id)
+                        self.m_store.get_batch(key_list, addr_list, size_list, blockIds)
+                    else:
+                        for start, end, key in self.token_database.process_tokens(
+                                tokens, token_mask):
+                            addr, size, _ = self.prepare_value(start, end, request.block_ids)
+                            self.m_store.get(key, addr, size)


This new block for synchronous loading is almost identical to the logic in KVCacheStoreRecvingThread._handle_request. This code duplication can lead to maintenance issues where a bug fix or change in one place might be missed in the other.

To improve maintainability, I recommend extracting this logic into a new private helper method within the MooncakeEngine class, for example, _load_kv_sync. The synchronous path can then call this method.

Ideally, KVCacheStoreRecvingThread should also be refactored to use this new helper method to completely eliminate the duplication. This would likely involve passing the MooncakeEngine instance to the thread's constructor.
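The extraction suggested above could look roughly like the following. This is a minimal sketch, not the PR's code: the free function load_kv_sync and its parameter names are hypothetical, and it only assumes what the diff shows, namely a store exposing get(key, addr, size) and get_batch(keys, addrs, sizes, block_ids), keys with a to_string() method, and a prepare_value(start, end, block_ids) callable returning (addr_list, size_list, block_id).

```python
from typing import Any, Callable, Iterable, Sequence, Tuple

# (start, end, key) triples, as yielded by token_database.process_tokens in the diff.
Chunk = Tuple[int, int, Any]
# prepare_value(start, end, block_ids) -> (addr_list, size_list, block_id)
PrepareValue = Callable[[int, int, Sequence[int]], Tuple[list, list, int]]


def load_kv_sync(store: Any,
                 chunks: Iterable[Chunk],
                 prepare_value: PrepareValue,
                 block_ids: Sequence[int],
                 use_ascend_direct: bool) -> None:
    """Hypothetical shared helper for the synchronous load path.

    Both the engine's sync branch and the receiving thread could call this,
    so the batching logic lives in one place.
    """
    if use_ascend_direct:
        # Batched path: collect every chunk, then issue a single get_batch.
        key_list, addr_list, size_list, block_id_list = [], [], [], []
        for start, end, key in chunks:
            addr, size, block_id = prepare_value(start, end, block_ids)
            key_list.append(key.to_string())
            addr_list.append(addr)
            size_list.append(size)
            block_id_list.append(block_id)
        store.get_batch(key_list, addr_list, size_list, block_id_list)
    else:
        # Per-chunk path: one get per chunk, mirroring the diff.
        for start, end, key in chunks:
            addr, size, _ = prepare_value(start, end, block_ids)
            store.get(key, addr, size)
```

In the engine, the sync branch would then reduce to a single call such as load_kv_sync(self.m_store, self.token_database.process_tokens(tokens, token_mask), self.prepare_value, request.block_ids, self.m_store.config.use_ascend_direct), assuming those attributes keep the names shown in the diff.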

gemini-code-assist · 2025-10-23T03:56:09Z

vllm_ascend/distributed/mooncake/mooncake_engine.py

+    def prepare_value(self, start: int, end: int, block_ids: list[int]):
+        addr_list = []
+        size_list = []
+        block_id = block_ids[start // self.block_size]
+        for index, base_addr in enumerate(self.kv_caches_base_addr):
+            block_len = (self.block_len[index % 2]
+                         if self.use_mla else self.block_len[0])
+
+            addr = base_addr + block_id * block_len
+            length = int(block_len / self.block_size * (end - start))
+            addr_list.append(addr)
+            size_list.append(length)
+        return addr_list, size_list, block_id


This prepare_value method is a direct copy of the method with the same name in KVTransferThread from kv_transfer.py. Duplicating code like this makes the codebase harder to maintain.

It would be better to refactor this into a common utility function that can be called from both MooncakeEngine and KVTransferThread. This function would take necessary context (like block_size, kv_caches_base_addr, block_len, and use_mla) as arguments, ensuring the logic is defined in a single place.
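A minimal sketch of such a shared utility follows. The function name compute_block_addrs and its keyword-only parameters are hypothetical; the arithmetic is copied from the prepare_value method in the diff, with the instance attributes passed in explicitly.

```python
from typing import List, Sequence, Tuple


def compute_block_addrs(start: int,
                        end: int,
                        block_ids: Sequence[int],
                        *,
                        block_size: int,
                        kv_caches_base_addr: Sequence[int],
                        block_len: Sequence[int],
                        use_mla: bool) -> Tuple[List[int], List[int], int]:
    """Hypothetical stateless version of prepare_value.

    Returns (addr_list, size_list, block_id) for the tokens in [start, end),
    which are assumed to fall inside a single block.
    """
    block_id = block_ids[start // block_size]
    addr_list: List[int] = []
    size_list: List[int] = []
    for index, base_addr in enumerate(kv_caches_base_addr):
        # With MLA the diff alternates between two block lengths per cache
        # tensor; otherwise it always uses block_len[0].
        cur_len = block_len[index % 2] if use_mla else block_len[0]
        addr_list.append(base_addr + block_id * cur_len)
        # Bytes per token times the number of tokens in this chunk.
        size_list.append(int(cur_len / block_size * (end - start)))
    return addr_list, size_list, block_id
```

Both MooncakeEngine.prepare_value and KVTransferThread.prepare_value could then become thin wrappers that forward their own block_size, kv_caches_base_addr, block_len, and use_mla attributes to this function.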

wangxiyuan · 2025-10-23T08:33:20Z

please update the README as well https://github.com/vllm-project/vllm-ascend/tree/main/examples/disaggregated_prefill_v1

Signed-off-by: fems14 <[email protected]>

kvpool sync load

de744f6

Signed-off-by: fems14 <[email protected]>

gemini-code-assist bot reviewed Oct 23, 2025


fix lint

507a3eb

Signed-off-by: fems14 <[email protected]>

MengqingCao added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 24, 2025

add readme

7fc7ed8

Signed-off-by: fems14 <[email protected]>

github-actions bot added the documentation (Improvements or additions to documentation) label Oct 24, 2025

fems14 closed this Oct 24, 2025

fems14 deleted the sync-load branch October 25, 2025 04:17
