
@fems14
Contributor

@fems14 fems14 commented Oct 23, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: fems14 <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally as described in Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a synchronous loading path for the KV cache in the Mooncake engine, controlled by a new load_async configuration flag. The changes are logical and correctly implement the synchronous behavior. However, they introduce two significant instances of code duplication: the synchronous loading logic is copied from the asynchronous thread handler (KVCacheStoreRecvingThread._handle_request), and the prepare_value helper method is duplicated from KVTransferThread. My review includes suggestions to refactor this duplicated code to improve maintainability.
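To make the sync/async split concrete, here is a minimal sketch of how a boolean load_async flag can route a load request either inline on the caller's thread or onto a background worker. ToyEngine, start_load, wait_for_loads, and the queue-based worker are hypothetical stand-ins chosen for illustration; they are not the Mooncake engine's actual API.

```python
import queue
import threading
from typing import Callable, Optional


class ToyEngine:
    """Hypothetical engine (not the Mooncake API).

    load_async=True  -> requests are queued and loaded by a worker thread.
    load_async=False -> the loader runs on the caller's thread before returning.
    """

    def __init__(self, load_fn: Callable[[str], None], load_async: bool) -> None:
        self.load_fn = load_fn
        self.load_async = load_async
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        if load_async:
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

    def _run(self) -> None:
        # Worker loop: load requests in FIFO order until the None sentinel arrives.
        while True:
            req_id = self._requests.get()
            try:
                if req_id is None:
                    return
                self.load_fn(req_id)
            finally:
                self._requests.task_done()

    def start_load(self, req_id: str) -> None:
        if self.load_async:
            self._requests.put(req_id)  # returns immediately
        else:
            self.load_fn(req_id)  # blocks until the load has finished

    def wait_for_loads(self) -> None:
        # Block until every queued request has been processed (no-op when synchronous).
        if self.load_async:
            self._requests.join()

    def shutdown(self) -> None:
        if self._worker is not None:
            self._requests.put(None)
            self._worker.join()
```

With load_async=False the data is guaranteed to be in place as soon as start_load returns, at the cost of blocking the caller; with load_async=True the caller must synchronize separately (here via wait_for_loads) before using the loaded data.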

Comment on lines 192 to 211
```python
else:
    if self.m_store.config.use_ascend_direct:
        addr_list = []
        size_list = []
        key_list = []
        blockIds = []
        for start, end, key in self.token_database.process_tokens(
                tokens, token_mask):
            addr, size, block_id = self.prepare_value(
                start, end, request.block_ids)
            key_list.append(key.to_string())
            addr_list.append(addr)
            size_list.append(size)
            blockIds.append(block_id)
        self.m_store.get_batch(key_list, addr_list, size_list, blockIds)
    else:
        for start, end, key in self.token_database.process_tokens(
                tokens, token_mask):
            addr, size, _ = self.prepare_value(start, end, request.block_ids)
            self.m_store.get(key, addr, size)
```

high

This new block for synchronous loading is almost identical to the logic in KVCacheStoreRecvingThread._handle_request. This code duplication can lead to maintenance issues where a bug fix or change in one place might be missed in the other.

To improve maintainability, I recommend extracting this logic into a new private helper method within the MooncakeEngine class, for example, _load_kv_sync. The synchronous path can then call this method.

Ideally, KVCacheStoreRecvingThread should also be refactored to use this new helper method to completely eliminate the duplication. This would likely involve passing the MooncakeEngine instance to the thread's constructor.
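A minimal sketch of the suggested refactor, assuming the thread is given a reference to the engine: the loading body lives in one helper, the synchronous path calls it directly, and the thread's handler delegates to it. SketchEngine, SketchRecvThread, FakeStore, and the chunks argument (a list of (start, end, key) tuples standing in for token_database.process_tokens) are hypothetical; FakeStore only records calls so the two store paths (get_batch when use_ascend_direct is set, per-key get otherwise) can be checked. In the PR's code the batch path also converts keys with key.to_string(); here the keys are already strings.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable

# (start, end, block_ids) -> (addr_list, size_list, block_id)
PrepareFn = Callable[[int, int, list[int]], tuple[list[int], list[int], int]]


@dataclass
class FakeStore:
    """Hypothetical stand-in for the Mooncake store: records calls instead of copying data."""
    use_ascend_direct: bool
    calls: list = field(default_factory=list)

    def get_batch(self, keys, addrs, sizes, block_ids):
        self.calls.append(("get_batch", keys, addrs, sizes, block_ids))

    def get(self, key, addr, size):
        self.calls.append(("get", key, addr, size))


class SketchEngine:
    def __init__(self, store: FakeStore, prepare_value: PrepareFn) -> None:
        self.store = store
        self.prepare_value = prepare_value

    def _load_kv_sync(self, chunks: Iterable[tuple[int, int, str]],
                      block_ids: list[int]) -> None:
        """Single definition of the load logic, shared by the sync path and the thread."""
        if self.store.use_ascend_direct:
            # Collect every chunk, then issue one batched request.
            keys, addrs, sizes, ids = [], [], [], []
            for start, end, key in chunks:
                addr, size, block_id = self.prepare_value(start, end, block_ids)
                keys.append(key)
                addrs.append(addr)
                sizes.append(size)
                ids.append(block_id)
            self.store.get_batch(keys, addrs, sizes, ids)
        else:
            # Fall back to one request per chunk.
            for start, end, key in chunks:
                addr, size, _ = self.prepare_value(start, end, block_ids)
                self.store.get(key, addr, size)


class SketchRecvThread(threading.Thread):
    def __init__(self, engine: SketchEngine) -> None:
        super().__init__(daemon=True)
        self.engine = engine  # reuse the engine's helper instead of copying its body

    def _handle_request(self, chunks, block_ids: list[int]) -> None:
        self.engine._load_kv_sync(chunks, block_ids)
```

Because both call sites go through _load_kv_sync, a fix to the batching logic only has to be made once.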

Comment on lines +213 to +225
```python
def prepare_value(self, start: int, end: int, block_ids: list[int]):
    addr_list = []
    size_list = []
    block_id = block_ids[start // self.block_size]
    for index, base_addr in enumerate(self.kv_caches_base_addr):
        block_len = (self.block_len[index % 2]
                     if self.use_mla else self.block_len[0])

        addr = base_addr + block_id * block_len
        length = int(block_len / self.block_size * (end - start))
        addr_list.append(addr)
        size_list.append(length)
    return addr_list, size_list, block_id
```

high

This prepare_value method is a direct copy of the method with the same name in KVTransferThread from kv_transfer.py. Duplicating code like this makes the codebase harder to maintain.

It would be better to refactor this into a common utility function that can be called from both MooncakeEngine and KVTransferThread. This function would take necessary context (like block_size, kv_caches_base_addr, block_len, and use_mla) as arguments, ensuring the logic is defined in a single place.
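A minimal sketch of such a shared utility, assuming the context is passed as keyword-only arguments. The arithmetic mirrors the duplicated method line for line; the module-level function itself is hypothetical and not part of the codebase.

```python
def prepare_value(start: int, end: int, block_ids: list[int], *,
                  block_size: int, kv_caches_base_addr: list[int],
                  block_len: list[int],
                  use_mla: bool) -> tuple[list[int], list[int], int]:
    """Hypothetical shared helper: per-buffer (address, length) for tokens [start, end).

    Same arithmetic as the duplicated method: the block holding `start` is looked
    up in block_ids, and the range is assumed to lie within that block.
    """
    block_id = block_ids[start // block_size]
    addr_list: list[int] = []
    size_list: list[int] = []
    for index, base_addr in enumerate(kv_caches_base_addr):
        # With use_mla, even/odd buffers use block_len[0]/block_len[1];
        # otherwise every buffer uses block_len[0].
        length_per_block = block_len[index % 2] if use_mla else block_len[0]
        addr_list.append(base_addr + block_id * length_per_block)
        # Bytes covering (end - start) tokens of a block holding block_size tokens.
        size_list.append(int(length_per_block / block_size * (end - start)))
    return addr_list, size_list, block_id
```

Each caller can bind its own state once (for example with functools.partial) and keep calling it as prepare_value(start, end, block_ids). For instance, with block_size=128, kv_caches_base_addr=[1000, 5000], block_len=[256], use_mla=False, block_ids=[3, 8], start=128, end=192, the call selects block 8 and returns ([3048, 7048], [128, 128], 8). The float division is kept only to match the original; block_len * (end - start) // block_size gives the same result when the division is exact and avoids float rounding for very large values.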

@wangxiyuan
Collaborator

Signed-off-by: fems14 <[email protected]>
@MengqingCao MengqingCao added the ready (read for review) and ready-for-test (start test by label for PR) labels Oct 24, 2025
Signed-off-by: fems14 <[email protected]>
@github-actions github-actions bot added the documentation (Improvements or additions to documentation) label Oct 24, 2025
@fems14 fems14 closed this Oct 24, 2025
@fems14 fems14 deleted the sync-load branch October 25, 2025 04:17

Labels

documentation (Improvements or additions to documentation), ready (read for review), ready-for-test (start test by label for PR)


3 participants