PD heterogenous TP #77

NickLucche · 2025-05-06T09:09:09Z

What this PR does:

Adds support for heterogeneous TP sizes. So far tested with P_TP=1 and D_TP=2/4.
It works by splitting the remote KV cache among D_TP/P_TP D workers along the kv_head dim. block_size and KV precision must match on both sides.
Discovery is just-in-time (JIT): a D worker first queries rank 0 of the dst_engine_id to get the TP size of the destination group. After that, every D TP worker pulls from a single remote P TP worker only, so setup requires just two exchanges on the side channel (rank 0 and the target rank_j).
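
To make the split concrete, here is a minimal sketch of how a D worker could pick its remote P worker and its kv_head slice. The helper name `remote_rank_and_head_slice` and the contiguous mapping `d_rank // tp_ratio` are assumptions for illustration, not necessarily what the connector code in this PR does.

```python
def remote_rank_and_head_slice(
    d_rank: int, d_tp: int, p_tp: int, num_kv_heads: int
) -> tuple[int, slice]:
    """Hypothetical helper: map a D TP rank to one P TP rank and a head slice.

    Assumes D_TP is a multiple of P_TP, num_kv_heads is divisible by D_TP,
    and D ranks are assigned to P ranks in contiguous groups.
    """
    if d_tp % p_tp != 0:
        raise ValueError("D_TP must be a multiple of P_TP")
    if num_kv_heads % d_tp != 0:
        raise ValueError("num_kv_heads must be divisible by D_TP")

    tp_ratio = d_tp // p_tp                 # D workers sharing one P worker
    p_rank = d_rank // tp_ratio             # the single remote rank to pull from
    heads_per_d = num_kv_heads // d_tp      # kv heads held by each D worker

    # Offset is relative to the heads stored locally on the remote P worker.
    start = (d_rank % tp_ratio) * heads_per_d
    return p_rank, slice(start, start + heads_per_d)


if __name__ == "__main__":
    # P_TP=1, D_TP=2, 8 kv heads: both D workers pull from P rank 0.
    for rank in range(2):
        print(rank, remote_rank_and_head_slice(rank, d_tp=2, p_tp=1, num_kv_heads=8))
```

Under these assumptions a D worker only ever needs metadata from rank 0 (to learn P_TP) and from its own `p_rank`, which matches the two side-channel exchanges described above.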

TODOs:

Test P_TP=2 and D_TP=4.
Benchmark before and after this PR to verify whether the higher number of prepared NIXL descriptors has an impact.
MLA works with @tlrmchlsmth's patch (not in this PR).

Signed-off-by: nicklucche <[email protected]>

github-actions · 2025-05-06T09:09:21Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


tlrmchlsmth · 2025-05-09T16:19:17Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

@@ -340,6 +369,7 @@ def register_kv_caches(self, kv_caches: dict[str, torch.Tensor]):
            # MLA case.
            self.num_blocks = first_kv_cache.shape[0]
            block_rank = 2  # [block_size, latent_dim]
+            # TODO does this include tp dependent size?


For MLA we replicate the KV cache across TP ranks, so in this case the prefiller would need to send the same blocks to all decoders. The same applies when the TP size is greater than the number of KV heads.
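
A rough sketch of the check this implies: decide whether a decoder rank can take a narrower head slice or has to pull the prefiller rank's blocks whole. The function name `decoder_needs_full_copy` and the assumption that a rank holds at least one KV head once TP exceeds the head count are illustrative, not taken from this PR.

```python
def decoder_needs_full_copy(
    use_mla: bool, num_kv_heads: int, p_tp: int, d_tp: int
) -> bool:
    """Hypothetical check: True when the block cannot be split by kv_head.

    Assumption: with TP size > num_kv_heads, heads are replicated so that
    every rank still holds one head (hence the max(1, ...)).
    """
    if use_mla:
        # MLA cache is replicated across TP ranks: there is no head dim to split.
        return True
    heads_per_p = max(1, num_kv_heads // p_tp)
    heads_per_d = max(1, num_kv_heads // d_tp)
    # Equal per-rank head counts mean the decoder wants everything the
    # prefiller rank holds, so several decoders read the same blocks.
    return heads_per_d == heads_per_p


if __name__ == "__main__":
    print(decoder_needs_full_copy(True, 8, 1, 2))   # MLA
    print(decoder_needs_full_copy(False, 8, 1, 2))  # regular head split
    print(decoder_needs_full_copy(False, 2, 2, 4))  # TP > num kv heads
```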


NickLucche added 3 commits May 6, 2025 08:08

one remote agent per remote rank

c35cf98

Signed-off-by: nicklucche <[email protected]>

tp_size in metadata and handshake with rank0 first

ec84817

Signed-off-by: nicklucche <[email protected]>

todos

60ab197

Signed-off-by: nicklucche <[email protected]>

NickLucche added 9 commits May 6, 2025 12:50

dst_num_blocks is engine_id only

792bacd

Signed-off-by: nicklucche <[email protected]>

fixes

d2ea8c8

Signed-off-by: nicklucche <[email protected]>

block_len is tp dependent

f17092f

Signed-off-by: nicklucche <[email protected]>

wip

ddf4c8e

Signed-off-by: nicklucche <[email protected]>

refactor remote kv_cache splitting and ditch tp_multiplier

00392ce

Signed-off-by: nicklucche <[email protected]>

2-handshake model with vertical kv cache split

eb0bdd2

Signed-off-by: nicklucche <[email protected]>

still broken

44db464

Signed-off-by: nicklucche <[email protected]>

minor

52d2325

Signed-off-by: nicklucche <[email protected]>

revert config changes

8080346

Signed-off-by: nicklucche <[email protected]>

tlrmchlsmth reviewed May 9, 2025

NickLucche added 4 commits May 10, 2025 09:54

split kv_cache along head dim

f216e03

Signed-off-by: nicklucche <[email protected]>

fix descr indexing

72a4c14

Signed-off-by: nicklucche <[email protected]>

clean up

522f647

Signed-off-by: nicklucche <[email protected]>

format

e4e4749

Signed-off-by: nicklucche <[email protected]>

NickLucche marked this pull request as ready for review May 10, 2025 18:29

NickLucche added 2 commits May 10, 2025 18:31

format

d2ce96a

Signed-off-by: nicklucche <[email protected]>

type

ca0e15f

Signed-off-by: nicklucche <[email protected]>
