lmdeploy support parallel embedding #4192
Conversation
Force-pushed 26666f5 to 2bcced7 (compare)
Please fix the lint issue. You can set up the …

It's done. And now deepseek_v2 will use ParallelEmbedding instead of nn.Embedding.
Pull request overview
This PR introduces tensor parallelism (TP) support for embedding layers in lmdeploy to reduce GPU memory consumption. The implementation uses rowwise tensor parallelism to shard the vocabulary dimension across multiple GPUs.
Key changes:
- Added `ParallelEmbedding` module with configurable tensor parallelism support via the `is_tp` flag
- Implemented backend abstraction for embedding operations with a default CUDA implementation
- Updated the `deepseek_v2` model to use the new parallel embedding layer
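For context, a minimal sketch of how a rowwise (vocab-parallel) embedding lookup typically works; the function and variable names below are illustrative and are not lmdeploy's actual API:

```python
import torch
import torch.nn.functional as F
import torch.distributed as dist


def vocab_parallel_embedding(input_ids: torch.Tensor,
                             local_weight: torch.Tensor,
                             start_index: int,
                             end_index: int,
                             group=None) -> torch.Tensor:
    """Look up embeddings for the vocab shard [start_index, end_index) owned by this rank."""
    # Tokens outside this rank's shard are remapped to local index 0 and zeroed out below.
    vocab_mask = (input_ids >= start_index) & (input_ids < end_index)
    local_ids = (input_ids - start_index) * vocab_mask
    out = F.embedding(local_ids, local_weight)
    out = out * vocab_mask.unsqueeze(-1)
    # Each token id falls into exactly one shard, so summing the partial results
    # across the TP group reconstructs the full embedding output.
    if group is not None:
        dist.all_reduce(out, group=group)
    return out
```

Each GPU then stores only a `(vocab_size / tp, hidden_dim)` slice of the weight, which is where the memory saving comes from.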
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 17 comments.
Summary per file:
| File | Description |
|---|---|
| lmdeploy/pytorch/nn/embedding.py | Core implementation of ParallelEmbedding with TP support and weight loading logic |
| lmdeploy/pytorch/backends/embedding.py | Abstract base classes for embedding implementation and builder |
| lmdeploy/pytorch/backends/default/embedding.py | Default embedding implementation with masking and all-reduce for TP |
| lmdeploy/pytorch/backends/default/op_backend.py | Registers embedding builder in the default backend |
| lmdeploy/pytorch/backends/base.py | Adds Embedding to OpType enum |
| lmdeploy/pytorch/nn/init.py | Exports ParallelEmbedding for public API |
| lmdeploy/pytorch/models/deepseek_v2.py | Replaces nn.Embedding with ParallelEmbedding in deepseek_v2 model |
| tests/pytorch/kernel/test_embedding.py | Unit tests for parallel embedding with multi-GPU setup |
tests/pytorch/nn/test_embedding.py (Outdated)
    result_queue = mp.Queue()

    for rank in range(world_size):
        p = mp.Process(target=parrall_emb,
The function call contains a typo: 'parrall_emb' should be 'parallel_emb'.
        device=torch.device(type='cuda', index=0))
    token_emb.weight.data.copy_(weight)
    token_emb._fill_padding_idx_with_zero()
    input = x.to(torch.device(type='cuda', index=0))
The variable 'input' shadows the built-in Python function 'input'. Consider renaming it to 'inputs', 'input_tensor', or 'x_cuda' to avoid shadowing built-ins.
    class DefaultEmbeddingImpl(EmbeddingImpl):
        """Embedding implementation api."""
The comment says "Embedding implementation api" but it should be "Embedding implementation API" (API should be uppercase as it's an acronym).
| """Embedding implementation api.""" | |
| """Embedding implementation API.""" |
    @pytest.mark.parametrize('seqlen', [1024, 1011, 128], indirect=True)
    @pytest.mark.parametrize('tp', [2], indirect=True)
    @pytest.mark.parametrize('dtype', [torch.bfloat16], indirect=True)
    def test_embedding(self, vocab_size, feat_size, padding_idx, seqlen, tp, dtype, x, weight, gt):
The test only covers the tensor parallel case (is_tp=True). To ensure the ParallelEmbedding module works correctly in non-TP mode, add test cases with is_tp=False to verify that the module behaves like a standard embedding layer when tensor parallelism is disabled.
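A rough sketch of what such a non-TP case could look like, reusing the fixtures visible in this test file; the `ParallelEmbedding` constructor arguments here are assumptions, not copied from the PR:

```python
@pytest.mark.parametrize('dtype', [torch.bfloat16], indirect=True)
def test_embedding_no_tp(self, vocab_size, feat_size, padding_idx, dtype, x, weight, gt):
    # With is_tp=False the module should behave like a plain nn.Embedding.
    emb = ParallelEmbedding(vocab_size, feat_size, padding_idx=padding_idx,
                            dtype=dtype, device='cuda', is_tp=False)
    emb.weight.data.copy_(weight)
    out = emb(x.cuda())
    torch.testing.assert_close(out, gt.to(out.device), rtol=1e-3, atol=1e-3)
```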
lmdeploy/pytorch/nn/embedding.py (Outdated)
    dist_cfg = get_dist_manager().current_config()
    _, self.rank = get_tp_world_rank(layer_type)
    self.tp, tp_mode = dist_cfg.get_tp_by_layer(layer_type)
Variable tp_mode is not used.
Suggested change:

    - self.tp, tp_mode = dist_cfg.get_tp_by_layer(layer_type)
    + self.tp, _ = dist_cfg.get_tp_by_layer(layer_type)
    @@ -0,0 +1,98 @@
    # Copyright (c) OpenMMLab. All rights reserved.
    import torch
Module 'torch' is imported with both 'import' and 'import from'.
Module 'lmdeploy.pytorch.check_env.torch' is imported with both 'import' and 'import from'.
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch import nn
Module 'torch' is imported with both 'import' and 'import from'.
Module 'lmdeploy.pytorch.check_env.torch' is imported with both 'import' and 'import from'.
We need to check if it affects the inference performance, especially for larger …
    @@ -0,0 +1,125 @@
    import os
This is not a kernel; the unit test should not be placed here.
I’m not quite sure where to place the unit test files—could you give me a suggestion?
Create a new folder under pytorch, maybe pytorch/nn. Or just forget about the unit test; we have a daily e2e test.
    dtype: torch.dtype = None,
    device: torch.device = None,
    is_tp: bool = False,
    padding_size: int = DEFAULT_VOCAB_PADDING_SIZE,
Different layer_type values have different behaviour when dp > 1. Since you want to gather inputs within TP groups, I think the default value should be 'attn'.
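Illustratively, the suggested default could be expressed as below; only the parameters visible in the snippet above are from the PR, the rest of the signature is assumed:

```python
def __init__(self,
             num_embeddings: int,      # assumed parameter name
             embedding_dim: int,       # assumed parameter name
             dtype: torch.dtype = None,
             device: torch.device = None,
             is_tp: bool = False,
             padding_size: int = DEFAULT_VOCAB_PADDING_SIZE,
             layer_type: str = 'attn'):  # default to 'attn' so dp>1 gathers inputs in TP groups
    ...
```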
    out = F.embedding(x, weight)

    if all_reduce:
        dist.all_reduce(out, group=group)
The all-reduce can be placed in the branch above.
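In other words, the all-reduce is only needed when the lookup was done on a vocab shard, so it could live inside the TP branch. A sketch of the idea, assuming the surrounding forward roughly looks like this (the helper's return convention is assumed):

```python
if tp > 1:
    local_ids, inv_vocab_mask = get_masked_input_and_mask(x, start_index, end_index)
    out = F.embedding(local_ids, weight)
    out = out.masked_fill(inv_vocab_mask.unsqueeze(-1), 0)
    dist.all_reduce(out, group=group)  # reduce is only needed on the sharded path
else:
    out = F.embedding(x, weight)
```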
    def get_masked_input_and_mask(input: torch.Tensor, start_index: int, end_index: int):
        vocab_mask = (input >= start_index) & (input < end_index)
This can be done with fewer ops:

    masked_input = (input - start_index).clamp(0, end_index - start_index)
    inv_vocab_mask = masked_input != input
The code may not be right. `masked_input = (input - start_index).clamp(0, end_index - start_index)` will modify the input values, causing even the unmasked values in masked_input to differ from their original values.
What about:

    input = input - start_index
    masked_input = input.clamp(0, end_index - start_index)
    inv_vocab_mask = masked_input != input
It's done. Because it is a right-open interval, the code is `masked_input = input.clamp(0, end_index - start_index - 1)`.
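Putting the thread together, the helper could end up looking roughly like this (a sketch of the agreed logic, not the merged code; the return convention is assumed):

```python
def get_masked_input_and_mask(input_ids: torch.Tensor, start_index: int, end_index: int):
    """Map token ids into the local shard range [0, end_index - start_index)."""
    shifted = input_ids - start_index
    # Right-open interval: the largest valid local id is end_index - start_index - 1.
    masked_input = shifted.clamp(0, end_index - start_index - 1)
    # True where the token does NOT belong to this rank's shard.
    inv_vocab_mask = masked_input != shifted
    return masked_input, inv_vocab_mask
```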
Any questions about this PR? @grimoire

You can rebase onto master to resolve the failed tests.
Hi @Tsundoku958, could you fix the typo specified by Copilot?
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The typo specified by Copilot is fixed.
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from the maintainers.
Motivation
I noticed that the current lmdeploy does not use tensor parallelism for the embedding layer and lm_head, yet they consume nearly as much GPU memory as the linear layers. This PR adds support for tensor parallelism in the embedding layer.
Modification
Perhaps TP (tensor parallelism) for the embedding and lm_head could be enabled by default in lmdeploy, or a new argument could be added to let users control whether to enable or disable embedding parallelism?
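As a concrete illustration of the second option, the switch could surface on the engine config; note that `enable_embedding_tp` below is a hypothetical name and does not exist in lmdeploy:

```python
from lmdeploy import PytorchEngineConfig, pipeline

backend_config = PytorchEngineConfig(
    tp=2,
    # enable_embedding_tp=True,  # hypothetical flag for embedding / lm_head TP
)
pipe = pipeline('deepseek-ai/DeepSeek-V2-Lite', backend_config=backend_config)
```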
@grimoire @lvhan028