Pull request overview
This PR centralizes MROPE (multi-dimensional rotary position ids) handling into the scheduler/engine pipeline, adds a meta mechanism for generating correct dummy inputs (SSM/MROPE aware), and refactors cudagraph buffer handling to be model-agnostic.
Changes:
- Add `ModelConfig.use_mrope` and propagate it through `SequenceMeta`/scheduler history to generate and carry MROPE position ids end-to-end.
- Extend `make_dummy_inputs`/`ModelInputsStrategy.make_dummy` with `MakeDummyMeta` so warmup/cudagraph capture includes optional SSM + MROPE inputs.
- Remove per-model cudagraph/mrope meta update overrides in several Qwen/GLM model implementations in favor of shared infrastructure.
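The dummy-input meta described above could be sketched roughly as follows. This is a minimal illustration only: the field names and the `from_model_config` factory are assumptions for this sketch, not the exact lmdeploy API.

```python
from dataclasses import dataclass


@dataclass
class MakeDummyMeta:
    """Hypothetical sketch: flags telling dummy-input builders which
    optional tensors (SSM states, MROPE position ids) to allocate."""
    use_mrope: bool = False
    is_ssm: bool = False

    @classmethod
    def from_model_config(cls, model_config):
        # Derive the flags from the model config so that warmup and
        # cudagraph capture see the same optional inputs as a real step.
        return cls(
            use_mrope=getattr(model_config, 'use_mrope', False),
            is_ssm=getattr(model_config, 'is_ssm', False),
        )
```

Centralizing these flags in one meta object is what lets the per-model cudagraph overrides be deleted: the shared warmup path can decide which buffers to create without asking each model class.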
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lmdeploy/pytorch/strategies/dllm/model_inputs.py | Thread MakeDummyMeta through DLLM dummy input creation. |
| lmdeploy/pytorch/strategies/base/model_inputs.py | Introduce MakeDummyMeta, add optional dummy fields for SSM/MROPE, and a factory method from ModelConfig. |
| lmdeploy/pytorch/strategies/ar_spec/sequence.py | Update sequence token update flow to also update MROPE history. |
| lmdeploy/pytorch/strategies/ar_spec/model_inputs.py | Thread MakeDummyMeta through AR-spec dummy input creation. |
| lmdeploy/pytorch/strategies/ar/sequence.py | Update sequence token update flow to also update MROPE history. |
| lmdeploy/pytorch/strategies/ar/model_inputs.py | Add MROPE propagation in merge/index_select paths and thread MakeDummyMeta into dummy creation. |
| lmdeploy/pytorch/strategies/ar/model_agent.py | Add MROPE propagation when building next-step decoding ModelInputs. |
| lmdeploy/pytorch/spec_decode/spec_agent.py | Cache dummy-meta and pass it into warmup dummy inputs. |
| lmdeploy/pytorch/paging/scheduler.py | Modernize typing for optional seq_meta. |
| lmdeploy/pytorch/multimodal/image_type.py | Remove unused ImageData type. |
| lmdeploy/pytorch/multimodal/data_type.py | Add mrope_pos_ids field to multimodal tensors and modernize typing. |
| lmdeploy/pytorch/multimodal/__init__.py | Update exports after multimodal type cleanup. |
| lmdeploy/pytorch/models/utils/cudagraph.py | Add generic cudagraph buffers/handling for MROPE + SSM and plumb through context updates. |
| lmdeploy/pytorch/models/qwen3_vl.py | Remove per-model cudagraph/mrope meta overrides (rely on shared pipeline). |
| lmdeploy/pytorch/models/qwen3_next.py | Remove per-model cudagraph SSM buffer overrides (rely on shared pipeline). |
| lmdeploy/pytorch/models/qwen3_5.py | Remove per-model cudagraph/mrope+SSM overrides (rely on shared pipeline). |
| lmdeploy/pytorch/models/qwen2_vl.py | Remove per-model cudagraph/mrope meta update logic and add MROPE pos-id generation in input processor. |
| lmdeploy/pytorch/models/qwen2_5_vl.py | Reuse Qwen2-VL input processor and remove duplicated processor / per-model cudagraph logic. |
| lmdeploy/pytorch/models/glm4_1v.py | Reuse Qwen2-VL input processor and remove duplicated processor / per-model cudagraph logic. |
| lmdeploy/pytorch/model_inputs.py | Add MROPE tensors to ModelInputs/StepContext and plumb into context creation. |
| lmdeploy/pytorch/messages.py | Add per-sequence MROPE history storage and automatic updates on token append. |
| lmdeploy/pytorch/engine/model_agent/inputs_maker.py | Pass dummy-meta into dummy forward inputs used by the model agent input maker. |
| lmdeploy/pytorch/engine/model_agent/agent.py | Cache dummy-meta at agent construction and pass it into warmup dummy inputs. |
| lmdeploy/pytorch/engine/inputs_maker.py | Add use_mrope to engine inputs config and attach MROPE ids to ModelInputs. |
| lmdeploy/pytorch/engine/engine.py | Propagate ModelConfig.use_mrope into SequenceMeta construction. |
| lmdeploy/pytorch/configurations/qwen3_vl.py | Enable use_mrope for VL model configs. |
| lmdeploy/pytorch/configurations/qwen3_5.py | Enable use_mrope for Qwen3.5 configs. |
| lmdeploy/pytorch/configurations/glm4.py | Enable use_mrope for GLM4 moe-lite configs. |
| lmdeploy/pytorch/config.py | Add ModelConfig.use_mrope flag. |
| lmdeploy/pytorch/backends/cuda/graph_runner.py | Plumb use_mrope/is_ssm into cudagraph meta for buffer creation/filling. |
```python
if mrope_pos_ids is not None:
    mrope_pos_ids = mrope_pos_ids[:, indices]
```
ARModelInputsStrategy.index_select() applies indices (which are batch indices from ModelInputsDelta.indice_cpu) directly to mrope_pos_ids via mrope_pos_ids[:, indices]. But mrope_pos_ids is stored in packed-token layout (shape (3, sum_q_seqlens); see InputsMaker.create_model_inputs() concatenation + transpose), so selecting by batch indices will pick the wrong token columns whenever max_q_seqlen > 1 (e.g., speculative decoding / block decoding). This needs to select the token segments for the kept sequences (or reshape/unpack before indexing) so that mrope positions stay aligned with input_ids/seq_length.
Suggested change:

```diff
-if mrope_pos_ids is not None:
-    mrope_pos_ids = mrope_pos_ids[:, indices]
+if mrope_pos_ids is not None and indices is not None:
+    # mrope_pos_ids is stored in packed-token layout of shape
+    # (3, sum_q_seqlens), where tokens of all sequences are
+    # concatenated along dim=1. The `indices` tensor contains
+    # sequence (batch) indices, so we need to select the
+    # corresponding token segments and re-pack them, instead of
+    # indexing dim=1 directly by batch indices.
+    orig_seq_length = inputs.seq_length
+    if orig_seq_length is not None:
+        # Compute start offsets for each sequence in the packed layout.
+        # For lengths L[0..B-1], start_offsets[i] = sum_{j < i} L[j].
+        seq_lens = orig_seq_length.to(dtype=torch.long)
+        start_offsets = torch.zeros_like(seq_lens)
+        if seq_lens.numel() > 1:
+            start_offsets[1:] = torch.cumsum(seq_lens[:-1], dim=0)
+        # Gather segments for the kept sequences and concatenate.
+        selected_starts = start_offsets[indices].tolist()
+        selected_lens = seq_lens[indices].tolist()
+        segments = []
+        for start, length in zip(selected_starts, selected_lens):
+            end = start + length
+            segments.append(mrope_pos_ids[:, start:end])
+        if segments:
+            mrope_pos_ids = torch.cat(segments, dim=1)
+        else:
+            # No sequences kept; preserve correct shape on dim=0.
+            mrope_pos_ids = mrope_pos_ids[:, :0]
```
`update_model_metas`: `make_buffer` and `fill_buffer` callbacks for mrope and ssm.
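The make/fill callback pattern for cudagraph buffers might look like the sketch below. The function names and signatures are illustrative assumptions, not the actual lmdeploy cudagraph API; the point is that capture-time allocation and replay-time in-place filling are separated.

```python
import torch


def make_mrope_buffer(max_tokens, device='cpu'):
    """Pre-allocate a fixed-size MROPE buffer once, before graph capture."""
    return torch.zeros(3, max_tokens, dtype=torch.long, device=device)


def fill_mrope_buffer(buffer, mrope_pos_ids):
    """Copy the current step's position ids into the captured buffer.

    Cudagraph replay reuses the same memory addresses, so inputs must
    be copied in-place rather than re-allocated each step.
    """
    num_tokens = mrope_pos_ids.size(1)
    buffer[:, :num_tokens].copy_(mrope_pos_ids)
    return buffer[:, :num_tokens]
```

Registering such callbacks generically is what lets the shared cudagraph code handle MROPE and SSM buffers without per-model overrides.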