cp: [training] fix: Route batches to standalone MTP stages (4208) into r0.5.0 #4225

Open
svcnvidia-nemo-ci wants to merge 1 commit into r0.5.0 from cherry-pick-4208-r0.5.0

Conversation

@svcnvidia-nemo-ci

Contributor

beep boop [🤖]: Hi @yaoyu-33 👋,

we've cherry-picked #4208 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci

Contributor Author

/ok to test d447cfa

@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 added the area:training, bug, and needs-review labels on Jun 9, 2026
@ko3n1g

ko3n1g commented Jun 9, 2026

Contributor

/claude review

@claude

claude Bot commented Jun 9, 2026

Contributor

Code Review — PR #4225

Summary: Cherry-pick of #4208. Adds stage-aware MTP batch routing so only PP stages that own MTP layers receive token/position_id tensors, instead of broadcasting them to all stages when use_mtp=True. Three new helper functions and four new unit tests.
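To illustrate the routing change described in the summary, here is a minimal sketch. It is not the code from #4208: `StageInfo`, `needs_tokens_old`, and `needs_tokens_new` are hypothetical names, and the rule that the first PP stage always needs tokens (for the embedding) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageInfo:
    """Hypothetical description of one pipeline-parallel stage."""
    is_first: bool  # assumed: the first stage owns the embedding
    has_mtp: bool   # True if this stage owns at least one MTP layer


def needs_tokens_old(stage: StageInfo, use_mtp: bool) -> bool:
    # Behaviour described as "before": with use_mtp=True, every stage
    # receives tokens/position_ids.
    return stage.is_first or use_mtp


def needs_tokens_new(stage: StageInfo, use_mtp: bool) -> bool:
    # Stage-aware behaviour: only stages that own MTP layers (plus the
    # first stage, by assumption) receive tokens/position_ids.
    return stage.is_first or (use_mtp and stage.has_mtp)


# Four PP stages; only the last one hosts MTP layers in this example.
stages = [
    StageInfo(is_first=True, has_mtp=False),
    StageInfo(is_first=False, has_mtp=False),
    StageInfo(is_first=False, has_mtp=False),
    StageInfo(is_first=False, has_mtp=True),
]

old_routing = [needs_tokens_old(s, use_mtp=True) for s in stages]
new_routing = [needs_tokens_new(s, use_mtp=True) for s in stages]
```

In this toy setup the two middle stages stop receiving tensors they never read, which is the saving the summary attributes to the fix.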

No critical bugs or logic errors found. LGTM.

Minor observations (non-blocking)

  • Untested code paths in _layout_stage_has_mtp: The str and PipelineParallelLayerLayout branches are not exercised by unit tests — only the list branch is covered. Worth noting for future functional test coverage.
  • No VPP test: _layout_stage_has_mtp accepts a vp_stage parameter, but all tests use vp_stage=0. A test with vp_stage > 0 would strengthen coverage of the VPP+MTP interaction.
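A test for the VPP case could look roughly like the sketch below. The real signature of `_layout_stage_has_mtp` is not shown in this thread, so `stage_has_mtp` is a hypothetical stand-in, and the nested-list layout indexed as `layout[pp_rank][vp_stage]` with string layer tags is an assumption chosen only to make the example self-contained.

```python
from typing import List

# Assumed layout shape: layout[pp_rank][vp_stage] -> list of layer tags.
Layout = List[List[List[str]]]


def stage_has_mtp(layout: Layout, pp_rank: int, vp_stage: int = 0) -> bool:
    """Hypothetical stand-in: True if the given (pp_rank, vp_stage)
    chunk contains an MTP layer."""
    return "mtp" in layout[pp_rank][vp_stage]


# Two PP ranks, two virtual stages each; MTP sits in the last chunk only.
layout: Layout = [
    [["embedding", "decoder"], ["decoder"]],
    [["decoder"], ["decoder", "mtp", "loss"]],
]


def test_mtp_detected_on_nonzero_vp_stage() -> None:
    # The chunk that owns the MTP layer has vp_stage=1, not 0.
    assert stage_has_mtp(layout, pp_rank=1, vp_stage=1)
    # Same PP rank, different virtual stage: no MTP layer there.
    assert not stage_has_mtp(layout, pp_rank=1, vp_stage=0)
    # Other PP rank, same virtual stage index: no MTP layer either.
    assert not stage_has_mtp(layout, pp_rank=0, vp_stage=1)


test_mtp_detected_on_nonzero_vp_stage()
```

The point of the second and third assertions is that a helper which ignores `vp_stage`, or only inspects `vp_stage=0`, would fail them, and the existing tests would not catch that.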

Suggested test cases

No perf tests impacted.


Labels

  • area:training: Training loop, callbacks, and runtime integration
  • bug: Something isn't working
  • cherry-pick
  • needs-review: PR is ready for code review and waiting on a reviewer
  • Run CICD

3 participants