Core
Finetuning Qwen3.5 at long context commonly requires Context Parallelism (CP).
The Megatron-LM dev branch already supports CP, but CP requires PackedSequence, which the dev branch's GDN implementation does not support yet.

In short: no GDN PackedSequence support, no Qwen3.5 long-context training.

So to train Qwen3.5, we need PackedSequence support in GDN. Is there a plan to add it?
Details
Megatron repo code, `megatron/core/ssm/gated_delta_net.py`:

```python
if packed_seq_params is not None:
    # TODO: support packed sequence
    raise NotImplementedError("GDN does not support packed sequence for now.")
```
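For context, a minimal sketch of what "packed sequence" means here (this is an illustration, not Megatron's actual `PackedSeqParams` implementation): variable-length sequences are concatenated into one flat buffer without padding, and a `cu_seqlens` array of cumulative lengths marks the boundaries so a varlen kernel can process each sequence independently.

```python
# Illustrative only: packing variable-length sequences with cu_seqlens
# boundaries, the representation varlen attention kernels consume.
from itertools import accumulate

def pack_sequences(seqs):
    """Concatenate sequences into one flat buffer; return (flat, cu_seqlens)."""
    flat = [tok for seq in seqs for tok in seq]
    # cu_seqlens[i] is the start offset of sequence i; last entry is total length.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in seqs))
    return flat, cu_seqlens

def unpack_sequences(flat, cu_seqlens):
    """Recover the original sequences from the packed buffer."""
    return [flat[s:e] for s, e in zip(cu_seqlens, cu_seqlens[1:])]

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat, cu = pack_sequences(seqs)
print(cu)  # [0, 3, 5, 9]
assert unpack_sequences(flat, cu) == seqs
```

Supporting this in GDN would mean its recurrence/kernels respecting these per-sequence boundaries instead of assuming one contiguous sequence per sample.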
A related issue: #3881
Verl mcore code, `verl/models/mcore/util.py`:

```python
cp_size = mpu.get_context_parallel_world_size()
assert cp_size == 1, "Context parallel size without seq_pack is not supported"
```
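To illustrate why CP and sequence packing are coupled, here is a hypothetical sketch (not verl's or Megatron's code) of the load-balanced split that Megatron-style causal CP applies: the sequence is cut into `2 * cp_size` chunks, and rank `r` gets chunks `r` and `2 * cp_size - 1 - r` so every rank does a similar amount of causal-attention work. Without packed-sequence boundary information, such a split cannot be made consistent with per-sequence boundaries.

```python
# Hypothetical illustration of load-balanced causal CP chunk assignment.
def cp_chunks_for_rank(seq_len, cp_size, rank):
    """Return the token indices assigned to `rank` under a 2*cp_size-chunk split."""
    assert seq_len % (2 * cp_size) == 0, "sequence length must divide evenly"
    chunk = seq_len // (2 * cp_size)
    first = range(rank * chunk, (rank + 1) * chunk)
    mirror = 2 * cp_size - 1 - rank  # the paired chunk from the far end
    second = range(mirror * chunk, (mirror + 1) * chunk)
    return list(first) + list(second)

# With seq_len=8 and cp_size=2, rank 0 holds the head and tail chunks:
print(cp_chunks_for_rank(8, 2, 0))  # [0, 1, 6, 7]
print(cp_chunks_for_rank(8, 2, 1))  # [2, 3, 4, 5]
```

This is why verl asserts `cp_size == 1` when packing is unavailable: enabling CP presupposes that packed-sequence metadata is threaded through every layer, including GDN.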