[Question] Enable GDN Packed Sequence Support for Context Parallelism in Qwen 3.5 series #4043

@plmsmile

Core

Fine-tuning Qwen3.5 on long contexts commonly requires Context Parallelism (CP).

The Megatron-LM dev branch already supports CP. However, CP requires PackedSequence, and GDN on the dev branch does not support PackedSequence yet.

  • So without GDN PackedSequence support, there is no Qwen3.5 training with CP.

  • To train Qwen3.5, we therefore need GDN PackedSequence support.

Is there a plan to add PackedSequence support to GDN?

Details

Megatron Repo Code
megatron/core/ssm/gated_delta_net.py

if packed_seq_params is not None:
    # TODO: support packed sequence
    raise NotImplementedError("GDN does not support packed sequence for now.")
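
To illustrate why a recurrent layer needs explicit packed-sequence handling, here is a minimal sketch. It is not the GDN kernel: the scalar recurrence `h_t = g_t * h_{t-1} + x_t` is only a toy stand-in for a gated state update, and the helper names (`cu_seqlens_from_lengths`, `gated_scan`, `packed_gated_scan`) are hypothetical. The point is that when several sequences are packed into one buffer, the state must be reset at every boundary given by the cumulative sequence lengths, otherwise state leaks from one sequence into the next.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Cumulative sequence boundaries: [0, l0, l0 + l1, ...]."""
    return [0] + list(accumulate(lengths))


def gated_scan(xs, gates, h0=0.0):
    """Toy recurrence h_t = g_t * h_{t-1} + x_t (assumed stand-in, not real GDN)."""
    h, out = h0, []
    for x, g in zip(xs, gates):
        h = g * h + x
        out.append(h)
    return out


def packed_gated_scan(xs, gates, cu_seqlens):
    """Scan a packed buffer, restarting from a zero state at each boundary."""
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        out.extend(gated_scan(xs[start:end], gates[start:end]))
    return out


# Two sequences of length 3 and 2 packed into one buffer of length 5.
lengths = [3, 2]
cu_seqlens = cu_seqlens_from_lengths(lengths)  # [0, 3, 5]
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
gates = [0.5] * 5

packed = packed_gated_scan(xs, gates, cu_seqlens)  # boundary-aware
naive = gated_scan(xs, gates)                      # ignores the boundary

print(packed)  # [1.0, 2.5, 4.25, 4.0, 7.0]
print(naive)   # [1.0, 2.5, 4.25, 6.125, 8.0625]
```

The two results differ from the fourth token on, which is exactly the cross-sequence leakage that a packed-sequence implementation has to prevent.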

This is also raised in a related issue: #3881

Verl mcore code: verl/models/mcore/util.py

cp_size = mpu.get_context_parallel_world_size()
assert cp_size == 1, "Context parallel size without seq_pack is not supported"
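
Putting the two snippets together, the blocker can be sketched as a small check. This is only an illustration of the reasoning in this issue: `find_blockers` and its parameters are hypothetical names, not part of Megatron-LM or verl, and the two conditions simply mirror the `raise` and the `assert` quoted above.

```python
def find_blockers(cp_size, use_packed_seq, model_has_gdn):
    """Return the reasons a config cannot run, based on the two checks quoted above."""
    blockers = []
    # Mirrors the verl assert: CP > 1 is only allowed together with sequence packing.
    if cp_size > 1 and not use_packed_seq:
        blockers.append("verl: context parallelism requires sequence packing")
    # Mirrors the Megatron GDN guard: packed sequences raise NotImplementedError.
    if use_packed_seq and model_has_gdn:
        blockers.append("megatron: GDN does not support packed sequences")
    return blockers


# A GDN model with CP > 1 is blocked whether or not packing is enabled.
print(find_blockers(cp_size=2, use_packed_seq=False, model_has_gdn=True))
print(find_blockers(cp_size=2, use_packed_seq=True, model_has_gdn=True))
# Without CP and without packing, neither check fires.
print(find_blockers(cp_size=1, use_packed_seq=False, model_has_gdn=True))
```

Under these assumptions, every configuration of a GDN model with `cp_size > 1` hits one of the two checks, which is why GDN PackedSequence support is the missing piece.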
