Core
Finetuning Qwen3.5 at long context commonly requires Context Parallelism (CP).
The Megatron-LM dev branch already supports CP, but CP requires PackedSequence, which the dev branch's GDN implementation does not support yet.

In short: no GDN PackedSequence support, no Qwen3.5 long-context training.

So to train Qwen3.5, we need PackedSequence support in GDN. Is there a plan to add it?
Details
Megatron repo code, `megatron/core/ssm/gated_delta_net.py`:

```python
if packed_seq_params is not None:
    # TODO: support packed sequence
    raise NotImplementedError("GDN does not support packed sequence for now.")
```
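For context, a minimal sketch of what "packed sequence" means here (this is an illustration, not Megatron's actual `PackedSeqParams` implementation): variable-length sequences are concatenated into one flat buffer without padding, and a `cu_seqlens` array of cumulative lengths marks the boundaries so a varlen kernel can process each sequence independently.

```python
# Illustrative only: packing variable-length sequences with cu_seqlens
# boundaries, the representation varlen attention kernels consume.
from itertools import accumulate

def pack_sequences(seqs):
    """Concatenate sequences into one flat buffer; return (flat, cu_seqlens)."""
    flat = [tok for seq in seqs for tok in seq]
    # cu_seqlens[i] is the start offset of sequence i; last entry is total length.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in seqs))
    return flat, cu_seqlens

def unpack_sequences(flat, cu_seqlens):
    """Recover the original sequences from the packed buffer."""
    return [flat[s:e] for s, e in zip(cu_seqlens, cu_seqlens[1:])]

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat, cu = pack_sequences(seqs)
print(cu)  # [0, 3, 5, 9]
assert unpack_sequences(flat, cu) == seqs
```

Supporting this in GDN would mean its recurrence/kernels respecting these per-sequence boundaries instead of assuming one contiguous sequence per sample.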
A related issue: #3881
Verl mcore code, `verl/models/mcore/util.py`:

```python
cp_size = mpu.get_context_parallel_world_size()
assert cp_size == 1, "Context parallel size without seq_pack is not supported"
```
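To illustrate why CP and sequence packing are coupled, here is a hypothetical sketch (not verl's or Megatron's code) of the load-balanced split that Megatron-style causal CP applies: the sequence is cut into `2 * cp_size` chunks, and rank `r` gets chunks `r` and `2 * cp_size - 1 - r` so every rank does a similar amount of causal-attention work. Without packed-sequence boundary information, such a split cannot be made consistent with per-sequence boundaries.

```python
# Hypothetical illustration of load-balanced causal CP chunk assignment.
def cp_chunks_for_rank(seq_len, cp_size, rank):
    """Return the token indices assigned to `rank` under a 2*cp_size-chunk split."""
    assert seq_len % (2 * cp_size) == 0, "sequence length must divide evenly"
    chunk = seq_len // (2 * cp_size)
    first = range(rank * chunk, (rank + 1) * chunk)
    mirror = 2 * cp_size - 1 - rank  # the paired chunk from the far end
    second = range(mirror * chunk, (mirror + 1) * chunk)
    return list(first) + list(second)

# With seq_len=8 and cp_size=2, rank 0 holds the head and tail chunks:
print(cp_chunks_for_rank(8, 2, 0))  # [0, 1, 6, 7]
print(cp_chunks_for_rank(8, 2, 1))  # [2, 3, 4, 5]
```

This is why verl asserts `cp_size == 1` when packing is unavailable: enabling CP presupposes that packed-sequence metadata is threaded through every layer, including GDN.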