Your Question
Training qwen3.5-27B with long contexts runs out of memory (OOM) during backpropagation. Are there any plans to support context parallelism (CP) for the GDN layers?
Environment (if relevant)
- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:
Additional Context
No response