[Question] Any plans to support CP for GDN layers?

### Your Question

Training Qwen3.5-27B with long context runs out of memory (OOM) during backpropagation. Are there any plans to support context parallelism (CP) for the GDN (Gated DeltaNet) layers?

### What I've Tried

Training Qwen3.5-27B with long context results in OOM during backpropagation. Are there any plans to support CP for GDN layers?

### Environment (if relevant)

- slime version:
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:


### Additional Context

_No response_

### Pre-submission Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/THUDM/slime/blob/main/CONTRIBUTING.md) and understand the collaboration scope.
- [x] I have read the [documentation](https://thudm.github.io/slime/) and [FAQ](https://thudm.github.io/slime/en/get_started/qa.html) and my question is not answered there.
- [x] I have searched for [existing issues](https://github.com/THUDM/slime/issues) and my question has not been asked before.

[Question] Any plans to support CP for GDN layers? #1744
