[RFC] Support context parallelism #217

@sustcsonglin

Description

Proposal

Support context parallelism for all linear attention models.

Rationale

One of the major advantages of linear attention is that it enables long-sequence modeling. However, during training and prefilling, a single GPU often lacks sufficient memory to process the entire input, making context parallelism essential.
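A key property that makes context parallelism attractive for linear attention is that the causal computation can be split into chunks that communicate only through a fixed-size recurrent state (a `d x d` matrix per head), rather than through full key/value tensors. The sketch below illustrates this with a minimal single-process simulation in NumPy; the function names and loop structure are hypothetical and are not part of this repository's API — they only demonstrate that chunk-wise state passing reproduces the full-sequence result exactly.

```python
import numpy as np

def linear_attn_chunk(q, k, v, S_in):
    # Causal (unnormalized) linear attention over one chunk,
    # given the recurrent state accumulated by preceding chunks.
    # S holds sum_t k_t v_t^T, so o_t = q_t @ S_t.
    T, d = q.shape
    out = np.empty_like(v)
    S = S_in.copy()
    for t in range(T):
        S = S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out, S

rng = np.random.default_rng(0)
T, d, n_chunks = 8, 4, 2
q = rng.normal(size=(T, d))
k = rng.normal(size=(T, d))
v = rng.normal(size=(T, d))

# Reference: the whole sequence processed on one "device".
ref, _ = linear_attn_chunk(q, k, v, np.zeros((d, d)))

# Context-parallel view: each rank holds one chunk; only the
# d x d state crosses the rank boundary (here, the loop variable S
# stands in for a point-to-point send/recv between ranks).
S = np.zeros((d, d))
outs = []
for c in range(n_chunks):
    sl = slice(c * T // n_chunks, (c + 1) * T // n_chunks)
    o, S = linear_attn_chunk(q[sl], k[sl], v[sl], S)
    outs.append(o)
cp = np.concatenate(outs)

assert np.allclose(ref, cp)
```

In a real implementation the chunks would run on different GPUs, with each rank computing its intra-chunk contribution in parallel and the inter-chunk dependency handled by communicating only the state, which is what keeps the per-device memory footprint independent of total sequence length.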


Labels: enhancement (New feature or request)
