
[WIP][DeepSeek] DeepSeek training and component integration with Titan main components #1183

Open
lessw2020 wants to merge 30 commits into main

Conversation

@lessw2020 (Contributor) commented on May 13, 2025

Mostly publishing for status updates, but this PR:
a - starts integrating the DeepSeek training loop with the main torchtitan components
b - refactors DeepSeek modeling to start using torchtitan components, such as .toml files for model config and job config
c - modularizes DeepSeek modeling overall
d - moves all group_gemm components into kernels so they can be leveraged by other models (including DSGemm); a plain-PyTorch reference for the grouped GEMM semantics is sketched below
[Screenshot: 2025-05-14 at 8:26:47 PM]
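
For context on item d: a grouped GEMM performs one independent matmul per group; in MoE layers each group is the set of tokens routed to one expert, so row counts differ per group. The snippet below is only a plain-PyTorch sketch of those semantics, not the fused kernel being moved in this PR; all names are illustrative.

```python
import torch


def grouped_gemm_reference(xs: list[torch.Tensor], ws: list[torch.Tensor]) -> list[torch.Tensor]:
    """Reference semantics of a grouped GEMM: one independent matmul per group.

    A fused group_gemm kernel would perform all of these in a single launch
    instead of a Python-level loop.
    """
    assert len(xs) == len(ws), "expect one weight matrix per token group"
    return [x @ w for x, w in zip(xs, ws)]


if __name__ == "__main__":
    hidden, ffn = 16, 32
    xs = [torch.randn(n, hidden) for n in (4, 7, 1)]   # per-expert token batches
    ws = [torch.randn(hidden, ffn) for _ in range(3)]  # per-expert weights
    outs = grouped_gemm_reference(xs, ws)
    print([tuple(o.shape) for o in outs])  # [(4, 32), (7, 32), (1, 32)]
```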

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on May 13, 2025
Contributor:

Maybe let's put this under the deepseek folder, given its dependency on transformers. We can consider upstreaming it later.

Contributor Author (@lessw2020):

sounds good - thanks for the feedback.

Comment on lines +344 to +346
```python
expert_parallel_degree: int = 1
"""Expert parallelism degree. 1 means disabled."""
```

Contributor:

Since this is only for MoE-based models, how about using https://github.com/pytorch/torchtitan/blob/main/docs/extension.md#extending-jobconfig and creating a separate .py config file in the deepseek folder? Later we can see if we can reuse it for Llama 4 and DeepSeek.
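
For illustration, here is a rough sketch of what such a separate config module could look like, assuming the dataclass-based JobConfig extension mechanism described in docs/extension.md; the module path, class names, and field grouping are placeholders rather than this PR's actual layout.

```python
# Hypothetical module, e.g. torchtitan/models/deepseek/job_config.py (illustrative path).
from dataclasses import dataclass, field


@dataclass
class MoE:
    """MoE-specific options kept out of the shared JobConfig."""

    expert_parallel_degree: int = 1
    """Expert parallelism degree. 1 means disabled."""


@dataclass
class JobConfig:
    """Merged with the base JobConfig by the extension mechanism."""

    moe: MoE = field(default_factory=MoE)
```

The model's .toml training configs could then set these fields under a corresponding [moe] table.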
