[WIP][DeepSeek] DeepSeek training and component integration with Titan main components #1183
base: main
Conversation
Maybe let's put this under the deepseek folder, given its dependency on transformers. We can consider upstreaming it later.
sounds good - thanks for the feedback.
expert_parallel_degree: int = 1
"""Expert parallelism degree. 1 means disabled."""
Since this is only for MoE-based models, how about we use https://github.com/pytorch/torchtitan/blob/main/docs/extension.md#extending-jobconfig and create a separate .py config file in the deepseek folder? Later we can see if we can reuse them for Llama 4 and DeepSeek.
Mostly publishing this for status updates, but this PR:

a - starts integrating the DeepSeek training loop with Torch Titan main components
b - refactors DeepSeek modeling to start using Titan components such as .toml for model config and job config (see the sketch after this list)
c - modularizes DeepSeek modeling overall
d - moves all group_gemm components into kernels so that they can be leveraged by other models (including DSGemm)
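As a minimal sketch of the direction in item b, model args could be populated from a .toml file roughly as follows. The dataclass fields, default values, and the `[model]` table name are illustrative assumptions, not the actual config shipped in this PR.

```python
# Hypothetical sketch: populate DeepSeek model args from a job config .toml.
import tomllib  # stdlib TOML parser, Python 3.11+
from dataclasses import dataclass


@dataclass
class DeepSeekModelArgs:
    # Illustrative defaults only; the real model args live in the PR.
    dim: int = 2048
    n_layers: int = 27
    n_routed_experts: int = 64
    expert_parallel_degree: int = 1  # 1 means expert parallelism is disabled


def load_model_args(path: str) -> DeepSeekModelArgs:
    """Read the [model] table of a job config .toml and build the model args."""
    with open(path, "rb") as f:
        cfg = tomllib.load(f)
    return DeepSeekModelArgs(**cfg.get("model", {}))
```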