feat: migrate deepseek batch split to nnx #2522
base: main
Conversation
Manually adding the pull ready label here to trigger Copybara so it's easier for @jesselu-google to review/test.
@jesselu-google has approved and this LGTM.
Thanks @mesakhcienet. Generally LGTM. Could you collect before/after profiles from the training you ran so we can compare the HLOs?
Sure, are these the training profiles you mentioned, sir? Feel free to tell me if there is any additional information to collect. Thank you.
Thanks @mesakhcienet. The profiles are a bit different from these logs. With the profiles, we can tell whether the HLOs are the same before and after this change. I followed up in a separate thread offline on this.
Add training profiles from all the hosts: before-tq6p, After-tq6p
Description
Migrate the split-batch option of the DeepSeek models to use NNX modules.
Tests
We use xpk to create the TPU cluster and submit the workload.
Environment
Cluster
TPU type: v6e-32
Number of slices: 4
GKE version: 1.31.11-gke.1036000
Base image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:jax0.7.0-rev1
Image
Build Image command :
Test command
Run Xpk command :
Log
Train diff before and after migration here
steps argument set from 15 to 40): link
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.