[Flux LoRA] fix issues in flux lora scripts #11111
Conversation
Hi @linoytsaban, thanks for this prompt fix! I believe the accelerator produces an error at this line when running textual inversion, specifically following the blog here:
Also, is it possible to verify if textual inversion works in …
hey @luchaoqi! Yes, I'm currently testing multiple configurations - will definitely test pivotal tuning with CLIP and pure textual inversion with CLIP. Re: the accelerator error when running with multiple processes - adding it to the todo list :)
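For readers following along, a rough sketch of the "pure textual inversion" regime being discussed, assuming a CLIP text encoder (attribute names follow transformers' `CLIPTextModel`; the advanced script drives this through its own flags and token-embedding handler):

```python
from transformers import CLIPTextModel

# Load the CLIP text encoder Flux uses (assumed checkpoint for illustration).
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Pure textual inversion: freeze everything, then re-enable gradients only
# on the token-embedding table so the inserted concept tokens are trained.
text_encoder.requires_grad_(False)
text_encoder.text_model.embeddings.token_embedding.requires_grad_(True)
```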
Initial comments.
…n, fix accelerator.accumulate call in advanced script
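A hedged sketch of the multi-model gradient-accumulation pattern that commit targets (model, flag, and helper names are assumptions following typical diffusers training scripts; `compute_loss` is a placeholder): every trained model should be passed to `accelerator.accumulate` so gradient synchronization stays correct across processes.

```python
# Accumulate over all trained models, not just the transformer.
models_to_accumulate = [transformer]
if args.train_text_encoder or args.train_text_encoder_ti:
    models_to_accumulate.append(text_encoder_one)

for step, batch in enumerate(train_dataloader):
    with accelerator.accumulate(*models_to_accumulate):
        loss = compute_loss(batch)  # placeholder for the actual Flux loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```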
@sayakpaul I noticed that in the scripts we sometimes use `accelerator.unwrap_model` and sometimes the local `unwrap_model` helper - do you recall why it's not consistently one way or the other?
Using the …
….unwrap_model with unwrap model
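For context, the local helper the diffusers training scripts define looks roughly like this (`accelerator` is the script's `Accelerator` instance): it strips the accelerate wrapper and, if the module was compiled with `torch.compile`, the compile wrapper as well - which `accelerator.unwrap_model` alone does not do.

```python
from diffusers.utils.torch_utils import is_compiled_module

def unwrap_model(model):
    # Remove the accelerate (DDP/FSDP) wrapper.
    model = accelerator.unwrap_model(model)
    # Remove the torch.compile wrapper, if present.
    model = model._orig_mod if is_compiled_module(model) else model
    return model
```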
Hey @luchaoqi! Could you please check if the accelerator now works fine with distributed training? I think it should be resolved now.
Hi @linoytsaban, yes, distributed training works as expected. Pure textual inversion pops up new problems:
Hi @linoytsaban, just wanted to follow up on the textual inversion part - do you anticipate it being fixed soon, or will it need a bit more time?
@luchaoqi yes, should be done soon!
@luchaoqi if you want to give it a try, the current version should be fixed.
@linoytsaban thanks! Will definitely try it out ASAP once I get some time.
Thanks! Left some comments. Let me know if they make sense.
```diff
@@ -228,10 +228,21 @@ def log_validation(
     # run inference
     generator = torch.Generator(device=accelerator.device).manual_seed(args.seed) if args.seed is not None else None
-    autocast_ctx = nullcontext()
+    autocast_ctx = torch.autocast(accelerator.device.type)
```
I think this is only needed for the intermediate validation. Do we need to check for that?
Yeah, I think you're right - tested it now and it seems to work as expected; changed it.
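A minimal sketch of the gated version being described, assuming `log_validation`'s existing `is_final_validation` flag: autocast only wraps intermediate validation, while the final run keeps a plain null context.

```python
import torch
from contextlib import nullcontext

# Autocast is only needed for intermediate validation runs.
autocast_ctx = torch.autocast(accelerator.device.type) if not is_final_validation else nullcontext()
```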
…x_advanced.py Co-authored-by: Sayak Paul <[email protected]>
Thanks a ton for handling this!
@bot /style
Style fixes have been applied. View the workflow run here.
Failing test is unrelated.
Fix remaining pending issues from #10313 and #9476 in the Flux LoRA training scripts.
Code snippets and output examples:
`log_validation` with mixed precision - validation output at step 380:
