
[Flux LoRA] fix issues in flux lora scripts #11111

Merged - 33 commits merged into huggingface:main from flux_lora_advanced on Apr 8, 2025

Conversation

@linoytsaban (Collaborator) commented Mar 18, 2025

Fix remaining pending issues from #10313 and #9476 in the Flux LoRA training scripts:

  • verify the optimizer is updating properly (transformer only ☑️, text encoder w/ CLIP ☑️, pivotal w/ CLIP, pivotal w/ CLIP & T5, TI) wip - see the sketch after this list
  • accelerate error when running on multiple GPUs
  • replace scheduler
  • log_validation with mixed precision
  • save intermediate embeddings when checkpointing is enabled
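
For the first item, a generic way to check that trainable parameters actually change across an optimizer step (a sketch, not the script's code):

import torch

def params_changed(model: torch.nn.Module, before: dict) -> bool:
    # Compare current trainable parameters against a snapshot taken before
    # optimizer.step(); any difference means the optimizer is updating them.
    return any(
        not torch.equal(p.detach().cpu(), before[name])
        for name, p in model.named_parameters()
        if p.requires_grad
    )

# usage:
# snapshot = {n: p.detach().cpu().clone() for n, p in model.named_parameters() if p.requires_grad}
# ...run one training step...
# assert params_changed(model, snapshot)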

code snippets and output examples:

  • for running log_validation with mixed precision:
import os
os.environ['MODEL_NAME'] = "black-forest-labs/FLUX.1-dev"
os.environ['DATASET_NAME'] = "dog"
os.environ['OUTPUT_DIR'] = "flux-test-1"

!accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$DATASET_NAME \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=1 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --checkpointing_steps=250 \
  --validation_prompt="a photo of sks dog in a bucket"\
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

validation output at step 380:
[screenshot: validation sample image, 2025-03-18]


@linoytsaban linoytsaban changed the title [Flux LoRA] fix issues in advanced script [Flux LoRA] fix issues in flux lora scripts Mar 18, 2025
@linoytsaban linoytsaban requested a review from sayakpaul March 18, 2025 21:40
@linoytsaban linoytsaban added the bug (Something isn't working) and training labels
@luchaoqi (Contributor) commented Mar 19, 2025

Hi @linoytsaban , thanks for this prompt fix!

I believe the accelerator produces an error at the line here when running textual inversion specifically, following the blog here:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2576, in <module>
[rank0]:     main(args)
[rank0]:   File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2273, in main
[rank0]:     prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
[rank0]:                                                     ^^^^^^^^^^^^^^
[rank0]:   File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 1446, in encode_prompt
[rank0]:     dtype = text_encoders[0].dtype
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/playpen-nas-ssd/luchao/software/miniconda3/envs/diffuser/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'dtype'. Did you mean: 'type'?
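
For reference, DistributedDataParallel proxies forward() but not arbitrary module attributes like dtype, so a common fix (a sketch, assuming accelerate's standard unwrap_model API) is to unwrap first:

dtype = accelerator.unwrap_model(text_encoders[0]).dtype  # read dtype from the unwrapped module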

Also, is it possible to verify that textual inversion works in the sks dog case on your end as well? e.g. pure CLIP textual inversion as mentioned here:

  --train_text_encoder_ti \
  --train_text_encoder_ti_frac=1 \
  --train_transformer_frac=0
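
For reference, a full pure-TI run would presumably look like the launch command above with the transformer frozen - a sketch combining the flags from this thread with the earlier command (script name and remaining values assumed):

!accelerate launch train_dreambooth_lora_flux_advanced.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$DATASET_NAME \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --train_text_encoder_ti \
  --train_text_encoder_ti_frac=1 \
  --train_transformer_frac=0 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --max_train_steps=500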

@linoytsaban (Collaborator, Author)
Hey @luchaoqi! Yes, I'm currently testing multiple configurations - will definitely test pivotal tuning with CLIP and pure textual inversion with CLIP.

re: error with accelerator when running with multiple processes - adding it to the todo list :)

@sayakpaul (Member) left a comment

Initial comments.

Commit: …n, fix accelerator.accumulate call in advanced script
@linoytsaban (Collaborator, Author)
@sayakpaul I noticed that in the scripts we sometimes use accelerator.unwrap_model, and sometimes we use the unwrap_model helper:

    def unwrap_model(model):
        # strip the accelerate wrapper (e.g. DistributedDataParallel)
        model = accelerator.unwrap_model(model)
        # if the module was compiled with torch.compile, recover the original module
        model = model._orig_mod if is_compiled_module(model) else model
        return model

do you recall why it's not consistently one way or the other?

@sayakpaul (Member)

Using the unwrap_model() function works. The ones that don't should be updated to do something similar. We added unwrap_model() for more consistency in cases with torch.compile().
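
For context, a minimal sketch of why the extra unwrapping matters (illustrative only):

import torch

# torch.compile wraps a module in an OptimizedModule that keeps the original
# module under `_orig_mod`; accelerator.unwrap_model alone does not remove
# this layer, hence the extra `_orig_mod` step in the helper above.
net = torch.nn.Linear(4, 4)
compiled = torch.compile(net)
print(type(compiled).__name__)    # OptimizedModule
print(compiled._orig_mod is net)  # True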

@linoytsaban (Collaborator, Author)

Hey @luchaoqi! Could you please check whether the accelerator now works fine with distributed training? I think it should be resolved now.

@luchaoqi (Contributor)

Hi @linoytsaban, yes distributed training works as expected.

Pure textual inversion surfaces new problems:

03/19/2025 10:20:03 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: a photo of <s0><s1> person at 50 years old.
Traceback (most recent call last):
  File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2408, in <module>
    main(args)
  File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2055, in main
    text_encoder_one.train()
    ^^^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'text_encoder_one' where it is not associated with a value
[rank0]: Traceback (most recent call last):
[rank0]:   File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2408, in <module>
[rank0]:     main(args)
[rank0]:   File "/playpen-nas-ssd/luchao/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced_linoy.py", line 2055, in main
[rank0]:     text_encoder_one.train()
[rank0]:     ^^^^^^^^^^^^^^^^
[rank0]: UnboundLocalError: cannot access local variable 'text_encoder_one' where it is not associated with a value
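
Presumably text_encoder_one only gets bound on the branch that trains the transformer/text-encoder LoRA, so the pure-TI configuration reaches .train() with the name unbound. A minimal, hypothetical reduction of this bug class (not the script's actual code):

def main(train_transformer_frac: float = 0.0) -> None:
    if train_transformer_frac > 0:
        text_encoder_one = object()  # in the script: the loaded CLIP text encoder

    # pure textual inversion uses train_transformer_frac=0, so the branch
    # above never runs and the local name is unbound here
    text_encoder_one.train()  # raises UnboundLocalError

main()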

@luchaoqi (Contributor)

Hi @linoytsaban, just wanted to follow up on the textual inversion part—do you anticipate it being fixed soon, or will it need a bit more time?

@linoytsaban (Collaborator, Author)

@luchaoqi yes should be done soon!

@linoytsaban linoytsaban marked this pull request as ready for review April 2, 2025 05:28
@linoytsaban (Collaborator, Author)

@luchaoqi if you want to give it a try, the current version should be fixed.

@luchaoqi (Contributor) commented Apr 4, 2025

@linoytsaban thanks! Would definitely try it out asap once I get some time.
Feel free to merge it if other reviewers agree, cheers!

@linoytsaban linoytsaban requested a review from sayakpaul April 8, 2025 08:31
@sayakpaul (Member) left a comment

Thanks! Left some comments. Let me know if they make sense.

@@ -228,10 +228,21 @@ def log_validation(

     # run inference
     generator = torch.Generator(device=accelerator.device).manual_seed(args.seed) if args.seed is not None else None
-    autocast_ctx = nullcontext()
+    autocast_ctx = torch.autocast(accelerator.device.type)
@sayakpaul (Member)

I think this is only needed for the intermediate validation. Do we need to check for that?

@linoytsaban (Collaborator, Author) commented Apr 8, 2025

Yeah, I think you're right - tested it now and it seems to work as expected; changed it.
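
Presumably the change gates autocast on whether this is the final validation - a self-contained sketch of the idea (helper name hypothetical, not the merged code):

from contextlib import nullcontext
import torch

def make_autocast_ctx(device_type: str, is_final_validation: bool):
    # autocast only during intermediate validation; the final-validation
    # pipeline is already loaded in the target dtype, so it runs without it
    return nullcontext() if is_final_validation else torch.autocast(device_type)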

@linoytsaban linoytsaban requested a review from sayakpaul April 8, 2025 12:26
@sayakpaul (Member) left a comment

Thanks a ton for handling this!

@sayakpaul (Member)

@bot /style

github-actions bot (Contributor) commented Apr 8, 2025

Style fixes have been applied. View the workflow run here.

@linoytsaban (Collaborator, Author)

Failing test is unrelated

@linoytsaban linoytsaban merged commit 71f34fc into huggingface:main Apr 8, 2025
8 of 9 checks passed
@linoytsaban linoytsaban deleted the flux_lora_advanced branch April 9, 2025 06:41