
Adding Differential Diffusion to HunyuanDiT #8992


Closed
wants to merge 14 commits into from

Conversation

@MnCSSJ4x (Contributor) commented Jul 26, 2024

What does this PR do?

Adds Differential Diffusion to HunyuanDiT.

Partially fixes #8924 (HunyuanDiT only).


How to test:

Gradient

import torch
from diffusers.utils import load_image
from torchvision import transforms

from pipeline_hunyuandit_differential_img2img import (
    HunyuanDiTDifferentialImg2ImgPipeline,
)

pipe = HunyuanDiTDifferentialImg2ImgPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

def preprocess_image(image):
    # Center-crop to a multiple of 64, scale to [-1, 1], and add a batch dimension.
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    image = transforms.ToTensor()(image)
    image = image * 2 - 1
    image = image.unsqueeze(0).to("cuda")
    return image

def preprocess_map(map):
    # Convert the change map to single-channel, crop to a multiple of 64, and keep values in [0, 1].
    map = map.convert("L")
    map = transforms.CenterCrop((map.size[1] // 64 * 64, map.size[0] // 64 * 64))(map)
    map = transforms.ToTensor()(map)
    map = map.to("cuda")
    return map

source_image = preprocess_image(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
))
map = preprocess_map(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
))
prompt = "a green pear"
negative_prompt = "blurry"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=map,
).images[0]
[Gradient input and output images]

A Colab notebook demonstrating all results can be found here. Depth maps have also been added in the same Colab.

Who can review?

@a-r-r-o-w @DN6

@MnCSSJ4x marked this pull request as ready for review July 26, 2024 17:26
@yiyixuxu requested review from a-r-r-o-w and asomoza July 26, 2024 18:04
@asomoza (Member) commented Jul 26, 2024

Hi, thanks for your contribution. Can you please add the changes we made in the SD3 pipeline so users don't have to preprocess the images outside of the pipeline? It also removes the need for torchvision.

Basically, the changes in this commit.

Also, the depth map seems to be working OK, but the apple/pear demo seems to be changing the whole image; I'll do some tests later.
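
For illustration, here is roughly what the user-facing call could look like once the pipeline handles preprocessing itself, as in the SD3 community pipeline. This is a sketch of the requested change, not code from this PR; it assumes the pipeline will accept PIL images directly.

import torch
from diffusers.utils import load_image
from pipeline_hunyuandit_differential_img2img import HunyuanDiTDifferentialImg2ImgPipeline

# Hypothetical usage after the requested change: pass PIL images straight to the
# pipeline and let it resize/normalize them, removing the torchvision dependency.
pipe = HunyuanDiTDifferentialImg2ImgPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
)
mask = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
)

image = pipe(
    prompt="a green pear",
    negative_prompt="blurry",
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=mask,
).images[0]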

@MnCSSJ4x (Contributor Author)

@asomoza Noted. I was traveling and will make the updates soon.

@a-r-r-o-w (Member) left a comment

very cool, thanks! could you post some results for both img2img and differential img2img with the reproducible code?

in addition to @asomoza's requested changes, i think the addition of hunyuan img2img to core diffusers is extra, no? i will let @yiyixuxu make the final call if we need it.

the changes here seem correct to me from a quick look and i cannot spot any obvious bugs, but i will wait for your post on the results obtained before doing a deeper review and testing :)

@MnCSSJ4x (Contributor Author)

Hi @asomoza @a-r-r-o-w, I tried to integrate the change by following the commit. Unfortunately, I am getting an error that I am unable to track down. It mostly arises from line 725 in this commit. The stack trace from running the same Colab with the changed parameters is below:

TypeError                                 Traceback (most recent call last)
[<ipython-input-5-3ecafcee8b09>](https://localhost:8080/#) in <cell line: 15>()
     13 negative_prompt = "blurry"
     14 
---> 15 image = pipe(
     16     prompt=prompt,
     17     negative_prompt=negative_prompt,

2 frames
[/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

[/content/pipeline_hunyuandit_differential_img2img.py](https://localhost:8080/#) in __call__(self, prompt, image, strength, height, width, num_inference_steps, timesteps, sigmas, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, prompt_embeds, prompt_embeds_2, negative_prompt_embeds, negative_prompt_embeds_2, prompt_attention_mask, prompt_attention_mask_2, negative_prompt_attention_mask, negative_prompt_attention_mask_2, output_type, return_dict, callback_on_step_end, callback_on_step_end_tensor_inputs, guidance_rescale, original_size, target_size, crops_coords_top_left, use_resolution_binning, map, denoising_start)
   1051         # 6. Prepare latent variables
   1052         num_channels_latents = self.transformer.config.in_channels
-> 1053         latents = self.prepare_latents(
   1054             batch_size * num_images_per_prompt,
   1055             num_channels_latents,

[/content/pipeline_hunyuandit_differential_img2img.py](https://localhost:8080/#) in prepare_latents(self, batch_size, num_channels_latents, height, width, image, timestep, dtype, device, generator)
    721             init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
    722             init_latents = (
--> 723                 init_latents - self.vae.config.shift_factor
    724             ) * self.vae.config.scaling_factor
    725         if (

TypeError: unsupported operand type(s) for -: 'Tensor' and 'NoneType'

From my understanding, the VAE config's shift_factor is None here (that is the subtraction failing in the traceback), which is strange, as the previous version of the code had similar lines and didn't cause any issues. I tried both commits, where those operations sit either inside or outside the inner else block.
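
For what it's worth, a minimal sketch of a guard that would avoid this error, assuming the HunyuanDiT VAE config simply reports shift_factor as None (the actual fix applied in the PR may differ):

# Hypothetical guard: only apply shift_factor when the VAE config actually defines it,
# since HunyuanDiT's VAE, unlike SD3's, has no shift_factor.
init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
shift_factor = getattr(self.vae.config, "shift_factor", None)
if shift_factor is not None:
    init_latents = (init_latents - shift_factor) * self.vae.config.scaling_factor
else:
    init_latents = init_latents * self.vae.config.scaling_factor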

@asomoza (Member) commented Jul 29, 2024

Thanks, I added a comment with the error you're having.

I took a quick look and it seems right, but I don't get good results with this model:

[Comparison images: HunyuanDiT vs. SD3]

This is the first time I've seen results like this with diff-diff.

ccing @exx8 for some insight into why this happens; pretty much the only difference is the rotary embeddings. It could just be a limitation of the model, though.

@MnCSSJ4x (Contributor Author)

@asomoza Thanks for the fix. The Colab link (same as before) is working and has been updated.

@@ -70,7 +70,9 @@
 except OptionalDependencyNotAvailable:
     from .utils import dummy_pt_objects  # noqa F403

-    _import_structure["utils.dummy_pt_objects"] = [name for name in dir(dummy_pt_objects) if not name.startswith("_")]
+    _import_structure["utils.dummy_pt_objects"] = [
A Member commented on the diff above:

were these changes made intentionally/manually? if so, they'll have to be reverted. we use ruff for formatting with this configuration. you can automatically fix most of these by running make style in the diffusers root directory, or by manually reverting them

@exx8 commented Jul 29, 2024

(Quoting @asomoza's comment above about the HunyuanDiT vs. SD3 results.)

Upon reviewing the code, I did not find any significant issues. The only potential concern was the add_noise function, which uses the singular term "timestep" but is passed an array of multiple timesteps. However, this seems to be a false alarm.
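
For completeness, a small self-contained sketch of the kind of call in question; the variable names and shapes here are assumed for illustration and are not taken from the pipeline code.

import torch
from diffusers import DDPMScheduler

# diffusers schedulers accept a 1-D tensor with one timestep per sample, so passing
# an array to add_noise is fine even though the parameter name reads as singular.
scheduler = DDPMScheduler()
latents = torch.randn(2, 4, 64, 64)
noise = torch.randn_like(latents)
timesteps = torch.tensor([999, 500])  # per-sample timesteps
noisy_latents = scheduler.add_noise(latents, noise, timesteps)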

I adjusted the guidance scale to 11 and obtained the following result:
[generated image]

I do wonder if this is satisfying or not. This is a bit tricky to answer.

Side Note: When comparing different diffusion models, it is beneficial to experiment with various settings. Each model may interpret prompts, guidance scales, and change maps differently.
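
As a purely illustrative example of such a settings sweep, reusing the pipe, prompts, image, and map defined in the reproduction script earlier in this PR (the specific guidance scales are assumptions, not values tested in this thread):

# Illustrative sweep over guidance scales for the pear/apple example.
for guidance_scale in (5.0, 7.5, 11.0):
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        num_inference_steps=25,
        original_image=source_image,
        image=source_image,
        strength=1.0,
        map=map,
    ).images[0]
    result.save(f"pear_gs_{guidance_scale}.png")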

@asomoza (Member) commented Jul 29, 2024

Thanks @exx8 for your answer.

I do wonder if this is satisfying or not. This is a bit tricky to answer.

IMO it's not that tricky when you evaluate it against a real use case. In all the other implementations, when you use the pear/apple example with a gradient, what we expect is an apple that gradually transforms into a pear; that's what the gradient should do when you're applying a pixel-by-pixel change.

This is also true for an inpainting example: you want this technique to gradually inpaint the new part so it doesn't have seams and you don't see a difference between the old part and the new part.

With this model that doesn't happen. I tried a lot of combinations and it always shows one of two behaviors:

  • It completely ignores the apple part (but sometimes changes the color).
  • It does a hard change between them, which is like just using regular inpainting with a normal model.

This reminds me of when I tested diff-diff with ComfyUI, but in this case it's probably a limitation of the model.

Edit: To be more specific, why does it generate a leaf in the part it shouldn't change, both in your example and in the OP?

@exx8 commented Jul 29, 2024

(Quoting @asomoza's reply above.)

Thank you so much for your perspective. I do wonder whether the model is just “opinionated” against the split fruit, or whether there is something else going on here. A split fruit is, in some regards, very far from the pictures the model has probably been trained on.
I think it would be interesting to see what happens in a more realistic scenario, say this picture:
[image]
I tried the map from the example with a linear change (map*0.7+0.3) and the prompt “snow”:
[generated image]
and "desert":
[generated image]
It seems reasonable. What do you think?

Edit: I noticed a drastic improvement in picture quality when increasing the number of steps to 100:
[generated image]
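
A minimal sketch of the settings described above, assuming the same pipeline objects as the reproduction script earlier in the PR; only the 0.7/0.3 remap, the "snow" prompt, and the 100 steps come from this comment, the rest is assumed.

# Linear remap of the change map so even the "keep" region changes slightly,
# plus a larger step count; source_image here would be the landscape photo above.
remapped_map = map * 0.7 + 0.3

image = pipe(
    prompt="snow",
    negative_prompt="blurry",
    guidance_scale=7.5,          # assumed; not stated in the comment
    num_inference_steps=100,     # more steps noticeably improved quality here
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=remapped_map,
).images[0]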

@asomoza (Member) commented Jul 29, 2024

@exx8 Thanks a lot, that pretty much resolves it. In your example it's clearly using the gradient as it should, so I agree with you: the model probably doesn't know what to do with a split fruit, which probably means it's too rigid for some inpainting examples.

I noticed this to a degree with normal prompts; this model is a lot better with landscapes than with specific subjects or more common scenarios like the dog in the apartment.

Thanks again for your insights.

@MnCSSJ4x (Contributor Author)

Hi @a-r-r-o-w @DN6, is the PR good to go, or do I need to make any more changes?

@asomoza (Member) commented Jul 30, 2024

@MnCSSJ4x If you keep the PR to just the community pipeline we can merge it faster, but since it includes img2img, which is part of core, we will need to do more reviews, add test images and docs, and it will also need a test suite.

Maybe you can separate it into two PRs if you want this one to be merged faster?

@a-r-r-o-w (Member)

@MnCSSJ4x We're on a bit of a tight schedule right now, so apologies for not being available for reviews here. As Alvaro mentioned, and as I would recommend, let's revert all changes but the community pipeline here. For the Hunyuan Img2Img pipeline, feel free to open another PR, which we will get back to soon.

To prepare for merging the community pipeline, you can add your name and contribution here: https://github.com/huggingface/diffusers/blob/main/examples/community/README.md. That should be easy to merge very soon :)

@MnCSSJ4x (Contributor Author) commented Aug 1, 2024

@a-r-r-o-w @asomoza I have split the code into two separate PRs.
PR #9040 handles the community pipeline and PR #9041 handles the img2img variant (still a draft; I will add a Colab that one can run and experiment with). I had a question regarding the README step for the community pipeline: should I create a new entry in the table, or is it better to add a subsection under the Differential Diffusion heading that discusses this variant, since it just builds on top of differential diffusion?

@a-r-r-o-w (Member)

You can add your name to the bottom of the table. The table is mostly in the order pipelines were added, except for a few entries.

@a-r-r-o-w (Member)

Closing this in favour of #9040 and #9041. Thanks a lot for your continued insights and experiments @exx8 @asomoza!

@a-r-r-o-w closed this Aug 4, 2024