
Adding Differential Diffusion to HunyuanDiT #8992


Closed
wants to merge 14 commits into from

Conversation

@MnCSSJ4x (Contributor) commented Jul 26, 2024

What does this PR do?

Adds Differential Diffusion to HunyuanDiT.

Partially fixes #8924 (HunyuanDiT only).


How to test:

Gradient

import torch
from diffusers.utils import load_image
from torchvision import transforms

from pipeline_hunyuandit_differential_img2img import (
    HunyuanDiTDifferentialImg2ImgPipeline,
)

pipe = HunyuanDiTDifferentialImg2ImgPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

def preprocess_image(image):
    # Center-crop to a multiple of 64, scale to [-1, 1], and add a batch dimension.
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    image = transforms.ToTensor()(image)
    image = image * 2 - 1
    image = image.unsqueeze(0).to("cuda")
    return image

def preprocess_map(map):
    # Convert the change map to single-channel, crop to a multiple of 64, and keep values in [0, 1].
    map = map.convert("L")
    map = transforms.CenterCrop((map.size[1] // 64 * 64, map.size[0] // 64 * 64))(map)
    map = transforms.ToTensor()(map)
    map = map.to("cuda")
    return map

source_image = preprocess_image(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
))
map = preprocess_map(load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
))
prompt = "a green pear"
negative_prompt = "blurry"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=map,
).images[0]
[Gradient input and output images]

A Colab notebook demonstrating all results can be found here. Depth maps have also been added in the same Colab.

Who can review?

@a-r-r-o-w @DN6

@MnCSSJ4x marked this pull request as ready for review July 26, 2024 17:26
@yiyixuxu requested review from a-r-r-o-w and asomoza July 26, 2024 18:04
@asomoza (Member) commented Jul 26, 2024

Hi, thanks for your contribution. Can you please add the changes we made in the SD3 pipeline so users don't have to preprocess the images outside of the pipeline? It also removes the need for torchvision.

Basically, the changes in this commit.

Also, the depth map seems to be working OK, but the apple/pear demo seems to be changing the whole image; I'll do some tests later.
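
For illustration, here is roughly what the user-facing call could look like once the pipeline handles preprocessing itself, as in the SD3 community pipeline. This is a sketch of the requested change, not code from this PR; it assumes the pipeline will accept PIL images directly.

import torch
from diffusers.utils import load_image
from pipeline_hunyuandit_differential_img2img import HunyuanDiTDifferentialImg2ImgPipeline

# Hypothetical usage after the requested change: pass PIL images straight to the
# pipeline and let it resize/normalize them, removing the torchvision dependency.
pipe = HunyuanDiTDifferentialImg2ImgPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png"
)
mask = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask_2.png"
)

image = pipe(
    prompt="a green pear",
    negative_prompt="blurry",
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=mask,
).images[0]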

@MnCSSJ4x (Contributor Author)

@asomoza Noted. I was traveling and will make the updates soon.

@a-r-r-o-w (Member) left a comment

very cool, thanks! could you post some results for both img2img and differential img2img with the reproducible code?

in addition to @asomoza's requested changes, i think the addition of hunyuan img2img to core diffusers is extra, no? i will let @yiyixuxu make the final call if we need it.

the changes here seem correct to me from a quick look and i cannot spot any obvious bugs, but i will wait for your post on the results obtained before doing a deeper review and testing :)

@MnCSSJ4x (Contributor Author)

Hi @asomoza @a-r-r-o-w, I tried to integrate the change by following the commit. Unfortunately, I am getting an error that I am unable to track down. It mostly arises from line 725 in this commit. The stack trace from running the same Colab with the changed parameters is below:

TypeError                                 Traceback (most recent call last)
[<ipython-input-5-3ecafcee8b09>](https://localhost:8080/#) in <cell line: 15>()
     13 negative_prompt = "blurry"
     14 
---> 15 image = pipe(
     16     prompt=prompt,
     17     negative_prompt=negative_prompt,

2 frames
[/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

[/content/pipeline_hunyuandit_differential_img2img.py](https://localhost:8080/#) in __call__(self, prompt, image, strength, height, width, num_inference_steps, timesteps, sigmas, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, prompt_embeds, prompt_embeds_2, negative_prompt_embeds, negative_prompt_embeds_2, prompt_attention_mask, prompt_attention_mask_2, negative_prompt_attention_mask, negative_prompt_attention_mask_2, output_type, return_dict, callback_on_step_end, callback_on_step_end_tensor_inputs, guidance_rescale, original_size, target_size, crops_coords_top_left, use_resolution_binning, map, denoising_start)
   1051         # 6. Prepare latent variables
   1052         num_channels_latents = self.transformer.config.in_channels
-> 1053         latents = self.prepare_latents(
   1054             batch_size * num_images_per_prompt,
   1055             num_channels_latents,

[/content/pipeline_hunyuandit_differential_img2img.py](https://localhost:8080/#) in prepare_latents(self, batch_size, num_channels_latents, height, width, image, timestep, dtype, device, generator)
    721             init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
    722             init_latents = (
--> 723                 init_latents - self.vae.config.shift_factor
    724             ) * self.vae.config.scaling_factor
    725         if (

TypeError: unsupported operand type(s) for -: 'Tensor' and 'NoneType'

From my understanding, the VAE config's shift_factor is None here (that is the subtraction failing in the traceback), which is strange, as the previous version of the code had similar lines and didn't cause any issues. I tried both commits, where those operations sit either inside or outside the inner else block.
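
For what it's worth, a minimal sketch of a guard that would avoid this error, assuming the HunyuanDiT VAE config simply reports shift_factor as None (the actual fix applied in the PR may differ):

# Hypothetical guard: only apply shift_factor when the VAE config actually defines it,
# since HunyuanDiT's VAE, unlike SD3's, has no shift_factor.
init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
shift_factor = getattr(self.vae.config, "shift_factor", None)
if shift_factor is not None:
    init_latents = (init_latents - shift_factor) * self.vae.config.scaling_factor
else:
    init_latents = init_latents * self.vae.config.scaling_factor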

@asomoza (Member) commented Jul 29, 2024

Thanks, I added a comment with the error you're having.

I took a quick look and it seems right, but I don't get good results with this model:

[Comparison images: HunyuanDiT vs. SD3]

This is the first time I've seen results like this with diff-diff.

ccing @exx8 for some insight into why this happens; pretty much the only difference is the rotary embeddings. It could just be a limitation of the model, though.

@MnCSSJ4x (Contributor Author)

@asomoza Thanks for the fix. The Colab link (same as before) is working and has been updated.

@@ -70,7 +70,9 @@
 except OptionalDependencyNotAvailable:
     from .utils import dummy_pt_objects  # noqa F403

-    _import_structure["utils.dummy_pt_objects"] = [name for name in dir(dummy_pt_objects) if not name.startswith("_")]
+    _import_structure["utils.dummy_pt_objects"] = [
A Member commented on the diff above:

were these changes made intentionally/manually? if so, they'll have to be reverted. we use ruff for formatting with this configuration. you can automatically fix most of these by running make style in the diffusers root directory, or by manually reverting them

@exx8 commented Jul 29, 2024

(Quoting @asomoza's comment above about the HunyuanDiT vs. SD3 results.)

Upon reviewing the code, I did not find any significant issues. The only potential concern was the add_noise function, which uses the singular term "timestep" but is passed an array of multiple timesteps. However, this seems to be a false alarm.
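
For completeness, a small self-contained sketch of the kind of call in question; the variable names and shapes here are assumed for illustration and are not taken from the pipeline code.

import torch
from diffusers import DDPMScheduler

# diffusers schedulers accept a 1-D tensor with one timestep per sample, so passing
# an array to add_noise is fine even though the parameter name reads as singular.
scheduler = DDPMScheduler()
latents = torch.randn(2, 4, 64, 64)
noise = torch.randn_like(latents)
timesteps = torch.tensor([999, 500])  # per-sample timesteps
noisy_latents = scheduler.add_noise(latents, noise, timesteps)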

I adjusted the guidance scale to 11 and obtained the following result:
[generated image]

I do wonder if this is satisfying or not. This is a bit tricky to answer.

Side Note: When comparing different diffusion models, it is beneficial to experiment with various settings. Each model may interpret prompts, guidance scales, and change maps differently.
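
As a purely illustrative example of such a settings sweep, reusing the pipe, prompts, image, and map defined in the reproduction script earlier in this PR (the specific guidance scales are assumptions, not values tested in this thread):

# Illustrative sweep over guidance scales for the pear/apple example.
for guidance_scale in (5.0, 7.5, 11.0):
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        num_inference_steps=25,
        original_image=source_image,
        image=source_image,
        strength=1.0,
        map=map,
    ).images[0]
    result.save(f"pear_gs_{guidance_scale}.png")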

@asomoza (Member) commented Jul 29, 2024

Thanks @exx8 for your answer.

I do wonder if this is satisfying or not. This is a bit tricky to answer.

IMO it's not that tricky when you evaluate it against a real use case. In all the other implementations, when you use the pear/apple example with a gradient, what we expect is an apple that gradually transforms into a pear; that's what the gradient should do when you're applying a pixel-by-pixel change.

This is also true for an inpainting example: you want this technique to gradually inpaint the new part so it doesn't have seams and you don't see a difference between the old part and the new part.

With this model that doesn't happen. I tried a lot of combinations and it always shows one of two behaviors:

  • It completely ignores the apple part (but sometimes changes the color).
  • It does a hard change between them, which is like just using regular inpainting with a normal model.

This reminds me of when I tested diff-diff with ComfyUI, but in this case it's probably a limitation of the model.

Edit: To be more specific, why does it generate a leaf in the part it shouldn't change, both in your example and in the OP?

@exx8 commented Jul 29, 2024

(Quoting @asomoza's reply above.)

Thank you so much for your perspective. I do wonder whether the model is just “opinionated” against the split fruit, or whether there is something else going on here. A split fruit is, in some regards, very far from the pictures the model has probably been trained on.
I think it would be interesting to see what happens in a more realistic scenario, say this picture:
[image]
I tried the map from the example with a linear change (map*0.7+0.3) and the prompt “snow”:
[generated image]
and "desert":
[generated image]
It seems reasonable. What do you think?

Edit: I noticed a drastic improvement in picture quality when increasing the number of steps to 100:
[generated image]
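
A minimal sketch of the settings described above, assuming the same pipeline objects as the reproduction script earlier in the PR; only the 0.7/0.3 remap, the "snow" prompt, and the 100 steps come from this comment, the rest is assumed.

# Linear remap of the change map so even the "keep" region changes slightly,
# plus a larger step count; source_image here would be the landscape photo above.
remapped_map = map * 0.7 + 0.3

image = pipe(
    prompt="snow",
    negative_prompt="blurry",
    guidance_scale=7.5,          # assumed; not stated in the comment
    num_inference_steps=100,     # more steps noticeably improved quality here
    original_image=source_image,
    image=source_image,
    strength=1.0,
    map=remapped_map,
).images[0]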

@asomoza (Member) commented Jul 29, 2024

@exx8 Thanks a lot, that pretty much resolves it. In your example it's clearly using the gradient as it should, so I agree with you: the model probably doesn't know what to do with a split fruit, which probably means it's too rigid for some inpainting examples.

I noticed this to a degree with normal prompts; this model is a lot better with landscapes than with specific subjects or more common scenarios like the dog in the apartment.

Thanks again for your insights.

@MnCSSJ4x (Contributor Author)

Hi @a-r-r-o-w @DN6, is the PR good to go, or do I need to make any more changes?

@asomoza (Member) commented Jul 30, 2024

@MnCSSJ4x If you keep the PR to just the community pipeline we can merge it faster, but since it includes img2img, which is part of core, we will need to do more reviews, add test images and docs, and it will also need a test suite.

Maybe you can separate it into two PRs if you want this one to be merged faster?

@a-r-r-o-w (Member)

@MnCSSJ4x We're on a bit of a tight schedule right now, so apologies for not being available for reviews here. As Alvaro mentioned, and as I would recommend, let's revert all changes but the community pipeline here. For the Hunyuan Img2Img pipeline, feel free to open another PR, which we will get back to soon.

To prepare for merging the community pipeline, you can add your name and contribution here: https://github.com/huggingface/diffusers/blob/main/examples/community/README.md. That should be easy to merge very soon :)

@MnCSSJ4x (Contributor Author) commented Aug 1, 2024

@a-r-r-o-w @asomoza I have split the code into two separate PRs.
PR #9040 handles the community pipeline and PR #9041 handles the img2img variant (still a draft; I will add a Colab that one can run and experiment with). I had a question regarding the README step for the community pipeline: should I create a new entry in the table, or is it better to add a subsection under the Differential Diffusion heading that discusses this variant, since it just builds on top of differential diffusion?

@a-r-r-o-w (Member)

You can add your name to the bottom of the table. The table is mostly in the order pipelines were added, except for a few entries.

@a-r-r-o-w (Member)

Closing this in favour of #9040 and #9041. Thanks a lot for your continued insights and experiments @exx8 @asomoza!

@a-r-r-o-w closed this Aug 4, 2024