Adding Differential Diffusion to HunyuanDiT #8992
Conversation
Hi, thanks for your contribution. Can you please add the changes we made in the SD3 pipeline, so users don't have to preprocess the images outside of the pipeline? It also removes the need for torchvision; basically, the changes in this commit. Also, the depth map seems to be working OK, but for the apple/pear demo it seems to be changing the whole image. I'll do some tests later.
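For reference, a minimal sketch of what that change could look like, assuming the SD3 differential img2img community pipeline as the template (the `mask_processor` attribute and the `map` argument mirror that pipeline; the exact placement inside `__init__`/`__call__` is an assumption):

```python
from diffusers.image_processor import VaeImageProcessor

# In the pipeline's __init__: processors that replace the user-side
# torchvision transforms for both the input image and the change map.
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor,
    do_normalize=False,          # keep map values in [0, 1]
    do_convert_grayscale=True,   # the change map is single-channel
)

# In __call__, before encoding to latents: accept PIL images directly.
image = self.image_processor.preprocess(image, height=height, width=width)
map = self.mask_processor.preprocess(map, height=height, width=width)
```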
@asomoza noted. I was traveling and will make the updates soon.
very cool, thanks! could you post some results for both img2img and differential img2img with the reproducible code?
in addition to @asomoza's requested changes, i think the addition of hunyuan img2img to core diffusers is extra, no? i will let @yiyixuxu make the final call if we need it.
the changes here seem correct to me from a quick look and i cannot spot any obvious bugs, but i will wait for your post on the results before doing a deeper review and testing :)
Hi @asomoza @a-r-r-o-w, I tried to integrate the change by following the commit. Unfortunately, I am getting an error that I am unable to track down. It mostly arises from line 725 in this commit. The stack trace while running the same colab with the changed parameters is below:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-5-3ecafcee8b09> in <cell line: 15>()
     13 negative_prompt = "blurry"
     14
---> 15 image = pipe(
     16     prompt=prompt,
     17     negative_prompt=negative_prompt,

2 frames

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)
    116
    117 return decorate_context

/content/pipeline_hunyuandit_differential_img2img.py in __call__(self, prompt, image, strength, height, width, num_inference_steps, timesteps, sigmas, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, prompt_embeds, prompt_embeds_2, negative_prompt_embeds, negative_prompt_embeds_2, prompt_attention_mask, prompt_attention_mask_2, negative_prompt_attention_mask, negative_prompt_attention_mask_2, output_type, return_dict, callback_on_step_end, callback_on_step_end_tensor_inputs, guidance_rescale, original_size, target_size, crops_coords_top_left, use_resolution_binning, map, denoising_start)
   1051         # 6. Prepare latent variables
   1052         num_channels_latents = self.transformer.config.in_channels
-> 1053         latents = self.prepare_latents(
   1054             batch_size * num_images_per_prompt,
   1055             num_channels_latents,

/content/pipeline_hunyuandit_differential_img2img.py in prepare_latents(self, batch_size, num_channels_latents, height, width, image, timestep, dtype, device, generator)
    721         init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
    722         init_latents = (
--> 723             init_latents - self.vae.config.shift_factor
    724         ) * self.vae.config.scaling_factor
    725         if (

TypeError: unsupported operand type(s) for -: 'Tensor' and 'NoneType'
```

From my understanding, `self.vae.config.shift_factor` is `None` (the subtraction at line 723 is between a tensor and `None`), which is strange, as the previous version of the code had similar lines and didn't cause any issues. I tried both commits, with those operations placed in and out of the inner else block.
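For anyone hitting the same thing: as the traceback suggests, the HunyuanDiT VAE config appears to have no `shift_factor` (unlike SD3's), so one possible guard in `prepare_latents` is a sketch like this:

```python
# Guard against shift_factor being None (HunyuanDiT's VAE config does not
# appear to define it, unlike SD3's), while keeping the SD3-style path intact.
init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
if getattr(self.vae.config, "shift_factor", None) is not None:
    init_latents = (init_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
else:
    init_latents = init_latents * self.vae.config.scaling_factor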
Thanks, I added a comment with the error you're having. I did a quick look and it seems right, but I don't get good results with this model:
This is the first time I've seen results like this with diff-diff. cc'ing @exx8 for insights on why this happens; pretty much the only difference is the rotary embeddings. It could just be a limitation of the model, though.
Co-authored-by: Álvaro Somoza <[email protected]>
@asomoza Thanks for the fix. The colab link (same as before) is working and has been updated.
src/diffusers/__init__.py (outdated)

```diff
@@ -70,7 +70,9 @@
 except OptionalDependencyNotAvailable:
     from .utils import dummy_pt_objects  # noqa F403

-    _import_structure["utils.dummy_pt_objects"] = [name for name in dir(dummy_pt_objects) if not name.startswith("_")]
+    _import_structure["utils.dummy_pt_objects"] = [
+        name for name in dir(dummy_pt_objects) if not name.startswith("_")
+    ]
```
Upon reviewing the code, I did not find any significant issues. The only potential concern was the add_noise function, which uses the singular term "timestep" but is passed an array of multiple timesteps. However, this seems to be a false alarm.

I adjusted the guidance scale to 11 and obtained the following result. I do wonder whether this is satisfying or not; that is a bit tricky to answer.

Side note: when comparing different diffusion models, it is beneficial to experiment with various settings. Each model may interpret prompts, guidance scales, and change maps differently.
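For context on that remark: diffusers schedulers' `add_noise` accepts a 1-D tensor of timesteps and broadcasts it across the batch dimension, which is what lets the pipeline noise the original latents to every step of the schedule in one call (the pipeline then passes that vector through a parameter named `timestep`, singular). A rough, self-contained sketch, assuming a DDPM-style scheduler:

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler()
scheduler.set_timesteps(50)
timesteps = scheduler.timesteps  # 1-D tensor, one entry per denoising step

init_latents = torch.randn(1, 4, 64, 64)  # stand-in for VAE-encoded image
noise = torch.randn_like(init_latents)

# One call noises the original latents to every step of the schedule:
# add_noise indexes the noise schedule with the whole timesteps vector.
original_with_noise = scheduler.add_noise(
    init_latents.expand(len(timesteps), -1, -1, -1),  # one copy per timestep
    noise,
    timesteps,
)
print(original_with_noise.shape)  # torch.Size([50, 4, 64, 64])
```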
thanks @exx8 for your answer.
IMO it's not that tricky when you evaluate it for a real use scenario. In all the other implementations, when you use the pear/apple example with a gradient, what we expect is an apple that gradually transforms into a pear; that's what the gradient should do when you're making a pixel-per-pixel change. This is also true for an inpainting example: you want this technique to gradually inpaint the new part, so that the new part doesn't have seams and you don't see a difference between the old part and the new part. With this model this doesn't happen; I tried a lot of combinations and it always has one of two behaviors:
This reminds me of when I tested diff-diff with ComfyUI, but in this case it's probably a limitation of the model. Edit: To be more specific, why does it generate a leaf in the part that shouldn't change, in your example and in the OP?
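For readers unfamiliar with the mechanism being discussed: differential diffusion compares the change map against a per-step threshold; a pixel stays clamped to the original image (noised to the current step) until the threshold passes its map value, after which it is free to diverge, so the map sets per pixel how much of the denoising trajectory follows the original. A minimal sketch of one step, with hypothetical names:

```python
import torch

# One denoising step of differential diffusion (sketch, hypothetical names).
# `change_map` holds per-pixel values in [0, 1]; `threshold_i` grows with the
# step index i (e.g. i / num_steps). Pixels whose map value is still above the
# threshold are re-clamped to the original image noised to the current step;
# the rest keep the freshly denoised latents -- producing the gradual,
# per-pixel transition described above.
def diff_diff_step(latents, original_noised_to_step, change_map, threshold_i):
    mask = (change_map > threshold_i).to(latents.dtype)  # 1 = still clamped
    return original_noised_to_step * mask + latents * (1 - mask)
```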
Thank you so much for your perspective. I do wonder if the model is just "opinionated" against the split fruit, or whether there is something else going on there. A split fruit is, in some regards, very far from the pictures the model was probably trained on. Edit: I noticed a drastic improvement in picture quality when increasing the number of steps to 100.
@exx8 Thanks a lot, that pretty much resolves it. In your example it's clearly using the gradient as it should, so I agree with your opinion: the model probably doesn't know what to do with a split fruit, which probably means it's too rigid for some inpainting examples. I noticed this to a degree with normal prompts; this model is a lot better with landscapes than with specific subjects or more common scenarios like the dog in the apartment. Thanks again for your insights.
Hi @a-r-r-o-w @DN6, is the PR good to go, or do I need to make any more changes?
@MnCSSJ4x if you keep the PR to just the community pipeline we can merge it faster, but since it includes img2img, which is part of the core, we will need more reviews, test images, docs, and a test suite. Maybe you can separate it into two PRs if you want this one to be merged faster?
@MnCSSJ4x We're on a bit of a tight schedule right now, so apologies for not being available for reviews here. As Alvaro mentioned, and I would recommend, let's revert all changes except the community pipeline here. For the HunyuanDiT img2img pipeline, feel free to open another PR, and we will get back to it soon. To prepare the community pipeline for merging, you can add your name and contribution here: https://github.com/huggingface/diffusers/blob/main/examples/community/README.md That should be easy to merge very soon :)
@a-r-r-o-w @asomoza I have split the two parts of the code into two separate PRs.
You can add your name to the bottom of the table; the table is mostly in order of pipeline addition, except for a few entries. (diffusers/examples/community/README.md, line 74 in 27637a5)
What does this PR do?
Adds Differential Diffusion to HunyuanDiT.
Partially fixes #8924 (HunyuanDiT only).
Before submitting
How to test:
Gradient
A Colab notebook demonstrating all results can be found here; depth maps have also been added to the same notebook.
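A minimal usage sketch (the checkpoint id and `custom_pipeline` name are assumptions based on the file name used in this PR; the call arguments follow the pipeline's `__call__` signature shown in the traceback earlier in the thread):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Assumed checkpoint and community pipeline name -- adjust to what gets merged.
pipe = DiffusionPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",
    custom_pipeline="pipeline_hunyuandit_differential_img2img",
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("input.png")

# Horizontal gradient change map: one grayscale value per pixel column.
w, h = image.size
gradient = np.tile(np.linspace(0, 255, w, dtype=np.uint8), (h, 1))
map_image = Image.fromarray(gradient, mode="L")

result = pipe(
    prompt="a pear",
    negative_prompt="blurry",
    image=image,
    map=map_image,
    strength=1.0,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("result.png")
```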
Who can review?
@a-r-r-o-w @DN6