I2VGenXLPipeline - missing components? #7952
-
cc @sayakpaul
-
Thanks for bringing this to our attention. We followed the original implementation code and ensured the same outputs could be obtained from the two implementations.
I think there's a misunderstanding here. LDM is the entire process:
So, to answer your question, all of the above steps are performed in https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/i2vgen_xl/pipeline_i2vgen_xl.py. Also, you can access the intermediate latents at the end of the iterative process by setting
Hope that makes sense.
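The name of the setting did not survive in the reply above. As a minimal sketch, assuming the intended option is the `output_type="latent"` argument that the pipeline's `__call__` accepts, the latents at the end of the denoising loop could be returned without VAE decoding roughly as follows. The helper names `load_pipeline` and `denoised_latents` are hypothetical, and the checkpoint ID `ali-vilab/i2vgen-xl` is an assumption, not something stated in this thread.

```python
from typing import Any


def load_pipeline(model_id: str = "ali-vilab/i2vgen-xl") -> Any:
    """Load the pipeline (assumed checkpoint ID).

    The imports are lazy so that the helper below has no hard dependency
    on torch or diffusers.
    """
    import torch
    from diffusers import I2VGenXLPipeline

    return I2VGenXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    )


def denoised_latents(pipe: Any, image: Any, prompt: str, **call_kwargs: Any) -> Any:
    """Run the denoising loop and return the latents without VAE decoding.

    Assumption: the pipeline accepts output_type="latent" and exposes the
    result on the returned object as `.frames`.
    """
    output = pipe(prompt=prompt, image=image, output_type="latent", **call_kwargs)
    return output.frames
```

Usage would look something like `latents = denoised_latents(load_pipeline().to("cuda"), image, prompt)`, where `image` and `prompt` are your own inputs.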
-
Hello,
-
Hi everyone,
I was playing with I2VGenXLPipeline. Here is the corresponding Hugging Face implementation. I noticed some discrepancies between the method described in the paper and this implementation. Can someone help me check whether my understanding is correct?
In the paper, they have the following diagram:
[Figure: diagram from the paper]
According to this diagram, the base stage has `D.Enc.` and `G.Enc.`; however, I only see CLIP in the implementation here.
Similarly, in the implementation, I observe that text embeddings are passed to the LDM of the base stage (this line); however, as per the diagram, text is only passed in the refinement stage.
In the refinement stage there is an LDM; however, in the implementation, I see that the low-dimensional video latent is passed to the VAE decoder to generate the high-dimensional video. I do not see any reverse diffusion process.
Can anyone tell me if my understanding of this code is correct? I want to access the intermediate low-dimensional video that comes out at the end of the base stage, but I don't know exactly how to access it. Can anyone tell me how to get that representation? I would appreciate it.
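One way to check which modules the implementation actually wires together is to list the components registered on a loaded pipeline. The sketch below assumes `pipe` is a loaded diffusers pipeline exposing the `components` dictionary; `describe_components` is a hypothetical helper name.

```python
from typing import Any, Dict


def describe_components(pipe: Any) -> Dict[str, str]:
    """Map each registered component name to the name of its class."""
    return {name: type(module).__name__ for name, module in pipe.components.items()}
```

Printing the result for the I2VGen-XL pipeline should show which encoders, UNet, VAE, and scheduler are present, which can then be compared against the blocks in the paper's diagram.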