Positional embedding for time in temporal attention layers in SVD #6541

jeanne-wang · 2024-01-12T02:32:59Z

It seems that SVD does not actually use a positional embedding for time in the temporal attention layers of the released model. Is there a specific reason for not doing this?

patrickvonplaten · 2024-01-15T14:17:58Z

Hey @jeanne-wang,

I'm not 100% sure what you mean by "positional embedding for time", but the difference between SVD and, say, SDXL is that SVD conditions the UNet not on discrete timesteps (e.g. 1, 2, ..., 999) but on continuous noise values (e.g. 3.2341).
This is explained in the paper in section 4.1:

> As a first step, we finetune the fixed discrete noise schedule from our image model towards continuous noise [87] using the network preconditioning proposed in Karras et al. [51] for images of size 256 × 384.

You should also probably take a look at https://arxiv.org/abs/2206.00364
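
To make the "continuous noise" conditioning a bit more concrete, here is a minimal sketch of the network preconditioning and noise schedule from Karras et al. It is not taken from the SVD or diffusers code: the function names are made up for illustration, and `sigma_data`, `sigma_min`, `sigma_max` and `rho` are the defaults from that paper, which I am only assuming here and which may not match the values SVD uses.

```python
import math

# Assumption: 0.5 is the default data std in Karras et al.; SVD may use another value.
SIGMA_DATA = 0.5


def edm_preconditioning(sigma, sigma_data=SIGMA_DATA):
    """Return (c_skip, c_out, c_in, c_noise) for a continuous noise level sigma.

    Hypothetical helper for illustration. c_noise is the value the network is
    conditioned on, in place of a discrete timestep index.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    denom = sigma**2 + sigma_data**2
    c_skip = sigma_data**2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise


def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return n continuous noise levels, decreasing from sigma_max to sigma_min.

    Hypothetical helper; the defaults are the paper's, not necessarily SVD's.
    """
    if n < 2:
        raise ValueError("need at least two noise levels")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


if __name__ == "__main__":
    for sigma in karras_sigmas(5):
        c_skip, c_out, c_in, c_noise = edm_preconditioning(sigma)
        print(f"sigma={sigma:.4f} c_noise={c_noise:.4f}")
```

The point is that the conditioning signal is a real number derived from `sigma` (here `0.25 * log(sigma)`), so any sequence of positive noise levels can be used at inference time rather than a fixed table of integer timesteps.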

lim142857 Feb 29, 2024

Hey @patrickvonplaten, is it possible to perform DDIM inversion on a real video using SVD? My concern is that DDIM is generally applied with discrete-time schedulers.
