Thanks for the great work. I removed the classification head and am trying to use this repo for image generation, but I get really bad results: all the images look patchy and very low quality. I played with the number of heads, the number of layers, the LR, etc., but it didn't really matter.
What would be the most sensible approach to generating images with the encoder part?
@basamelatex no one has shown that this works with a straight encoder yet, as far as I know, but people have discretized the pixel space and then used a decoder to generate the image, as in iGPT and Image Transformer.
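To make the "discretize the pixel space" step concrete, here is a minimal sketch (my own illustration, not code from this repo or from the iGPT paper, which actually uses a learned k-means color palette): each RGB channel is quantized into a few bins and the three bin indices are packed into a single token id, giving a small vocabulary that an autoregressive decoder can predict over. The function names and the choice of 8 levels per channel (a 512-token vocabulary) are assumptions for the example.

```python
import numpy as np

def discretize(img, levels=8):
    """Quantize a uint8 HxWx3 image into a flat sequence of token ids.

    Each channel is binned into `levels` values, and the three bin
    indices are packed into one id in [0, levels**3). The flattened
    sequence is what an autoregressive decoder would be trained on.
    """
    bins = (img.astype(np.int64) * levels) // 256          # per-channel bin index
    tokens = bins[..., 0] * levels**2 + bins[..., 1] * levels + bins[..., 2]
    return tokens.reshape(-1)

def detokenize(tokens, h, w, levels=8):
    """Map token ids (e.g. sampled from a decoder) back to a coarse image."""
    r = tokens // levels**2
    g = (tokens // levels) % levels
    b = tokens % levels
    scale = 256 // levels
    # Place each reconstructed value at its bin center.
    img = np.stack([r, g, b], axis=-1) * scale + scale // 2
    return img.reshape(h, w, 3).astype(np.uint8)
```

The round trip is lossy by construction (here, at most half a bin width per channel); the point is that generation becomes next-token prediction over a 512-symbol vocabulary instead of regression on raw pixels, which is what makes the decoder-only setup in iGPT and Image Transformer tractable.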
Thanks a lot for your answer. I checked out the papers you mentioned and noticed that they were only able to generate quite small images, such as 64x64, and used relatively small datasets like CIFAR-10. On the other hand, the ViT paper suggests the model doesn't work well on small datasets. Do you think that would be the case for image generation as well? Do we really need a huge dataset for ViT to work on image generation? I would like to give it a try, but I feel a bit skeptical after seeing the 300M-image dataset they use.