Aäron van den Oord
VQVAE-2 finally out! Powerful autoregressive models in a hierarchical compressed latent space. No modes were collapsed in the creation of these samples ;) arXiv: [link] More samples and details 👇 [thread]
Aäron van den Oord 4 Jun 19
Replying to @avdnoord
We use a hierarchical VQVAE that compresses images into a latent space about 50x smaller for ImageNet and 200x smaller for FFHQ faces. The PixelCNN models only the latents, letting it spend its capacity on global structure and the most perceptible features.
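For readers unfamiliar with VQ-VAEs, a minimal sketch of the quantization step at the heart of the model: each encoder output vector is snapped to its nearest codebook entry, and only the resulting discrete indices are what the PixelCNN prior has to model. Codebook and grid sizes below are illustrative assumptions, not quoted from this thread.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest codebook entry.

    Returns the discrete indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance from every latent to every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # 512 codes, 64-dim (assumed sizes)
z = rng.normal(size=(32 * 32, 64))     # a 32x32 grid of encoder outputs
idx, zq = quantize(z, codebook)        # idx: what the prior models
```

The prior then needs to model only the 32x32 grid of integer indices rather than the full-resolution pixel array, which is where the compression factor comes from.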
Aäron van den Oord 4 Jun 19
Replying to @avdnoord
Samples from our 256px two-stage ImageNet VQVAE
Aäron van den Oord 4 Jun 19
Replying to @avdnoord
We found the diversity of these samples to be much higher than competing adversarial methods.
Aäron van den Oord 4 Jun 19
Replying to @avdnoord
For megapixel faces (1024x1024) we use a three-stage VQVAE model.
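A rough back-of-envelope on why modeling a three-level latent hierarchy is so much cheaper than modeling megapixel images directly. The latent grid sizes here (32², 64², 128²) are an assumption for illustration; the thread's quoted compression figures may use a different accounting.

```python
# Values an autoregressive prior must model for a 1024x1024 RGB image
# versus a three-level discrete latent hierarchy (assumed grid sizes).
pixels = 1024 * 1024 * 3                  # 3,145,728 pixel values
latents = 32 * 32 + 64 * 64 + 128 * 128   # 21,504 discrete codes

ratio = pixels / latents
print(f"{pixels} pixels vs {latents} latents -> ~{ratio:.0f}x fewer values")
```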
Aäron van den Oord 4 Jun 19
Replying to @avdnoord
More samples and full high-res uncompressed images can be found here [70M]:
Arash Vahdat 4 Jun 19
Impressive!! How come log-likelihood is reported on the prior, not on the data?
David Berthelot 4 Jun 19
Looks good, do you have interpolation between images to show off how well the latent space generalizes? Like the volcanoes in BigGAN or the faces in StyleGAN.
Dave Harris, but masked 4 Jun 19
Is it straightforward to do that with an autoregressive model? It seems like it should be possible, but I’ve never thought about it before.
𝖁𝕰𝕼𝕿𝕺𝕽 4 Jun 19
Please release code! 😍
Thomas Pinetz 5 Jun 19
On how many GPUs, and for how long, was this trained for the ImageNet 256x256 images?