Matryoshka Diffusion Models

Matryoshka Diffusion Models (MDM) were introduced in the paper of the same name by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, and Navdeep Jaitly.

This repository contains the Flickr 256 checkpoint.

Generation Examples from the MDM repository

Highlights

  • This checkpoint was trained on a dataset of 50M text-image pairs collected from Flickr.
  • This model was trained using nested UNets at various resolutions, and generates images with a resolution of 256 × 256.
  • Despite being trained on relatively small datasets, MDMs show strong zero-shot capabilities in generating high-resolution images and videos.
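
To make the "nested resolutions" idea above more concrete, here is a minimal sketch of how a 256 × 256 image could be turned into a small multi-resolution pyramid, the kind of input a nested UNet might denoise jointly. This is not code from the MDM repository: the helper names `downsample` and `build_pyramid`, the choice of average pooling, and the resolution pair (64, 256) are all assumptions made for illustration.

```python
import numpy as np


def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a square (H, W, C) image by an integer factor."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the pooling factor")
    # Split each spatial axis into (blocks, factor) and average over each block.
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def build_pyramid(image: np.ndarray, resolutions=(64, 256)) -> dict:
    """Return {resolution: image at that resolution}.

    The resolutions are hypothetical; the ones actually used by a given
    checkpoint should be taken from the original repository.
    """
    size = image.shape[0]
    if image.shape[1] != size:
        raise ValueError("expected a square image")
    return {r: downsample(image, size // r) for r in resolutions}


# Stand-in for a real 256 x 256 RGB image with values in [0, 1).
image = np.random.default_rng(0).random((256, 256, 3))
pyramid = build_pyramid(image)

for resolution, level in pyramid.items():
    print(resolution, level.shape)
```

Each pyramid level shows the same image content at a different scale. For actual training and inference code, the original repository remains the reference.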

Checkpoints

Model            Dataset     Resolution   Nested UNets
mdm-flickr-64    Flickr 50M  64 × 64      ❎
mdm-flickr-256   Flickr 50M  256 × 256    ✅
mdm-flickr-1024  Flickr 50M  1024 × 1024  ✅

How to Use

Please refer to the original repository for training and inference instructions.

Citation

@misc{gu2023matryoshkadiffusionmodels,
      title={Matryoshka Diffusion Models},
      author={Jiatao Gu and Shuangfei Zhai and Yizhe Zhang and Josh Susskind and Navdeep Jaitly},
      year={2023},
      eprint={2310.15111},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2310.15111},
}