---
title: README
emoji: 🏃
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
---
Welcome to WARP. This is our little organization for multimodal generative models, focusing on the visual domain. We have been working extensively with generative image models and will soon work on video models as well. Our main team consists of:
- [Pablo Pernias](https://github.com/pabloppp/)
- [Dominic Rampas](https://github.com/dome272)
- [Marc Aubreville](https://www.linkedin.com/in/marc-aubreville-48a977120/?locale=en_US)
- [Mats L. Richter](https://scholar.google.com/citations?user=xtlV5SAAAAAJ&hl=de)
A special thanks to the Hugging Face team for helping to bring our research to Diffusers — in particular [Kashif](https://github.com/kashif/), [Patrick](https://github.com/patrickvonplaten) and [Sayak](https://github.com/sayakpaul)!
Feel free to join our [Discord](https://discord.gg/BTUAzb8vFY) channel!
Models:
**Paella**
- A simple & straightforward text-conditional image generation model that works on quantized latents.
- More details can be found in the paper, the blog post and the YouTube video.
- Only accessible through GitHub.
**Würstchen**
- An efficient text-to-image model, both to train and to run at inference. Achieves performance competitive with state-of-the-art methods while needing only a fraction of the compute.
- More details can be found in the paper.
- Versions: