---
license: apache-2.0
tags:
- Kandinsky
- text-image
- text2image
- diffusion
- latent diffusion
- mCLIP-XLMR
- mT5
---
Kandinsky 2.0
Kandinsky 2.0 is the first multilingual text2image model.
UNet size: 1.2B parameters
It is a latent diffusion model with two multilingual text encoders:
- mCLIP-XLMR (560M parameters)
- mT5-encoder-small (146M parameters)
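As a rough check on model size, the component counts listed above can be summed. This is a minimal sketch that uses only the figures stated in this card. It leaves out the latent autoencoder, whose size the card does not give, so the result is a lower bound for the full pipeline and not an official total.

```python
# Parameter counts as stated in this model card, in millions.
# The latent autoencoder is not listed in the card, so it is excluded here.
COMPONENTS_M = {
    "UNet": 1200,
    "mCLIP-XLMR": 560,
    "mT5-encoder-small": 146,
}


def total_params_m(components: dict[str, int]) -> int:
    """Sum the listed component sizes, in millions of parameters."""
    return sum(components.values())


# Combined size of the two multilingual text encoders.
text_encoders_m = COMPONENTS_M["mCLIP-XLMR"] + COMPONENTS_M["mT5-encoder-small"]

# Total over the components listed in the card (UNet plus both encoders).
total_m = total_params_m(COMPONENTS_M)

print(f"Text encoders: {text_encoders_m}M")   # Text encoders: 706M
print(f"Listed total: {total_m / 1000:.3f}B")  # Listed total: 1.906B
```

By this count the two text encoders together are a bit more than half the size of the UNet.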
Together, these encoders and the multilingual training datasets enable text2image generation from prompts written in many languages.