Text2Video-Zero Model Card - ControlNet Canny Avatar Style

Text2Video-Zero is a zero-shot text-to-video generator. It can perform zero-shot text-to-video generation, Video Instruct-Pix2Pix (instruction-guided video editing), text and pose conditional video generation, text and canny-edge conditional video generation, and text, canny-edge and DreamBooth conditional video generation. For more information about this work, please have a look at our paper and our demo on Hugging Face Spaces. Our code works with any StableDiffusion base model.
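The weights in this repository target the canny-edge plus DreamBooth setting, but for context, plain zero-shot text-to-video generation is exposed in diffusers through `TextToVideoZeroPipeline`. The sketch below follows the diffusers documentation; the base model id is only an example and `imageio` (with an ffmpeg backend) is assumed to be installed.

```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

# any StableDiffusion base model can be used; this id is just an example
model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images

# frames are returned as float arrays in [0, 1]; convert to uint8 before saving
frames = [(frame * 255).astype("uint8") for frame in result]
imageio.mimsave("video.mp4", frames, fps=4)
```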

This model provides DreamBooth weights for the Avatar style, to be used with edge guidance (via ControlNet) in Text2Video-Zero.

Weights for Text2Video-Zero

We converted the original weights into the diffusers format and made them usable with ControlNet edge guidance, following https://github.com/lllyasviel/ControlNet/discussions/12.
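A sketch of how these weights might be used for text and canny-edge conditional video generation, adapted from the Text2Video-Zero example in the diffusers documentation: the input video path and Canny thresholds are placeholders, and the `CrossFrameAttnProcessor` import path may differ across diffusers versions. Note the DreamBooth keyword avatar style in the prompt.

```python
import torch
import imageio
import numpy as np
import cv2
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import CrossFrameAttnProcessor

# read the input video and compute a canny edge map per frame (path and thresholds are placeholders)
reader = imageio.get_reader("__assets__/input_video.mp4")
frames = [Image.fromarray(frame) for frame in reader]
canny_edges = [
    Image.fromarray(np.stack([cv2.Canny(np.array(frame), 100, 200)] * 3, axis=-1))
    for frame in frames
]

# load this model's DreamBooth weights together with the canny ControlNet
model_id = "PAIR/text2video-zero-controlnet-canny-avatar"
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    model_id, controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# cross-frame attention keeps the appearance consistent across frames
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))

# use the same initial latents for every frame
latents = torch.randn((1, 4, 64, 64), device="cuda", dtype=torch.float16).repeat(len(canny_edges), 1, 1, 1)

prompt = "oil painting of a beautiful girl avatar style"
result = pipe(prompt=[prompt] * len(canny_edges), image=canny_edges, latents=latents).images
imageio.mimsave("video.mp4", [np.array(img) for img in result], fps=4)
```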

Model Details

  • Developed by: Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan and Humphrey Shi

  • Model type: DreamBooth text-to-image and text-to-video generation model with edge control for Text2Video-Zero

  • Language(s): English

  • License: The CreativeML OpenRAIL M license.

  • Model Description: This is a model for Text2Video-Zero with edge guidance and Avatar style. It can also be used with ControlNet in a text-to-image setup with edge guidance (see the sketch after this list).

  • DreamBooth Keyword: avatar style

  • Resources for more information: GitHub, Paper, CIVITAI.

  • Cite as: see the Citation section below.
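As noted in the model description above, the weights can also be used for plain text-to-image generation with ControlNet edge guidance. A minimal sketch, assuming a local reference image (input.png is a placeholder) and the standard diffusers ControlNet pipeline:

```python
import torch
import numpy as np
import cv2
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

# compute a canny edge map from a reference image (path and thresholds are placeholders)
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "PAIR/text2video-zero-controlnet-canny-avatar", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# the DreamBooth keyword "avatar style" triggers the learned style
prompt = "portrait of a warrior, avatar style"
result = pipe(prompt, image=canny_image, num_inference_steps=20).images[0]
result.save("avatar_style.png")
```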
    

Original Weights

The DreamBooth weights for the Avatar style were taken from CIVITAI.

Model Details

  • Developed by: Quiet_Joker (Username listed on CIVITAI)
  • Model type: DreamBooth text-to-image generation model
  • Language(s): English
  • License: The CreativeML OpenRAIL M license.
  • Model Description: This is a model that was created using DreamBooth to generate images with avatar style, based on text prompts.
  • DreamBooth Keyword: avatar style
  • Resources for more information: CIVITAI.

Biases and Content Acknowledgement

Beware that Text2Video-Zero may output content that reinforces or exacerbates societal biases, as well as realistic faces, pornography, and violence. Text2Video-Zero in this demo is intended for research purposes only.

Citation

  @article{text2video-zero,
    title={Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators},
    author={Khachatryan, Levon and Movsisyan, Andranik and Tadevosyan, Vahram and Henschel, Roberto and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
    journal={arXiv preprint arXiv:2303.13439},
    year={2023}
  }