---
title: Xora_I2V # Replace with your app's title
emoji: 🚀 # Choose an emoji to represent your app
colorFrom: blue # Choose a color to start the gradient (e.g., blue, red, green)
colorTo: purple # Choose a color to end the gradient
sdk: gradio # Specify the SDK, e.g., gradio or streamlit
sdk_version: "5.5.0" # Specify the SDK version if needed
app_file: app.py # Name of your main app file
pinned: false # Set to true if you want to pin this Space
---
<div align="center">

# Xora

</div>

This is the official repository for Xora.
## Table of Contents

* [Introduction](#introduction)
* [Installation](#installation)
* [Inference](#inference)
  * [Inference Code](#inference-code)
* [Acknowledgement](#acknowledgement)

## Introduction
The performance of Diffusion Transformers is heavily influenced by the number of generated latent pixels (or tokens). In video generation, the token count grows quickly as the number of frames increases. To address this, we designed a carefully optimized VAE that compresses videos into fewer tokens while using a deeper latent space. This approach enables our model to generate high-quality 768x512 videos at 24 FPS at near real-time speeds.
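
To make the token-count argument concrete, here is a minimal sketch of how the number of latent tokens depends on the VAE's compression. The function name and the compression factors below are illustrative assumptions only; they are not Xora's actual configuration, and real models may add patchification or padding on top of this.

```python
import math


def latent_token_count(width, height, frames, spatial_factor, temporal_factor):
    """Count latent positions after downsampling each axis (hypothetical helper).

    spatial_factor and temporal_factor are assumed VAE downsampling ratios;
    the values used below are illustrative, not taken from Xora.
    """
    return (
        math.ceil(frames / temporal_factor)
        * math.ceil(height / spatial_factor)
        * math.ceil(width / spatial_factor)
    )


if __name__ == "__main__":
    # One second of 768x512 video at 24 FPS under two assumed compression settings.
    mild = latent_token_count(768, 512, 24, spatial_factor=8, temporal_factor=4)
    strong = latent_token_count(768, 512, 24, spatial_factor=32, temporal_factor=8)
    print(f"mild compression:   {mild} tokens")
    print(f"strong compression: {strong} tokens")
    print(f"reduction: {mild / strong:.0f}x fewer tokens")
```

Under these assumed factors, stronger compression cuts the sequence the transformer must process by a large multiple, which is the effect the paragraph above describes.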
## Installation

### Setup

The codebase currently uses Python 3.10.5 and CUDA 12.2, and supports PyTorch >= 2.1.2.
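
Before installing, it can help to confirm that the local environment roughly matches these versions. The following is a small sketch, not part of the Xora codebase: `version_tuple` and `check_environment` are hypothetical helpers, and the check only reports what it finds rather than enforcing anything.

```python
import importlib.util
import re
import sys


def version_tuple(version):
    """Turn a version string such as '2.1.2+cu121' into (2, 1, 2)."""
    core = version.split("+")[0]  # drop local build suffixes like '+cu121'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_environment(min_torch=(2, 1, 2)):
    """Report Python, PyTorch, and CUDA versions (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "torch": None,
        "torch_ok": False,
        "cuda": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["torch_ok"] = version_tuple(torch.__version__) >= min_torch
        report["cuda"] = torch.version.cuda  # None on CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```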
```bash | |
git clone https://github.com/LightricksResearch/xora-core.git | |
cd xora-core | |
# create env | |
python -m venv env | |
source env/bin/activate | |
python -m pip install -e .\[inference-script\] | |
``` | |
Then, download the model from [Hugging Face](https://huggingface.co/Lightricks/Xora):
```python | |
from huggingface_hub import snapshot_download | |
model_path = 'PATH' # The local directory to save downloaded checkpoint | |
snapshot_download("Lightricks/Xora", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model') | |
``` | |
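
As a quick sanity check after the download, the sketch below counts the files under the local directory and sums their sizes. `summarize_checkpoint_dir` is a hypothetical helper, not part of Xora or `huggingface_hub`; it makes no assumption about which files the checkpoint contains and also counts any cache metadata the download leaves in the folder.

```python
from pathlib import Path


def summarize_checkpoint_dir(model_path):
    """Return (file_count, total_bytes) for everything under model_path."""
    root = Path(model_path)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory; did the download run?")
    files = [p for p in root.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


if __name__ == "__main__":
    try:
        # 'PATH' is the same placeholder used in the download snippet.
        count, size = summarize_checkpoint_dir("PATH")
        print(f"{count} files, {size / 1e9:.2f} GB")
    except FileNotFoundError as err:
        print(err)
```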
## Inference

### Inference Code

To use our model, follow the inference code in [`inference.py`](https://github.com/LightricksResearch/xora-core/blob/main/inference.py).

For text-to-video generation:
```bash | |
python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH | |
``` | |
For image-to-video generation:
```bash
python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH | |
``` | |
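
If you prefer to launch the script from Python, the two commands above can be assembled programmatically. This is a minimal sketch: `build_inference_command` and `run_inference` are hypothetical wrappers, and they use only the flags shown above (`--ckpt_dir`, `--prompt`, `--input_image_path`, `--height`, `--width`). The example height and width are placeholders, and the script is assumed to be run from the root of the `xora-core` checkout.

```python
import shlex
import subprocess
import sys


def build_inference_command(ckpt_dir, prompt, height, width, input_image_path=None):
    """Build the argument list for inference.py (hypothetical wrapper).

    Omitting input_image_path gives the text-to-video command;
    passing it gives the image-to-video command.
    """
    cmd = [sys.executable, "inference.py", "--ckpt_dir", str(ckpt_dir), "--prompt", prompt]
    if input_image_path is not None:
        cmd += ["--input_image_path", str(input_image_path)]
    cmd += ["--height", str(height), "--width", str(width)]
    return cmd


def run_inference(**kwargs):
    """Run inference.py in a subprocess and raise if it exits with an error."""
    subprocess.run(build_inference_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call run_inference(...) to launch.
    cmd = build_inference_command("PATH", "PROMPT", height=512, width=768)
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.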
## Acknowledgement

We are grateful to the following projects, which we built on when implementing Xora:

* [DiT](https://github.com/facebookresearch/DiT) and [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha): vision transformers for image generation.

[//]: # (## Citation)