Commit 434625d
Parent(s): ddc647e
Update README.md (#12)
- Update README.md (8fe3939412dd82c08c642b04d826f51d87e2620b)
Co-authored-by: Zhicheng Sun <feifeiobama@users.noreply.huggingface.co>
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:
 
 # ⚡️Pyramid Flow⚡️
 
-[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow)
+[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]
 
 This is the official repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.
 
@@ -31,11 +31,24 @@ This is the official repository for Pyramid Flow, a training-efficient **Autoreg
 ## News
 
 * `COMING SOON` ⚡️⚡️⚡️ Training code and new model checkpoints trained from scratch.
+* `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
 * `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.
 
-##
+## Installation
+
+We recommend setting up the environment with conda. The codebase currently uses Python 3.8.10 and PyTorch 2.1.2, and we are actively working to support a wider range of versions.
+
+```bash
+git clone https://github.com/jy0205/Pyramid-Flow
+cd Pyramid-Flow
+
+# create env using conda
+conda create -n pyramid python==3.8.10
+conda activate pyramid
+pip install -r requirements.txt
+```
 
-
+Then, you can directly download the model from [Huggingface](https://huggingface.co/rain1011/pyramid-flow-sd3). We provide both model checkpoints for 768p and 384p video generation. The 384p checkpoint supports 5-second video generation at 24FPS, while the 768p checkpoint supports up to 10-second video generation at 24FPS.
 
 ```python
 from huggingface_hub import snapshot_download
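The download paragraph above distinguishes the 768p and 384p checkpoints, but the snippet in the diff fetches the whole repository. If only one resolution is needed, `huggingface_hub` can restrict the download with `allow_patterns`. A minimal sketch, assuming the 768p weights live in a `diffusion_transformer_768p` subfolder — the folder names are assumptions about the repo layout, not something this diff states:

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'  # the local directory to save the downloaded checkpoint
# Download only the assumed 768p variant plus shared components;
# adjust the patterns to match the actual repository layout.
snapshot_download(
    "rain1011/pyramid-flow-sd3",
    local_dir=model_path,
    local_dir_use_symlinks=False,
    repo_type='model',
    allow_patterns=["diffusion_transformer_768p/*", "text_encoder*/*", "vae/*"],
)
```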
@@ -44,6 +57,8 @@ model_path = 'PATH'   # The local directory to save downloaded checkpoint
 snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
 ```
 
+## Usage
+
 To use our model, please follow the inference code in `video_generation_demo.ipynb` at [this link](https://github.com/jy0205/Pyramid-Flow/blob/main/video_generation_demo.ipynb). We further simplify it into the following two-step procedure. First, load the downloaded model:
 
 ```python
@@ -53,7 +68,7 @@ from pyramid_dit import PyramidDiTForVideoGeneration
 from diffusers.utils import load_image, export_to_video
 
 torch.cuda.set_device(0)
-model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16
+model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (not support fp16 yet)
 
 model = PyramidDiTForVideoGeneration(
     'PATH',   # The downloaded checkpoint dir
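The hunk above cuts off mid-constructor, so only fragments of the loading step are visible. A consolidated sketch of step one, where the `model_dtype` positional argument and the `model_variant` keyword are assumptions inferred from context rather than lines present in this diff:

```python
import torch
from pyramid_dit import PyramidDiTForVideoGeneration

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16    # bf16 only; fp16 is not supported yet

model = PyramidDiTForVideoGeneration(
    'PATH',                                          # the downloaded checkpoint dir
    model_dtype,                                     # assumed: dtype passed at construction
    model_variant='diffusion_transformer_768p',      # assumed subfolder name for the 768p weights
)
```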
@@ -80,9 +95,10 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
         height=768,
         width=1280,
         temp=16,                    # temp=16: 5s, temp=31: 10s
-        guidance_scale=9.0,         # The guidance for the first frame
+        guidance_scale=9.0,         # The guidance for the first frame, set it to 7 for 384p variant
         video_guidance_scale=5.0,   # The guidance for the other video latent
         output_type="pil",
+        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
     )
 
 export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
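For step two, the hunk above shows only the keyword arguments of the text-to-video call. A sketch of the full call, assuming the method is `model.generate` and takes a `prompt` string — neither name appears in this diff, and the example reuses `model` and `torch_dtype` from the loading sketch:

```python
prompt = "A movie trailer featuring the adventures of a golden retriever"  # hypothetical prompt

# model and torch_dtype come from the loading step above
with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(          # assumed method name
        prompt=prompt,
        height=768,
        width=1280,
        temp=16,                      # temp=16: 5s, temp=31: 10s
        guidance_scale=9.0,           # first-frame guidance; use 7 for the 384p variant
        video_guidance_scale=5.0,     # guidance for the other video latents
        output_type="pil",
        save_memory=True,             # set to False for faster VAE decoding if memory allows
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
```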
@@ -102,12 +118,15 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
         temp=16,
         video_guidance_scale=4.0,
         output_type="pil",
+        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
     )
 
 export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
 ```
 
-
+We also support CPU offloading to allow inference with **less than 12GB** of GPU memory by adding a `cpu_offloading=True` parameter. This feature was contributed by [@Ednaordinary](https://github.com/Ednaordinary), see [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for details.
+
+## Usage tips
 
 * The `guidance_scale` parameter controls the visual quality. We suggest using a guidance within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
 * The `video_guidance_scale` parameter controls the motion. A larger value increases the dynamic degree and mitigates the autoregressive generation degradation, while a smaller value stabilizes the video.
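The hunk above shows the image-to-video call only from `temp=16` onward, and the CPU-offloading flag appears only in prose. A sketch combining the two, where `model.generate_i2v` and its `input_image` argument are assumptions — only `load_image` (imported in the diff), the keyword arguments shown in the hunk, and `cpu_offloading=True` come from the source:

```python
image = load_image("./input_image.jpg")   # hypothetical input path

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate_i2v(          # assumed method name
        prompt=prompt,
        input_image=image,                # assumed argument name
        temp=16,
        video_guidance_scale=4.0,
        output_type="pil",
        save_memory=True,
        cpu_offloading=True,              # per the diff: enables inference with <12GB GPU memory
    )

export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
```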
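Finally, a tiny helper encoding the guidance suggestions from the usage tips, purely illustrative (the variant labels are shorthand for this sketch, not repository paths):

```python
def suggested_guidance_scale(variant: str) -> float:
    """Text-to-video guidance_scale per the usage tips: within [7, 9] for 768p, 7 for 384p."""
    return {"768p": 9.0, "384p": 7.0}[variant]

print(suggested_guidance_scale("768p"))  # 9.0
```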