rain1011 feifeiobama committed on
Commit
434625d
1 Parent(s): ddc647e

Update README.md (#12)


- Update README.md (8fe3939412dd82c08c642b04d826f51d87e2620b)


Co-authored-by: Zhicheng Sun <feifeiobama@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +25 -6
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
 # ⚡️Pyramid Flow⚡️
 
-[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow)
+[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[demo 🤗](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)]
 
 This is the official repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. By training only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and naturally supports image-to-video generation.
 
@@ -31,11 +31,24 @@ This is the official repository for Pyramid Flow, a training-efficient **Autoreg
 ## News
 
 * `COMING SOON` ⚡️⚡️⚡️ Training code and new model checkpoints trained from scratch.
+* `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
 * `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.
 
-## Usage
+## Installation
+
+We recommend setting up the environment with conda. The codebase currently uses Python 3.8.10 and PyTorch 2.1.2, and we are actively working to support a wider range of versions.
+
+```bash
+git clone https://github.com/jy0205/Pyramid-Flow
+cd Pyramid-Flow
+
+# create env using conda
+conda create -n pyramid python==3.8.10
+conda activate pyramid
+pip install -r requirements.txt
+```
 
-You can directly download the model from [Huggingface](https://huggingface.co/rain1011/pyramid-flow-sd3). We provide both model checkpoints for 768p and 384p video generation. The 384p checkpoint supports 5-second video generation at 24FPS, while the 768p checkpoint supports up to 10-second video generation at 24FPS.
+Then, you can directly download the model from [Huggingface](https://huggingface.co/rain1011/pyramid-flow-sd3). We provide both model checkpoints for 768p and 384p video generation. The 384p checkpoint supports 5-second video generation at 24FPS, while the 768p checkpoint supports up to 10-second video generation at 24FPS.
 
 ```python
 from huggingface_hub import snapshot_download
@@ -44,6 +57,8 @@ model_path = 'PATH' # The local directory to save downloaded checkpoint
 snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
 ```
 
+## Usage
+
 To use our model, please follow the inference code in `video_generation_demo.ipynb` at [this link](https://github.com/jy0205/Pyramid-Flow/blob/main/video_generation_demo.ipynb). We further simplify it into the following two-step procedure. First, load the downloaded model:
 
 ```python
@@ -53,7 +68,7 @@ from pyramid_dit import PyramidDiTForVideoGeneration
 from diffusers.utils import load_image, export_to_video
 
 torch.cuda.set_device(0)
-model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16, fp16 or fp32
+model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (not support fp16 yet)
 
 model = PyramidDiTForVideoGeneration(
     'PATH',   # The downloaded checkpoint dir
@@ -80,9 +95,10 @@ with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
         height=768,
         width=1280,
         temp=16,                    # temp=16: 5s, temp=31: 10s
-        guidance_scale=9.0,         # The guidance for the first frame
+        guidance_scale=9.0,         # The guidance for the first frame, set it to 7 for 384p variant
         video_guidance_scale=5.0,   # The guidance for the other video latent
         output_type="pil",
+        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
     )
 
 export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
@@ -102,12 +118,15 @@
         temp=16,
         video_guidance_scale=4.0,
         output_type="pil",
+        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
     )
 
 export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
 ```
 
-Usage tips:
+We also support CPU offloading to allow inference with **less than 12GB** of GPU memory by adding a `cpu_offloading=True` parameter. This feature was contributed by [@Ednaordinary](https://github.com/Ednaordinary), see [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for details.
+
+## Usage tips
 
 * The `guidance_scale` parameter controls the visual quality. We suggest using a guidance within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
 * The `video_guidance_scale` parameter controls the motion. A larger value increases the dynamic degree and mitigates the autoregressive generation degradation, while a smaller value stabilizes the video.
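
Putting the pieces of this diff together: below is a minimal end-to-end sketch of the updated download-and-generate procedure, assembled from the fragments visible above. The diff omits parts of the loading and generation code, so the second constructor argument, the `model_variant` name, and the `prompt` keyword are assumptions rather than anything this commit confirms; `video_generation_demo.ipynb` in the GitHub repo is the authoritative reference.

```python
# Minimal sketch assembling the README fragments shown in this diff.
# Assumptions (not confirmed by the diff): the `model_dtype` positional
# argument, the `model_variant` name, and the `prompt` keyword.
import torch
from huggingface_hub import snapshot_download
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import export_to_video

model_path = 'PATH'   # The local directory to save downloaded checkpoint
snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path,
                  local_dir_use_symlinks=False, repo_type='model')

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (not support fp16 yet)

model = PyramidDiTForVideoGeneration(
    model_path,                                   # The downloaded checkpoint dir
    model_dtype,                                  # assumed second argument
    model_variant='diffusion_transformer_768p',   # assumed name of the 768p checkpoint
)

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt="A serene coastline at sunset",   # assumed keyword for the text prompt
        height=768,
        width=1280,
        temp=16,                    # temp=16: 5s, temp=31: 10s
        guidance_scale=9.0,         # use 7 for the 384p variant
        video_guidance_scale=5.0,   # larger = more motion
        output_type="pil",
        save_memory=True,           # set to False for faster VAE decoding if memory allows
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
```

Image-to-video follows the same pattern; per the second generation block in the diff, it uses `temp=16`, `video_guidance_scale=4.0`, and the same `save_memory` flag, with the input image presumably loaded via the `load_image` import shown above.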
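
The CPU-offloading note near the end of the diff names only the parameter. Assuming `cpu_offloading` is accepted by the same `generate` call, as the linked pull request suggests, enabling it would look like this sketch:

```python
# Sketch only: assumes `cpu_offloading` is a keyword argument of
# model.generate(), per the README note and linked PR #23.
with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt="A serene coastline at sunset",   # assumed keyword, as above
        height=768,
        width=1280,
        temp=16,
        guidance_scale=9.0,
        video_guidance_scale=5.0,
        output_type="pil",
        save_memory=True,
        cpu_offloading=True,   # trades speed for <12GB peak GPU memory, per the README note
    )
```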