Commands
Inference
You can modify the corresponding config files to change the inference settings. See more details here.
Inference with DiT pretrained on ImageNet
The following command automatically downloads the weights pretrained on ImageNet and runs inference.
python scripts/inference.py configs/dit/inference/1x256x256-class.py --ckpt-path DiT-XL-2-256x256.pt
Inference with Latte pretrained on UCF101
The following command automatically downloads the weights pretrained on UCF101 and runs inference.
python scripts/inference.py configs/latte/inference/16x256x256-class.py --ckpt-path Latte-XL-2-256x256-ucf101.pt
Inference with PixArt-α pretrained weights
Download T5 into ./pretrained_models and run one of the following commands.
# 256x256
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x256x256.py --ckpt-path PixArt-XL-2-256x256.pth
# 512x512
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x512x512.py --ckpt-path PixArt-XL-2-512x512.pth
# 1024 multi-scale
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x1024MS.py --ckpt-path PixArt-XL-2-1024MS.pth
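The three PixArt-α commands above differ only in the config file and the checkpoint name, so they can be generated from a small table. The following is a minimal sketch, not part of the repository: `build_inference_cmd` and `PIXART_RUNS` are hypothetical names, and the config paths and checkpoint names are copied from the commands above.

```python
import shlex

# Config and checkpoint names copied from the commands above.
PIXART_RUNS = {
    "256x256": ("configs/pixart/inference/1x256x256.py", "PixArt-XL-2-256x256.pth"),
    "512x512": ("configs/pixart/inference/1x512x512.py", "PixArt-XL-2-512x512.pth"),
    "1024MS": ("configs/pixart/inference/1x1024MS.py", "PixArt-XL-2-1024MS.pth"),
}


def build_inference_cmd(config, ckpt_path, nproc_per_node=1):
    """Return the torchrun argument list for one inference run (hypothetical helper)."""
    return [
        "torchrun", "--standalone", "--nproc_per_node", str(nproc_per_node),
        "scripts/inference.py", config, "--ckpt-path", ckpt_path,
    ]


if __name__ == "__main__":
    for name, (config, ckpt) in PIXART_RUNS.items():
        cmd = build_inference_cmd(config, ckpt)
        # Print only; pass cmd to subprocess.run(cmd, check=True) to launch it.
        print(f"# {name}")
        print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to hand to `subprocess.run` without shell quoting, while `shlex.join` gives a copy-pasteable string.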
Inference with checkpoints saved during training
During training, an experiment logging folder is created in the outputs directory. Under each checkpoint folder, e.g. epoch12-global_step2000, there is an ema.pt file and a sharded model folder. Run one of the following commands to perform inference.
# inference with ema model
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000/ema.pt
# inference with model
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000
# inference with sequence parallelism
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000
The second command will automatically generate a model_ckpt.pt file in the checkpoint folder.
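When an experiment folder holds many checkpoints, picking the value for --ckpt-path by hand gets tedious. The sketch below assumes only the layout described above (checkpoint folders named like epoch12-global_step2000, each containing ema.pt); the helper names `latest_checkpoint` and `resolve_ckpt_path` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Folder names follow the pattern shown above, e.g. "epoch12-global_step2000".
CKPT_PATTERN = re.compile(r"^epoch(\d+)-global_step(\d+)$")


def latest_checkpoint(exp_dir):
    """Return the checkpoint folder with the highest global step, or None."""
    best, best_key = None, None
    for child in Path(exp_dir).iterdir():
        match = CKPT_PATTERN.match(child.name)
        if child.is_dir() and match:
            # Order by (global_step, epoch) so the most recent folder wins.
            key = (int(match.group(2)), int(match.group(1)))
            if best_key is None or key > best_key:
                best, best_key = child, key
    return best


def resolve_ckpt_path(ckpt_dir, use_ema=True):
    """Pick the value for --ckpt-path: ema.pt, or the checkpoint folder itself."""
    ckpt_dir = Path(ckpt_dir)
    if not use_ema:
        return ckpt_dir
    ema = ckpt_dir / "ema.pt"
    if not ema.is_file():
        raise FileNotFoundError(f"no ema.pt in {ckpt_dir}")
    return ema
```

For example, `resolve_ckpt_path(latest_checkpoint("outputs/001-STDiT-XL-2"))` would yield the ema.pt path of the most recent checkpoint in that experiment folder, if one exists.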
Inference Hyperparameters
- DPM-Solver gives fast inference for images, but its video results are not satisfactory. You can use it for quick demos.
type="dpm-solver"
num_sampling_steps=20
- You can use SVD's finetuned VAE decoder on videos for inference (consumes more memory). However, we do not see significant improvement in the video result. To use it, download the pretrained weights into ./pretrained_models/vae_temporal_decoder and modify the config file as follows.
vae = dict(
    type="VideoAutoencoderKLTemporalDecoder",
    from_pretrained="pretrained_models/vae_temporal_decoder",
)
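Put together, the two options above might look like this in an inference config. This is a sketch under assumptions: the top-level key name `scheduler` is assumed by analogy with the `vae` dict and is not confirmed by this document, and the type string is assumed to be spelled "dpm-solver" to match the solver's name. Check both against the config file you are editing.

```python
# Hypothetical inference config fragment; only the listed values come from this document.

# Assumed key name "scheduler"; assumed type spelling "dpm-solver".
scheduler = dict(
    type="dpm-solver",
    num_sampling_steps=20,  # few steps: fast, intended for quick demos
)

# Optional: SVD's finetuned temporal VAE decoder (consumes more memory).
vae = dict(
    type="VideoAutoencoderKLTemporalDecoder",
    from_pretrained="pretrained_models/vae_temporal_decoder",
)
```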
## Training
To resume training, run the following command. ``--load`` differs from ``--ckpt-path`` in that it also loads the optimizer and dataloader states.
```bash
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --load YOUR_PRETRAINED_CKPT
```
To enable wandb logging, add --wandb to the command.
WANDB_API_KEY=YOUR_WANDB_API_KEY torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --wandb True
You can modify the corresponding config files to change the training settings. See more details here.
Training Hyperparameters
dtype is the data type for training. Only fp16 and bf16 are supported. ColossalAI automatically enables mixed precision training for fp16 and bf16. During training, we find bf16 more stable.