# DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
## News

📢 **9/May/23** - First release: arXiv paper, code, and pre-trained models.
## 1. Getting started

This code was tested on an NVIDIA GeForce RTX 2080 Ti and requires:

- conda3 or miniconda3

```bash
conda create -n DiffuseStyleGesture python=3.7
conda activate DiffuseStyleGesture   # activate the environment before installing dependencies
pip install -r requirements.txt
```
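After installation, a quick sanity check helps confirm that PyTorch can see the GPU before sampling or training. This is only a minimal sketch and assumes PyTorch was installed via `requirements.txt`:

```python
# Minimal environment check: verify PyTorch and CUDA are available
# (assumes torch is installed via requirements.txt).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```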
## 2. Quick Start

- Download the pre-trained model from Tsinghua Cloud or Google Cloud and put it into `./main/mydiffusion_zeggs/`.
- Download WavLM Large and put it into `./main/mydiffusion_zeggs/WavLM/`.
- `cd ./main/mydiffusion_zeggs/` and run:

```bash
python sample.py --config=./configs/DiffuseStyleGesture.yml --no_cuda 0 --gpu 0 --model_path './model000450000.pt' --audiowavlm_path "./015_Happy_4_x_1_0.wav" --max_len 320
```
You will get a `.bvh` file named `yyyymmdd_hhmmss_smoothing_SG_minibatch_320_[1, 0, 0, 0, 0, 0]_123456.bvh` in the `sample_dir` folder, which can then be visualized using Blender. To generate gestures for several audio files at once, see the batch-sampling sketch below.
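The Quick Start command above generates gestures for a single audio file. If you want to run it over a folder of `.wav` files, a simple wrapper like the sketch below works; the `my_wavs` folder name is just an example, and the flags mirror the command shown above:

```python
# Sketch: batch-generate gestures by calling sample.py once per .wav file.
# The flags mirror the Quick Start command; "./my_wavs" is an example folder.
import subprocess
from pathlib import Path

for wav in sorted(Path("./my_wavs").glob("*.wav")):
    subprocess.run([
        "python", "sample.py",
        "--config=./configs/DiffuseStyleGesture.yml",
        "--no_cuda", "0",
        "--gpu", "0",
        "--model_path", "./model000450000.pt",
        "--audiowavlm_path", str(wav),
        "--max_len", "320",
    ], check=True)
```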
## 3. Train your own model

### (1) Get ZEGGS dataset

The processing is the same as for ZEGGS; an example is as follows.
Download the original ZEGGS dataset from here and put it into the `./ubisoft-laforge-ZeroEGGS-main/data/` folder.
Then `cd ./ubisoft-laforge-ZeroEGGS-main/ZEGGS` and run `python data_pipeline.py` to process the dataset.
You will get the `./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/train/` and `./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/test/` folders.

If you find it difficult to obtain and process the data, you can download the data already processed by ZEGGS from Tsinghua Cloud or Baidu Cloud and put it into the `./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed/` folder.
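Before moving on, you can sanity-check the processed folders. The sketch below assumes the trimmed `train/` and `test/` folders contain paired `.bvh` and `.wav` files (an assumption about the ZEGGS pipeline output) and simply counts them:

```python
# Sketch: count motion and audio files in the processed ZEGGS folders.
# The paired .bvh/.wav layout is an assumption about data_pipeline.py output.
from pathlib import Path

root = Path("./ubisoft-laforge-ZeroEGGS-main/data/processed_v1/trimmed")
for split in ("train", "test"):
    folder = root / split
    bvh_files = sorted(folder.glob("*.bvh"))
    wav_files = sorted(folder.glob("*.wav"))
    print(f"{split}: {len(bvh_files)} .bvh files, {len(wav_files)} .wav files")
```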
### (2) Process ZEGGS dataset

```bash
cd ./main/mydiffusion_zeggs/
python zeggs_data_to_lmdb.py
```
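To confirm the LMDB cache was written, a small check with the `lmdb` package is enough. The output path below is only a placeholder; point it at the directory that `zeggs_data_to_lmdb.py` reports writing to:

```python
# Sketch: open the LMDB produced by zeggs_data_to_lmdb.py and count its entries.
# LMDB_PATH is a placeholder; set it to the actual output directory.
import lmdb

LMDB_PATH = "./path/to/output_lmdb"  # placeholder

env = lmdb.open(LMDB_PATH, readonly=True, lock=False)
with env.begin() as txn:
    print("number of entries:", txn.stat()["entries"])
env.close()
```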
### (3) Train

```bash
python end2end.py --config=./configs/DiffuseStyleGesture.yml --no_cuda 0 --gpu 0
```

The trained model will be saved in the `./main/mydiffusion_zeggs/zeggs_mymodel3_wavlm/` folder.
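Checkpoints accumulate in that folder as training proceeds. The sketch below picks the most recent one to pass to `sample.py --model_path`, assuming checkpoints follow the `model<step>.pt` naming used by the released checkpoint (`model000450000.pt`):

```python
# Sketch: find the latest training checkpoint, assuming zero-padded
# model<step>.pt names so that lexicographic order matches step order.
from pathlib import Path

ckpt_dir = Path("./main/mydiffusion_zeggs/zeggs_mymodel3_wavlm")
ckpts = sorted(ckpt_dir.glob("model*.pt"))
if ckpts:
    print("latest checkpoint:", ckpts[-1])  # pass this to sample.py --model_path
else:
    print("no checkpoints found yet in", ckpt_dir)
```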
## Reference

Our work is mainly inspired by MDM, Text2Gesture, and Listen, denoise, action!
## Citation

If you find this code useful in your research, please cite:

```bibtex
@inproceedings{yang2023DiffuseStyleGesture,
  author    = {Sicheng Yang and Zhiyong Wu and Minglei Li and Zhensong Zhang and Lei Hao and Weihong Bao and Ming Cheng and Long Xiao},
  title     = {DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models},
  booktitle = {Proceedings of the 32nd International Joint Conference on Artificial Intelligence, {IJCAI} 2023},
  publisher = {ijcai.org},
  year      = {2023},
}
```
Please feel free to contact us (yangsc21@mails.tsinghua.edu.cn) with any questions or concerns.