metadata
license: cc-by-nc-4.0
UTDUSS vocoder model
This repo provides the model weights of the Descript Audio Codec (DAC) used for the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
Prerequisites
The official dac library, which can be installed with the following command:
pip install descript-audio-codec
Provided weights
Vocoder task
Model name in the paper | File name in this repo |
---|---|
π | expresso_16k_2code.pth |
π w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
π w/o data exclusion | expresso_16k_2code_wo_data.pth |
π w/o matching sampling rate | expresso_24k_2code_ab.pth |
Acoustic + Vocoder (TTS) task
Please note that the weights for the acoustic model are not provided.
Full training set
Model name in the paper | File name in this repo |
---|---|
Discrete-TTS v1, v1.1 | lj_16k_1code.pth |
Discrete-TTS v2, v2.2 | lj_16k_1code_512.pth |
Discrete-TTS v3 | lj_16k_1code_256.pth |
1h training set
Model name in the paper | File name in this repo |
---|---|
Discrete-TTS v1, v1.1 | lj_1h_16k_1code.pth |
Discrete-TTS v2, v2.2 | lj_1h_16k_1code_512.pth |
Discrete-TTS v3 | lj_1h_16k_1code_256.pth |
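All files listed in the tables above appear to live in the same repository, so a download URL can be built from the file name alone. The following is a minimal sketch: `BASE_URL` is assumed from the download link used in this README's sample code, and `WEIGHT_FILES` and `weight_url` are hypothetical helper names, not part of the dac library.

```python
# Assumed base URL, taken from the download link in this README's sample code.
BASE_URL = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main"

# File names copied from the tables in this README.
WEIGHT_FILES = (
    "expresso_16k_2code.pth",
    "expresso_16k_2code_official.pth",
    "expresso_16k_2code_wo_data.pth",
    "expresso_24k_2code_ab.pth",
    "lj_16k_1code.pth",
    "lj_16k_1code_512.pth",
    "lj_16k_1code_256.pth",
    "lj_1h_16k_1code.pth",
    "lj_1h_16k_1code_512.pth",
    "lj_1h_16k_1code_256.pth",
)


def weight_url(file_name: str) -> str:
    """Return the download URL for one of the weight files listed above."""
    if file_name not in WEIGHT_FILES:
        raise ValueError(f"unknown weight file: {file_name}")
    return f"{BASE_URL}/{file_name}"
```

The returned string can be used wherever a weight URL is needed, for example as the source passed to `torch.hub.download_url_to_file`.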
Sample code
import dac
import torch
from pathlib import Path
# Download the checkpoint into /tmp/utduss, skipping the download if it is already there.
model_url = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main/expresso_16k_2code.pth"
model_path = Path("/tmp/utduss") / model_url.split("/")[-1]
model_path.parent.mkdir(parents=True, exist_ok=True)
if not model_path.exists():
    torch.hub.download_url_to_file(model_url, str(model_path))
# Load the weights into a DAC model.
model = dac.DAC.load(str(model_path))
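Once a model is loaded, a quick sanity check is to resynthesize a waveform through the codec. The sketch below follows the `preprocess` / `encode` / `decode` interface described in the descript-audio-codec README. `resynthesize` is a hypothetical helper name, and the input is assumed to be a float tensor of shape (batch, 1, samples) sampled at the model's own rate (16 kHz or 24 kHz, depending on the checkpoint).

```python
def resynthesize(model, audio, sample_rate):
    """Encode a waveform to discrete codes, then decode it back.

    Assumptions: `model` is a loaded dac.DAC instance, and `audio` is a
    float tensor shaped (batch, 1, samples) at `sample_rate` Hz.
    """
    # Right-pad the input so its length is a multiple of the hop length.
    x = model.preprocess(audio, sample_rate)
    # encode() returns (z, codes, latents, commitment_loss, codebook_loss).
    z, codes, _latents, _commitment_loss, _codebook_loss = model.encode(x)
    # Decode the quantized latents back into a waveform.
    y = model.decode(z)
    return codes, y
```

For inference, the call would typically be wrapped in `torch.no_grad()`, for example `codes, y = resynthesize(model, audio, model.sample_rate)`.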
Contributors
- Wataru Nakata
- Kazuki Yamauchi
- Dong Yang
- Hiroaki Hyodo
- Yuki Saito