license: cc-by-nc-4.0

UTDUSS vocoder model

This repository provides the model weights of the Descript Audio Codec (DAC) used for the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

Prerequisites

The official dac library, which can be installed with the following command:

pip install descript-audio-codec

Provided weights

Vocoder task

| model name on paper | model name on this repo |
| --- | --- |
| 😀 | expresso_16k_2code.pth |
| 😀 w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
| 😀 w/o data exclusion | expresso_16k_2code_wo_data.pth |
| 😀 w/o matching sampling rate | expresso_24k_2code_ab.pth |

Acoustic + Vocoder (TTS) task

Please note that the weights for the acoustic model are not provided.

Full training set

| model name on paper | model name on this repo |
| --- | --- |
| Discrete-TTS v1, v1.1 | lj_16k_1code.pth |
| Discrete-TTS v2, v2.2 | lj_16k_1code_512.pth |
| Discrete-TTS v3 | lj_16k_1code_256.pth |

1h training set

| model name on paper | model name on this repo |
| --- | --- |
| Discrete-TTS v1, v1.1 | lj_1h_16k_1code.pth |
| Discrete-TTS v2, v2.2 | lj_1h_16k_1code_512.pth |
| Discrete-TTS v3 | lj_1h_16k_1code_256.pth |
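The file names in the tables can be turned into download URLs. The sketch below assumes that every file is served under the same `resolve/main` path as `expresso_16k_2code.pth` in the sample code. Only that one URL appears in this README, so the others are an assumption. The names `BASE_URL`, `WEIGHTS`, and `weight_url` are illustrative helpers and not part of the dac library.

```python
# Assumption: all weights share the URL prefix used in the sample code.
BASE_URL = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main"

# File names copied from the tables in this README, grouped by task.
WEIGHTS = {
    "vocoder": [
        "expresso_16k_2code.pth",
        "expresso_16k_2code_official.pth",
        "expresso_16k_2code_wo_data.pth",
        "expresso_24k_2code_ab.pth",
    ],
    "tts_full": [
        "lj_16k_1code.pth",
        "lj_16k_1code_512.pth",
        "lj_16k_1code_256.pth",
    ],
    "tts_1h": [
        "lj_1h_16k_1code.pth",
        "lj_1h_16k_1code_512.pth",
        "lj_1h_16k_1code_256.pth",
    ],
}


def weight_url(filename: str) -> str:
    """Return the (assumed) download URL for a weight file listed above."""
    known = {name for names in WEIGHTS.values() for name in names}
    if filename not in known:
        raise ValueError(f"unknown weight file: {filename}")
    return f"{BASE_URL}/{filename}"
```

For example, `weight_url("lj_16k_1code.pth")` builds a URL that can be passed to `torch.hub.download_url_to_file`.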

Sample code

import dac
import torch
from pathlib import Path

# Download the vocoder-task weights into a local cache directory.
model_url = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main/expresso_16k_2code.pth"
model_path = Path(f"/tmp/utduss/{model_url.split('/')[-1]}")
model_path.parent.mkdir(parents=True, exist_ok=True)
torch.hub.download_url_to_file(model_url, str(model_path))

# Load the checkpoint with the official DAC loader.
model = dac.DAC.load(str(model_path))
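Once loaded, the model can be used in the usual DAC way. The following is a minimal sketch, not the challenge pipeline itself. It assumes the loaded object exposes `preprocess`, `encode`, `decode`, and `quantizer.from_codes` as in the official descript-audio-codec package, and that the input waveform is a tensor of shape `(batch, 1, samples)` at the model's sampling rate (16 kHz is inferred from the file name). The helper names `resynthesize` and `codes_to_wave` are hypothetical.

```python
def resynthesize(model, audio, sample_rate):
    """Encode a waveform into discrete codes, then decode it back.

    `audio` is assumed to be a tensor of shape (batch, 1, samples).
    """
    # Pad the input to a multiple of the model's hop length.
    x = model.preprocess(audio, sample_rate)
    # encode() returns (z, codes, latents, commitment_loss, codebook_loss).
    z, codes, _latents, _, _ = model.encode(x)
    return codes, model.decode(z)


def codes_to_wave(model, codes):
    """Vocoder-style use: decode discrete codes straight to a waveform.

    `codes` is assumed to have shape (batch, n_codebooks, frames).
    """
    # from_codes() returns a tuple; its first element is the quantized latent.
    z = model.quantizer.from_codes(codes)[0]
    return model.decode(z)
```

With the model from the sample code, a call might look like `codes, wav = resynthesize(model, torch.zeros(1, 1, 16000), 16000)`, ideally inside `torch.no_grad()` for inference.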

Contributors