Phoneme_Hallucinator
This is the repository of the paper "Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion" accepted by AAAI-2024. Some audio samples are provided here.
Inference Tutorial
- If you only want to run our VC pipeline, please download `Phoneme Hallucinator DEMO.ipynb` from this repo and run it in Google Colab.
Training Tutorial
Prepare the environment. We require Python 3.6.3 and the following packages:
- pillow == 8.0.1
- torch == 1.10.2
- tensorflow == 1.15.5
- tensorflow-probability == 0.7.0
- tensorpack == 0.9.8
- h5py == 2.10.0
- numpy == 1.19.5
- pathlib == 1.0.1
- tqdm == 4.64.1
- easydict == 1.10
- matplotlib == 3.3.4
- scikit-learn == 0.24.2
- scipy == 1.5.4
- seaborn == 0.11.2
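Because the pins above are old and exact, it can be worth checking an environment against them before training. The following is a minimal sketch, not part of this repo: the names `PINNED`, `installed_version` and `find_mismatches` are hypothetical, and the `pkg_resources` fallback assumes setuptools is installed (the `importlib.metadata` module only exists from Python 3.8 onward).

```python
try:
    # Available on Python 3.8+; absent on the Python 3.6.3 listed above.
    from importlib import metadata as _metadata
except ImportError:
    _metadata = None

# Pinned versions copied from the package list in this tutorial.
PINNED = {
    "pillow": "8.0.1",
    "torch": "1.10.2",
    "tensorflow": "1.15.5",
    "tensorflow-probability": "0.7.0",
    "tensorpack": "0.9.8",
    "h5py": "2.10.0",
    "numpy": "1.19.5",
    "pathlib": "1.0.1",
    "tqdm": "4.64.1",
    "easydict": "1.10",
    "matplotlib": "3.3.4",
    "scikit-learn": "0.24.2",
    "scipy": "1.5.4",
    "seaborn": "0.11.2",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    if _metadata is not None:
        try:
            return _metadata.version(name)
        except _metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters (assumes setuptools is installed).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when every pinned package is installed at the listed version; a missing package shows up with `None` as the found version.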
To prepare the training set, we use WavLM to extract speech representations. Go to the kNN-VC repo and follow its instructions to extract them. Namely, after placing the LibriSpeech dataset in the correct location, run the command:
python prematch_dataset.py --librispeech_path /path/to/librispeech/root --out_path /path/where/you/want/outputs/to/go --topk 4 --matching_layer 6 --synthesis_layer 6
Note that we don't use the "--prematch" option, because we only need to extract representations, not to extract them and then perform kNN regression.
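Before moving on, it can help to confirm that the extraction actually produced feature files under the output folder. The sketch below counts ".pt" files per split folder; the function name `count_feature_files` is hypothetical, the default split names are the LibriSpeech subsets this tutorial works with, and the recursive search is an assumption about how the files are nested.

```python
from pathlib import Path


def count_feature_files(out_path, splits=("train-clean-100", "dev-clean", "test-clean")):
    """Count ".pt" files found recursively under each split folder of out_path."""
    root = Path(out_path)
    counts = {}
    for split in splits:
        split_dir = root / split
        if split_dir.is_dir():
            counts[split] = sum(1 for _ in split_dir.rglob("*.pt"))
        else:
            # None marks a missing folder, as opposed to an empty one (0).
            counts[split] = None
    return counts
```

A count of `None` or `0` for any split suggests the `--out_path` or `--librispeech_path` argument did not point where intended.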
After the extraction step, the `--out_path` folder contains three subfolders, `train-clean-100`, `test-clean` and `dev-clean`, each holding the speech representation files (".pt").
Go to `./dataset/speech.py` in our repo and change the variables `path_to_wavlm_feat` and `tfrecord_path` accordingly. You need to set `path_to_wavlm_feat` to where the speech representations were stored in the previous step.
Start training with the following command:
python scripts/run.py --cfg_file=./exp/speech_XXL_cond/params.json --mode=train
If `tfrecord_path` doesn't exist, our code will create tfrecords and save them to `tfrecord_path` before training starts. Note that if you encounter numerical issues ("NaN, INF") when training starts, try re-running the command a few times. Training logs will be saved to `./exp/speech_XXL_cond/`.
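The advice to re-run the command on numerical issues can be automated with a small wrapper. This is only a sketch under a stated assumption: it treats a non-zero exit status as a failed start, and this tutorial does not say whether a NaN/INF failure actually makes `scripts/run.py` exit that way. The names `TRAIN_CMD` and `run_with_retries` are hypothetical.

```python
import subprocess

# Training command quoted from this tutorial.
TRAIN_CMD = [
    "python", "scripts/run.py",
    "--cfg_file=./exp/speech_XXL_cond/params.json",
    "--mode=train",
]


def run_with_retries(cmd, max_attempts=5):
    """Run cmd until it exits with status 0.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt exited with a non-zero status.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # Assumption: a numerical failure surfaces as a non-zero exit status.
        print("attempt %d exited with status %d" % (attempt, result.returncode))
    return None
```

Running `run_with_retries(TRAIN_CMD)` from the repo root would then launch training up to five times. If the script instead keeps running after hitting NaN values, the logs would need to be watched by other means.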