---
license: cc-by-nc-sa-4.0
extra_gated_prompt: >-
  The models are available for download for non-commercial purposes.
  Terms of Access: The researcher has requested permission to use the models.
  In exchange for such permission, the researcher hereby agrees to the
  following terms and conditions:
  1. Researcher shall use the models only for non-commercial research and educational purposes.
  2. The authors make no representations or warranties regarding the models, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  3. Researcher accepts full responsibility for his or her use of the models and shall defend and indemnify the authors of the models, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the models, including but not limited to Researcher's use of any copies of copyrighted model files that he or she may create from the models.
  4. Researcher may provide research associates and colleagues with access to the models provided that they first agree to be bound by these terms and conditions.
  5. The authors reserve the right to terminate Researcher's access to the models at any time.
  6. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
extra_gated_fields:
  Name: text
  Email: text
  Organization: text
  Address: text
  I accept the terms of access: checkbox
datasets:
  - Wenetspeech4TTS/WenetSpeech4TTS
language:
  - zh
---

# ISCSLP2024 Conversational Voice Clone Challenge (CoVoC) baseline models

The challenge provides two baseline models.

## 1: VALL-E

The first baseline is a VALL-E model trained with the open-source [Amphion](https://github.com/open-mmlab/Amphion) toolkit.
The model is first trained on the WenetSpeech4TTS dataset (weights: `valle_base_model.bin`) and then fine-tuned on the HQ-Conversations dataset (weights: `valle_HQ-sft_model.bin`). For the inference code, see the [ISCSLP2024_CoVoC_baseline GitHub repository](https://github.com/xkx-hub/ISCSLP2024_CoVoC_baseline).

## 2: Fish-Speech

The second baseline is fine-tuned from the open-source fish-speech model; its LLAMA and `vits_decoder` components were fine-tuned following the default fish-speech training configuration. For the training code, see the [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).
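The checkpoints above are released as plain weight files. Assuming they are standard PyTorch state dicts (an assumption; consult the baseline repository for the authoritative loading recipe), loading one might look like the following sketch, where a dummy state dict stands in for the real downloaded file:

```python
import torch

# Sketch only, not the official recipe: a dummy state dict is saved to
# disk here so the snippet is self-contained; in practice you would
# download valle_HQ-sft_model.bin from this repository instead.
state = {"decoder.weight": torch.zeros(4, 4)}
torch.save(state, "valle_HQ-sft_model.bin")  # placeholder checkpoint

# Load the checkpoint onto CPU; move to GPU after building the model.
checkpoint = torch.load("valle_HQ-sft_model.bin", map_location="cpu")

# model = ...                        # build the VALL-E model from the Amphion codebase
# model.load_state_dict(checkpoint)  # then run inference per the baseline repo
print(sorted(checkpoint))
```

The actual model construction and inference entry points live in the ISCSLP2024_CoVoC_baseline repository; only the `torch.load` pattern is shown here.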