voxxer/speecht5_finetuned_commonvoice_ru_translit

	---
	language: ru
	license: mit
	base_model: microsoft/speecht5_tts
	task: text-to-speech
	tags:
	- generated_from_trainer
	- audio
	- text-to-speech
	datasets:
	- mozilla-foundation/common_voice_13_0
	model-index:
	- name: SpeechT5 - Russian translit
	  results: []
	---


	# SpeechT5 - Russian translit

	This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the Common Voice 13 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4853

	## Model description

	Input must be Russian text transliterated into Latin script (for example, with the `transliterate` package).
	This model is only a test for the hands-on exercise of the Hugging Face Audio Course and is not intended for actual use.

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 8
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 400
	- training_steps: 2000
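
Two of the values above follow from the others, and a short sketch makes the arithmetic explicit. It assumes training ran on a single device (the card does not say) and re-implements the usual linear warmup and decay formula in plain Python instead of calling the library scheduler. The function names are hypothetical.

```python
LEARNING_RATE = 1e-5
WARMUP_STEPS = 400
TRAINING_STEPS = 2000
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: single device, not stated in the card


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the peak rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

With these settings, `effective_batch_size()` reproduces the listed `total_train_batch_size` of 64, and the learning rate peaks at step 400 before decaying to zero at step 2000.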

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 1.0359 | 0.6 | 50 | 0.8176 |
	| 0.8866 | 1.19 | 100 | 0.6899 |
	| 0.787 | 1.79 | 150 | 0.6478 |
	| 0.7477 | 2.38 | 200 | 0.6233 |
	| 0.6734 | 2.98 | 250 | 0.5630 |
	| 0.6216 | 3.58 | 300 | 0.5429 |
	| 0.593 | 4.17 | 350 | 0.5304 |
	| 0.5817 | 4.77 | 400 | 0.5282 |
	| 0.5734 | 5.37 | 450 | 0.5167 |
	| 0.5688 | 5.96 | 500 | 0.5209 |
	| 0.5662 | 6.56 | 550 | 0.5095 |
	| 0.5609 | 7.15 | 600 | 0.5127 |
	| 0.554 | 7.75 | 650 | 0.5041 |
	| 0.5522 | 8.35 | 700 | 0.5038 |
	| 0.5372 | 8.94 | 750 | 0.4984 |
	| 0.5432 | 9.54 | 800 | 0.4995 |
	| 0.5384 | 10.13 | 850 | 0.4971 |
	| 0.5345 | 10.73 | 900 | 0.4981 |
	| 0.5358 | 11.33 | 950 | 0.4942 |
	| 0.5332 | 11.92 | 1000 | 0.4906 |
	| 0.5334 | 12.52 | 1050 | 0.4897 |
	| 0.5301 | 13.11 | 1100 | 0.4914 |
	| 0.5298 | 13.71 | 1150 | 0.4894 |
	| 0.524 | 14.31 | 1200 | 0.4871 |
	| 0.5221 | 14.9 | 1250 | 0.4884 |
	| 0.525 | 15.5 | 1300 | 0.4883 |
	| 0.5232 | 16.1 | 1350 | 0.4866 |
	| 0.5261 | 16.69 | 1400 | 0.4858 |
	| 0.521 | 17.29 | 1450 | 0.4852 |
	| 0.5225 | 17.88 | 1500 | 0.4849 |
	| 0.5219 | 18.48 | 1550 | 0.4860 |
	| 0.5207 | 19.08 | 1600 | 0.4839 |
	| 0.5192 | 19.67 | 1650 | 0.4851 |
	| 0.516 | 20.27 | 1700 | 0.4860 |
	| 0.5186 | 20.86 | 1750 | 0.4811 |
	| 0.5233 | 21.46 | 1800 | 0.4841 |
	| 0.5145 | 22.06 | 1850 | 0.4819 |
	| 0.5159 | 22.65 | 1900 | 0.4822 |
	| 0.5146 | 23.25 | 1950 | 0.4831 |
	| 0.5175 | 23.85 | 2000 | 0.4853 |
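
The table supports two quick checks, sketched below with values copied from its last rows. The first finds the evaluation step with the lowest validation loss, which is not the final step. The second estimates the training-set size from the step-to-epoch ratio. That estimate is only approximate, because the epoch column is rounded, and it assumes the total batch size of 64 applied to every step. The card does not say which checkpoint was uploaded.

```python
# (step, validation_loss) pairs copied from the table, steps 1500 to 2000.
EVAL_LOG = [
    (1500, 0.4849), (1550, 0.4860), (1600, 0.4839), (1650, 0.4851),
    (1700, 0.4860), (1750, 0.4811), (1800, 0.4841), (1850, 0.4819),
    (1900, 0.4822), (1950, 0.4831), (2000, 0.4853),
]

# Lowest validation loss among the logged evaluations.
best_step, best_loss = min(EVAL_LOG, key=lambda row: row[1])

# Rough dataset-size estimate from the final row of the table.
FINAL_STEP, FINAL_EPOCH = 2000, 23.85
TOTAL_BATCH = 64  # total_train_batch_size from the hyperparameter list
steps_per_epoch = FINAL_STEP / FINAL_EPOCH
approx_train_examples = steps_per_epoch * TOTAL_BATCH

print(f"best step: {best_step} (validation loss {best_loss})")
print(f"about {steps_per_epoch:.1f} optimizer steps per epoch")
print(f"about {approx_train_examples:.0f} training examples")
```

The reported loss of 0.4853 is the step-2000 value, while step 1750 reached 0.4811. The difference is small compared with the step-to-step variation in the last rows, so the curve is best read as having plateaued.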


	### Framework versions

	- Transformers 4.31.0
	- PyTorch 2.0.1+cu118
	- Datasets 2.14.4
	- Tokenizers 0.13.3
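
To see how a local environment compares with the versions listed above, a small standard-library check such as the following may help. It is a sketch: the distribution names in `PINNED` are assumed to match the names used on PyPI, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card; the CUDA suffix (+cu118) is omitted for torch.
PINNED = {
    "transformers": "4.31.0",
    "torch": "2.0.1",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def compare_versions(pinned=PINNED):
    """Map each package to (version in the card, installed version or None)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions().items():
        print(f"{name}: card={wanted} installed={found}")
```

An installed `torch` may report a local suffix such as `+cu118`, so its version string can differ from the pinned value even when the release is the same.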