	---
	license: apache-2.0
	base_model: facebook/wav2vec2-large-xlsr-53
	tags:
	- automatic-speech-recognition
	- ./sample_speech.py
	- generated_from_trainer
	model-index:
	- name: ja-xlsr
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# ja-xlsr

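	This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on a dataset loaded through the local script `./sample_speech.py` (the Trainer recorded it as `./SAMPLE_SPEECH.PY - NA`; no further details are given).
	On the evaluation set, the final checkpoint (step 6900, epoch 300) reaches:
	- Loss: 2.5952
	- CER (character error rate): 0.3240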
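
CER is the character error rate. As a rough illustration of how such a figure is usually computed, the sketch below divides the character-level Levenshtein distance by the reference length. The function names `edit_distance` and `cer` and the sample strings are made up for this example. The text normalization behind the reported score is not documented in this card, so this is not necessarily how the number above was produced.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference character
                curr[j - 1] + 1,     # add a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Two substituted characters out of five reference characters.
print(cer("こんにちは", "こんばんは"))  # 0.4
```

Read this way, a CER of 0.3240 corresponds to roughly one character edit for every three reference characters.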

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 50
	- num_epochs: 300
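
The batch-size and scheduler entries above can be checked with a little arithmetic. The following is a minimal sketch, not the Trainer's own code. It assumes no gradient accumulation (none is listed), takes the total of 6900 optimizer steps from the final row of the training results, and models the `linear` scheduler as a linear warmup to the peak rate followed by a linear decay to zero. The names `linear_lr`, `TOTAL_STEPS`, and the other constants are illustrative.

```python
PER_DEVICE_BATCH = 4   # train_batch_size
NUM_DEVICES = 4        # num_devices
PEAK_LR = 3e-4         # learning_rate
WARMUP_STEPS = 50      # lr_scheduler_warmup_steps
TOTAL_STEPS = 6900     # assumption: last step listed in the training results

# Effective batch size per optimizer step, assuming no gradient accumulation.
total_batch = PER_DEVICE_BATCH * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


print("total_train_batch_size:", total_batch)
for step in (0, 25, 50, 3475, 6900):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 50, is back to half the peak around step 3475, and reaches zero at step 6900.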

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | CER |
	|:-------------:|:------:|:----:|:---------------:|:------:|
	| 4.9138 | 6.52 | 150 | 4.7965 | 1.0 |
	| 4.7484 | 13.04 | 300 | 4.6081 | 1.0 |
	| 4.5894 | 19.57 | 450 | 4.4697 | 0.9851 |
	| 4.2024 | 26.09 | 600 | 4.0373 | 0.9077 |
	| 2.7314 | 32.61 | 750 | 2.5507 | 0.5341 |
	| 1.2293 | 39.13 | 900 | 2.0146 | 0.4139 |
	| 0.5544 | 45.65 | 1050 | 1.9821 | 0.3556 |
	| 0.3224 | 52.17 | 1200 | 2.0190 | 0.3587 |
	| 0.1951 | 58.7 | 1350 | 2.1229 | 0.3612 |
	| 0.1539 | 65.22 | 1500 | 2.1114 | 0.3470 |
	| 0.1165 | 71.74 | 1650 | 2.2748 | 0.3315 |
	| 0.1119 | 78.26 | 1800 | 2.2391 | 0.3488 |
	| 0.0989 | 84.78 | 1950 | 2.3438 | 0.3383 |
	| 0.0915 | 91.3 | 2100 | 2.1218 | 0.3587 |
	| 0.0721 | 97.83 | 2250 | 2.2428 | 0.3519 |
	| 0.0742 | 104.35 | 2400 | 2.2293 | 0.3364 |
	| 0.0629 | 110.87 | 2550 | 2.2878 | 0.3371 |
	| 0.0495 | 117.39 | 2700 | 2.2672 | 0.3408 |
	| 0.0466 | 123.91 | 2850 | 2.2532 | 0.3525 |
	| 0.0424 | 130.43 | 3000 | 2.2844 | 0.3259 |
	| 0.0446 | 136.96 | 3150 | 2.2763 | 0.3253 |
	| 0.0411 | 143.48 | 3300 | 2.3011 | 0.3302 |
	| 0.0419 | 150.0 | 3450 | 2.3201 | 0.3420 |
	| 0.0333 | 156.52 | 3600 | 2.3644 | 0.3439 |
	| 0.0384 | 163.04 | 3750 | 2.3685 | 0.3532 |
	| 0.0367 | 169.57 | 3900 | 2.3970 | 0.3470 |
	| 0.0307 | 176.09 | 4050 | 2.3530 | 0.3309 |
	| 0.0328 | 182.61 | 4200 | 2.3415 | 0.3315 |
	| 0.0271 | 189.13 | 4350 | 2.4165 | 0.3309 |
	| 0.0213 | 195.65 | 4500 | 2.4478 | 0.3451 |
	| 0.0193 | 202.17 | 4650 | 2.5241 | 0.3556 |
	| 0.0204 | 208.7 | 4800 | 2.5700 | 0.3463 |
	| 0.0185 | 215.22 | 4950 | 2.5837 | 0.3178 |
	| 0.0161 | 221.74 | 5100 | 2.5139 | 0.3377 |
	| 0.0167 | 228.26 | 5250 | 2.5288 | 0.3352 |
	| 0.0148 | 234.78 | 5400 | 2.5741 | 0.3389 |
	| 0.0141 | 241.3 | 5550 | 2.5174 | 0.3389 |
	| 0.0122 | 247.83 | 5700 | 2.5573 | 0.3352 |
	| 0.0115 | 254.35 | 5850 | 2.5790 | 0.3296 |
	| 0.0141 | 260.87 | 6000 | 2.5774 | 0.3203 |
	| 0.0123 | 267.39 | 6150 | 2.6147 | 0.3309 |
	| 0.0214 | 273.91 | 6300 | 2.6202 | 0.3302 |
	| 0.0107 | 280.43 | 6450 | 2.6264 | 0.3234 |
	| 0.0086 | 286.96 | 6600 | 2.6075 | 0.3216 |
	| 0.0106 | 293.48 | 6750 | 2.5960 | 0.3247 |
	| 0.0085 | 300.0 | 6900 | 2.5952 | 0.3240 |
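
One way to read the table is to look for the checkpoints with the lowest validation loss and the lowest CER. The sketch below copies the step, validation loss, and CER columns from the rows above and reports both. It also derives the number of optimizer steps per epoch from the last row. The variable names are illustrative, and the training-set estimate assumes one optimizer step per batch of 16 with no gradient accumulation.

```python
# (step, validation loss, CER), copied from the training results table.
ROWS = [
    (150, 4.7965, 1.0), (300, 4.6081, 1.0), (450, 4.4697, 0.9851), (600, 4.0373, 0.9077),
    (750, 2.5507, 0.5341), (900, 2.0146, 0.4139), (1050, 1.9821, 0.3556), (1200, 2.0190, 0.3587),
    (1350, 2.1229, 0.3612), (1500, 2.1114, 0.3470), (1650, 2.2748, 0.3315), (1800, 2.2391, 0.3488),
    (1950, 2.3438, 0.3383), (2100, 2.1218, 0.3587), (2250, 2.2428, 0.3519), (2400, 2.2293, 0.3364),
    (2550, 2.2878, 0.3371), (2700, 2.2672, 0.3408), (2850, 2.2532, 0.3525), (3000, 2.2844, 0.3259),
    (3150, 2.2763, 0.3253), (3300, 2.3011, 0.3302), (3450, 2.3201, 0.3420), (3600, 2.3644, 0.3439),
    (3750, 2.3685, 0.3532), (3900, 2.3970, 0.3470), (4050, 2.3530, 0.3309), (4200, 2.3415, 0.3315),
    (4350, 2.4165, 0.3309), (4500, 2.4478, 0.3451), (4650, 2.5241, 0.3556), (4800, 2.5700, 0.3463),
    (4950, 2.5837, 0.3178), (5100, 2.5139, 0.3377), (5250, 2.5288, 0.3352), (5400, 2.5741, 0.3389),
    (5550, 2.5174, 0.3389), (5700, 2.5573, 0.3352), (5850, 2.5790, 0.3296), (6000, 2.5774, 0.3203),
    (6150, 2.6147, 0.3309), (6300, 2.6202, 0.3302), (6450, 2.6264, 0.3234), (6600, 2.6075, 0.3216),
    (6750, 2.5960, 0.3247), (6900, 2.5952, 0.3240),
]

# Checkpoints that minimize each evaluation metric.
best_loss = min(ROWS, key=lambda row: row[1])
best_cer = min(ROWS, key=lambda row: row[2])

# The last row pairs step 6900 with epoch 300.
steps_per_epoch = 6900 / 300
# Assumption: each step consumes one batch of 16 examples.
max_examples_per_epoch = int(steps_per_epoch) * 16

print("lowest validation loss:", best_loss)
print("lowest CER:", best_cer)
print("steps per epoch:", steps_per_epoch)
print("upper bound on training examples:", max_examples_per_epoch)
```

The lowest validation loss (1.9821) occurs at step 1050 and the lowest CER (0.3178) at step 4950. After step 1050 the validation loss drifts upward while the training loss keeps falling toward zero, a pattern commonly read as overfitting. At 23 steps per epoch, the training set would hold at most 368 examples under the stated assumption.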


	### Framework versions

	- Transformers 4.34.0
	- PyTorch 2.1.0+cu121
	- Datasets 2.14.5
	- Tokenizers 0.14.1
