wav2vec2-large-xlsr-common_voice_13_0-id
Note: trying the model through this model card is not recommended.
Instead, try it through the available Space; you can then adapt the inference method from the Gradio app script, or check out my GitHub repository.
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice_13_0 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4115
- WER (word error rate): 0.4316
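For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is a plain-Python illustration, not the evaluation script used for this model; the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
score = word_error_rate("saya suka makan nasi", "saya suka nasi")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. A WER of 0.4316 corresponds to roughly 43 word errors per 100 reference words.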
Model description
The model is based on the facebook/wav2vec2-large-xlsr-53 architecture and fine-tuned for Automatic Speech Recognition on the common_voice_13_0 dataset in Indonesian (id). It is designed to transcribe spoken language into written text.
Intended uses & limitations
Intended Uses:
- Automatic Speech Recognition for Indonesian speech data.
- Transcription of spoken content in the common_voice_13_0 dataset.
Limitations:
- The model's performance may vary on speech data outside the common_voice_13_0 dataset.
- It may not perform well on languages other than Indonesian.
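As a rough illustration of the intended use, the sketch below loads the checkpoint locally with the Transformers `pipeline` API and transcribes one audio file. It is not the author's inference method (the Gradio app script remains the reference), and two things are assumptions: the repository id, copied from the "Model tree" line of this card, and the helper name `transcribe`.

```python
# Assumption: repository id taken from the "Model tree" line of this card.
MODEL_ID = "arifagustyawan/wav2vec2-large-xlsr-53-id"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcription of a single audio file."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio (this needs ffmpeg)
    # and resamples it to the sampling rate the model expects.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample_id.wav")`, where `sample_id.wav` is a hypothetical Indonesian recording, downloads the weights on first use and returns the predicted text.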
Training and evaluation data
The model was trained and evaluated on the Indonesian (id) subset of the common_voice_13_0 dataset.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
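Two of the values above follow from the others, which the sketch below makes explicit: the total train batch size is the per-device batch size times the gradient accumulation steps, and a linear scheduler with warmup ramps the learning rate from 0 to its peak over the warmup steps, then decays it linearly to 0. The single-device setting and the total step count are assumptions; the latter is inferred from the results table (step 4000 at epoch 28.78), not stated on this card.

```python
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
NUM_EPOCHS = 30

# Assumption: optimizer steps per epoch inferred from the last results row
# (step 4000 reached at epoch 28.78), giving about 139 steps per epoch.
STEPS_PER_EPOCH = round(4000 / 28.78)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)  # 16 * 2
peak = linear_lr(WARMUP_STEPS)  # the peak rate is reached when warmup ends
```

Under these assumptions the peak rate of 0.0003 is reached at step 500, shortly after the second logged evaluation at step 400.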
Training results
Training Loss | Epoch | Step | Validation Loss | WER |
---|---|---|---|---|
5.0656 | 2.88 | 400 | 2.7637 | 1.0 |
1.1404 | 5.76 | 800 | 0.4483 | 0.6088 |
0.3698 | 8.63 | 1200 | 0.4029 | 0.5278 |
0.2695 | 11.51 | 1600 | 0.3976 | 0.5036 |
0.2074 | 14.39 | 2000 | 0.3988 | 0.4793 |
0.1796 | 17.27 | 2400 | 0.3952 | 0.4590 |
0.1523 | 20.14 | 2800 | 0.3986 | 0.4463 |
0.1352 | 23.02 | 3200 | 0.4143 | 0.4374 |
0.121 | 25.9 | 3600 | 0.4022 | 0.4337 |
0.1085 | 28.78 | 4000 | 0.4115 | 0.4316 |
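Validation loss and WER do not reach their minimum at the same checkpoint in the table above. The short sketch below copies the logged rows into a list and picks the best row under each criterion; the tuple layout and variable names are illustrative only.

```python
# (step, epoch, validation_loss, wer), copied from the results table.
RESULTS = [
    (400, 2.88, 2.7637, 1.0),
    (800, 5.76, 0.4483, 0.6088),
    (1200, 8.63, 0.4029, 0.5278),
    (1600, 11.51, 0.3976, 0.5036),
    (2000, 14.39, 0.3988, 0.4793),
    (2400, 17.27, 0.3952, 0.4590),
    (2800, 20.14, 0.3986, 0.4463),
    (3200, 23.02, 0.4143, 0.4374),
    (3600, 25.9, 0.4022, 0.4337),
    (4000, 28.78, 0.4115, 0.4316),
]

# Pick the logged checkpoint that minimises each metric.
best_by_loss = min(RESULTS, key=lambda row: row[2])
best_by_wer = min(RESULTS, key=lambda row: row[3])
```

The lowest validation loss (0.3952) is logged at step 2400, while the WER keeps improving through step 4000 (0.4316), which is the checkpoint whose results are reported at the top of this card.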
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
Model tree for arifagustyawan/wav2vec2-large-xlsr-53-id
Base model
facebook/wav2vec2-large-xlsr-53