---
language:
- uz
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
metrics:
- wer
model-index:
- name: Whisper Small Uzbek
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: mozilla-foundation/common_voice_11_0
type: mozilla-foundation/common_voice_11_0
config: uz
split: test
args: uz
metrics:
- type: wer
value: 23.650914047642605
name: WER
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: google/fleurs
type: google/fleurs
config: uz_uz
split: test
metrics:
- type: wer
value: 47.15
name: WER
---
<!-- Disclaimer: I've never written a model card before. I'm probably not correctly following standard practices on how they should be written.
I'm new to this. I'm sorry -->
# Whisper Small Uzbek
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small), trained and evaluated on the `uz` configuration of mozilla-foundation/common_voice_11_0 and the `uz_uz` configuration of google/fleurs.
It achieves the following results on the common_voice_11_0 evaluation set:
- Loss: 0.3872
- Wer: 23.6509
It achieves the following results on the FLEURS evaluation set:
- Wer: 47.15
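For quick transcription, the model can be used through the Transformers `pipeline` API. A minimal sketch (the audio path is a placeholder):

```python
# Minimal usage sketch; "audio.wav" is a placeholder for your own recording.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="BlueRaccoon/whisper-small-uz")
print(asr("audio.wav")["text"])
```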
## Model description
This model was created as part of the Whisper fine-tuning sprint event.
In evaluation, this model achieves a WER of 23.6509 on the Common Voice 11 test set and 47.15 on the FLEURS test set.
This is a substantial improvement over the lowest WER of 90.2 reported for the Uzbek language in the [Whisper paper](https://cdn.openai.com/papers/whisper.pdf):
![A part of Table 13 from the paper "Robust Speech Recognition via Large-Scale Weak Supervision", which shows the WER achieved by the Whisper models on the FLEURS dataset. Highlighted is the best score achieved for the Uzbek language, 90.2.](https://huggingface.co/BlueRaccoon/whisper-small-uz/resolve/main/uzbektable13.png)
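For reference, the figures quoted here follow the standard word-error-rate computation, which can be reproduced with the `evaluate` library (the transcripts below are hypothetical placeholders):

```python
# Sketch of the WER computation behind the scores above;
# the predictions and references shown are hypothetical placeholders.
import evaluate

wer_metric = evaluate.load("wer")
score = 100 * wer_metric.compute(
    predictions=["bu bir misol"],    # model transcriptions (placeholder)
    references=["bu bir misoldir"],  # ground-truth transcripts (placeholder)
)
print(f"WER: {score:.2f}")
```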
## Intended uses & limitations
More information needed
## Training and evaluation data
Training was performed using the train and validation splits from [Mozilla's Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) and [Google's FLEURS](https://huggingface.co/datasets/google/fleurs) datasets.
Testing was performed using the test splits from the same datasets.
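As a sketch of how these datasets can be loaded with 🤗 Datasets (streaming mode is an assumption, matching the streaming script referenced below; Common Voice 11 is gated and requires accepting its terms on the Hub):

```python
# Sketch: load both datasets in streaming mode with 🤗 Datasets.
# Common Voice 11 is gated; accept its terms on the Hub and log in first.
from datasets import load_dataset

common_voice = load_dataset(
    "mozilla-foundation/common_voice_11_0", "uz", split="train", streaming=True
)
fleurs = load_dataset("google/fleurs", "uz_uz", split="train", streaming=True)

print(next(iter(common_voice)))
```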
## Training procedure
Training and CV11 testing were performed using Michael Kamfonas's modified version of Hugging Face's [run_speech_recognition_seq2seq_streaming.py](https://github.com/kamfonas/whisper-fine-tuning-event/blob/e0377f55004667f18b37215d11bf0e54f5bda463/run_speech_recognition_seq2seq_streaming.py) script.
FLEURS testing was performed using the standard [run_eval_whisper_streaming.py](https://github.com/huggingface/community-events/blob/main/whisper-fine-tuning-event/run_eval_whisper_streaming.py) script.
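In outline, such fine-tuning scripts start from the base checkpoint and its processor roughly as follows (a sketch, not the linked scripts' exact code):

```python
# Sketch of the base model/processor setup these fine-tuning scripts perform;
# not the exact code of the scripts linked above.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="uzbek", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
```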
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400
- training_steps: 5000
- mixed_precision_training: Native AMP
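For illustration, these values map onto Hugging Face's `Seq2SeqTrainingArguments` roughly as follows (a sketch; `output_dir` is a placeholder and anything not listed above is left at its default):

```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
# output_dir is a placeholder; Adam betas/epsilon match the listed defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-uz",    # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=400,
    max_steps=5000,
    fp16=True,                          # mixed precision (native AMP)
)
```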
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.1542 | 0.2 | 1000 | 0.4711 | 30.8413 |
| 0.0976 | 0.4 | 2000 | 0.4040 | 26.6464 |
| 0.1088 | 1.0 | 3000 | 0.3765 | 24.4952 |
| 0.0527 | 1.21 | 4000 | 0.3872 | 23.6509 |
| 0.0534 | 1.41 | 5000 | 0.3843 | 23.6817 |
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2