Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets, together with a copy of the data processed with noise reduction and normalization (applied to both the train and test splits). It achieves the following results on the evaluation set:

  • Loss: 1.3339
  • Bleu: 30.66
  • Chrf: 46.99
  • Wer: 65.4660
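
A minimal inference sketch, assuming the standard transformers pipeline API; the audio file path is a placeholder for a recording of Irish speech:

```python
# Minimal inference sketch for this checkpoint (Irish speech -> English text).
# "audio.wav" is a placeholder path; any audio file readable by ffmpeg works,
# as the ASR pipeline decodes and resamples it for the Whisper feature extractor.
from transformers import pipeline

translator = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-small-ga2en-v4",
)

# generate_kwargs forces the translate task in case the saved generation
# config does not already set it, so the decoder emits English output.
result = translator("audio.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```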

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.01
  • training_steps: 3000
  • mixed_precision_training: Native AMP
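
The card does not include the training script; the following is a hedged sketch of how the listed values map onto transformers Seq2SeqTrainingArguments. The output_dir is a placeholder, and the reported warmup value of 0.01 is interpreted here as a warmup ratio rather than a step count.

```python
# Sketch only: reconstructs the listed hyperparameters as Seq2SeqTrainingArguments.
# output_dir is a placeholder; the actual training setup may differ.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ga2en-v4",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",            # Adam with betas=(0.9, 0.999), epsilon=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.01,              # card lists lr_scheduler_warmup_steps: 0.01
    max_steps=3000,
    fp16=True,                      # Native AMP mixed precision
    predict_with_generate=True,     # needed to compute BLEU/chrF/WER during eval
)
```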

Training results

Training Loss  Epoch  Step  Bleu   Chrf   Validation Loss  Wer
1.41           0.07   100   9.78   25.23  1.8782           96.3980
1.2436         0.13   200   10.23  28.66  1.8301           125.9343
1.593          0.2    300   9.53   30.7   1.7066           137.1454
1.9589         0.26   400   12.08  32.94  1.5629           109.3652
1.8174         0.33   500   13.73  34.5   1.5154           123.5930
1.6775         0.39   600   15.8   35.68  1.5220           102.2062
1.7074         0.46   700   16.62  37.96  1.4570           100.5853
1.5793         0.53   800   24.5   39.91  1.4265           71.3643
1.3708         0.59   900   24.35  42.26  1.3845           73.7956
1.3217         0.66   1000  19.34  41.3   1.3662           87.7533
1.2572         0.72   1100  21.59  41.35  1.3529           88.4286
1.1447         0.79   1200  28.39  44.99  1.3228           65.9163
1.1544         0.85   1300  23.69  43.07  1.2972           80.1891
1.0291         0.92   1400  29.36  45.45  1.2828           70.9590
0.9394         0.98   1500  26.44  44.0   1.2812           74.1558
0.3764         1.05   1600  26.95  44.82  1.3248           73.8406
0.3338         1.12   1700  26.5   44.96  1.3212           77.3976
0.3148         1.18   1800  29.57  46.31  1.3188           66.7267
0.3206         1.25   1900  30.87  47.21  1.3050           64.4755
0.3069         1.31   2000  30.15  46.19  1.3053           65.6911
0.3342         1.38   2100  24.14  44.12  1.3506           77.2625
0.3125         1.44   2200  30.21  46.08  1.3369           63.9802
0.319          1.51   2300  27.71  45.45  1.3601           69.9235
0.3067         1.58   2400  26.92  45.73  1.3473           69.3381
0.2621         1.64   2500  28.36  46.14  1.3354           66.9068
0.2709         1.71   2600  28.75  45.47  1.3339           65.2859
0.2644         1.77   2700  28.84  47.35  1.3100           65.8262
0.2511         1.84   2800  29.41  47.31  1.3261           69.4732
0.2232         1.9    2900  30.79  46.63  1.3382           64.1153
0.236          1.97   3000  30.66  46.99  1.3339           65.4660
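
For reference, a minimal sketch of computing the reported metrics (BLEU, chrF, WER) with the Hugging Face evaluate library; the prediction and reference strings below are placeholders, and this is not necessarily the exact evaluation code used for this card.

```python
# Metric sketch using the evaluate library; predictions/references are placeholders.
import evaluate

sacrebleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

predictions = ["the weather is fine today"]
references = [["the weather is nice today"]]  # sacrebleu/chrf take a list of references per sample

print("BLEU:", sacrebleu.compute(predictions=predictions, references=references)["score"])
print("chrF:", chrf.compute(predictions=predictions, references=references)["score"])
# WER expects one flat reference string per prediction
print("WER :", wer.compute(predictions=predictions, references=[r[0] for r in references]))
```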

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (normalized): 30.660 (self-reported)
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (normalized): 65.466 (self-reported)