Whisper Small Mn - Erkhembayar Gantulga

This model is a fine-tuned version of openai/whisper-small for Mongolian (mn) speech recognition, trained on the Common Voice 17.0 and Google Fleurs datasets. It achieves the following results on the evaluation set:

  • Loss: 0.1561
  • WER: 19.4492%
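
For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The card does not say which tool or text normalization produced the score above, so the following is only a minimal sketch of the standard definition; `word_error_rate` is a hypothetical helper and the example strings are placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Placeholder transcripts: one substituted word and one missing word
# against a five-word reference.
example_wer = word_error_rate("a b c d e", "a x c d")
print(example_wer)
```

Corpus-level scores are usually computed by summing errors and reference words over all utterances rather than averaging per-utterance values, so a per-sentence average may differ from the reported figure.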

Training and evaluation data

The training set combines the Common Voice 17.0 and Google Fleurs datasets, prepared as follows:

from datasets import Audio, DatasetDict, concatenate_datasets, load_dataset

common_voice = DatasetDict()

# Mongolian ("mn") subset of Common Voice 17.0.
# `token=True` uses the locally stored Hugging Face token and replaces the deprecated `use_auth_token=True`.
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="train+validation+validated", token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="test", token=True)

# Resample the audio to 16 kHz, the sampling rate Whisper expects.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

# Drop metadata columns that are not needed for training.
common_voice = common_voice.remove_columns(
    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes", "variant"]
)

google_fleurs = DatasetDict()

# Mongolian ("mn_mn") subset of Google Fleurs.
# `token=True` replaces the deprecated `use_auth_token=True`.
google_fleurs["train"] = load_dataset("google/fleurs", "mn_mn", split="train+validation", token=True)
google_fleurs["test"] = load_dataset("google/fleurs", "mn_mn", split="test", token=True)

# Drop metadata columns that are not needed for training.
google_fleurs = google_fleurs.remove_columns(
    ["id", "num_samples", "path", "raw_transcription", "gender", "lang_id", "language", "lang_group_id"]
)
# Rename the text column so it matches the Common Voice column name.
google_fleurs = google_fleurs.rename_column("transcription", "sentence")

# Merge the two corpora split by split.
dataset = DatasetDict()
dataset["train"] = concatenate_datasets([common_voice["train"], google_fleurs["train"]])
dataset["test"] = concatenate_datasets([common_voice["test"], google_fleurs["test"]])
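
The column removal and the rename above matter because `concatenate_datasets` expects the datasets it merges to share the same columns and feature types. The sketch below replays that bookkeeping on plain Python sets, without downloading anything. The dropped-column lists are copied from the code above; the full schemas are an assumption (the dropped columns plus `audio` and the text column), and `remaining_columns` is a hypothetical helper.

```python
# Columns dropped from each corpus, copied from the remove_columns calls.
CV_DROPPED = {
    "accent", "age", "client_id", "down_votes", "gender",
    "locale", "path", "segment", "up_votes", "variant",
}
FLEURS_DROPPED = {
    "id", "num_samples", "path", "raw_transcription",
    "gender", "lang_id", "language", "lang_group_id",
}

# Assumed full schemas: the dropped columns plus what is expected to remain.
cv_columns = CV_DROPPED | {"audio", "sentence"}
fleurs_columns = FLEURS_DROPPED | {"audio", "transcription"}


def remaining_columns(columns, dropped, renames=None):
    """Return the column names left after dropping and renaming."""
    renames = renames or {}
    return {renames.get(name, name) for name in columns - dropped}


cv_final = remaining_columns(cv_columns, CV_DROPPED)
fleurs_final = remaining_columns(
    fleurs_columns, FLEURS_DROPPED, renames={"transcription": "sentence"}
)

# Both corpora should end up with the same column names before merging.
print(sorted(cv_final), sorted(fleurs_final), cv_final == fleurs_final)
```

On the real datasets, the same check can be made by comparing the `column_names` of the two splits just before concatenation.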

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 4000
  • mixed_precision_training: Native AMP
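
The scheduler settings above describe a learning rate that ramps up over the first 100 steps and then decays linearly to zero at step 4000. A minimal sketch of that shape, assuming the conventional linear-warmup, linear-decay formula (the one used by `get_linear_schedule_with_warmup` in Transformers); `linear_lr` is a hypothetical helper, and its defaults are taken from the list above.

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=100, total_steps=4000):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of the run.
for step in (0, 50, 100, 2050, 4000):
    print(step, linear_lr(step))
```

Under this assumption the peak rate of 1e-05 is reached at step 100, and the rate is half of that at step 2050, midway through the decay phase.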

Training results

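| Training Loss | Epoch  | Step | Validation Loss | WER (%) |
|---------------|--------|------|-----------------|---------|
| 0.4118        | 0.4912 | 500  | 0.4810          | 50.3500 |
| 0.283         | 0.9823 | 1000 | 0.3347          | 38.6233 |
| 0.1778        | 1.4735 | 1500 | 0.2738          | 33.5240 |
| 0.1412        | 1.9646 | 2000 | 0.2216          | 27.8363 |
| 0.0676        | 2.4558 | 2500 | 0.1967          | 24.3823 |
| 0.0602        | 2.9470 | 3000 | 0.1711          | 21.7428 |
| 0.0363        | 3.4381 | 3500 | 0.1624          | 20.4108 |
| 0.0332        | 3.9293 | 4000 | 0.1561          | 19.4492 |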
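
The Epoch and Step columns also give a rough idea of the training-set size. The arithmetic below is an estimate only: it assumes a single device and no gradient accumulation, so that each step consumes exactly one batch of 16 examples. Neither assumption is stated in this card, and `estimate_train_size` is a hypothetical helper.

```python
TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list


def estimate_train_size(epoch, step, batch_size=TRAIN_BATCH_SIZE):
    """Return (steps per epoch, approximate number of training examples)."""
    steps_per_epoch = step / epoch
    return steps_per_epoch, steps_per_epoch * batch_size


# (epoch, step) pairs copied from the first and last rows of the results.
for epoch, step in [(0.4912, 500), (3.9293, 4000)]:
    steps_per_epoch, examples = estimate_train_size(epoch, step)
    print(f"step {step}: ~{steps_per_epoch:.0f} steps/epoch, ~{examples:.0f} examples")
```

Both rows point to roughly 1,018 steps per epoch, or about 16,300 training examples under the stated assumptions.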

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.3.1+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model repository: erkhem-gantulga/whisper-small-mn