erkhem-gantulga/whisper-medium-mn

Automatic Speech Recognition

Erkhembayar Gantulga committed on Aug 27

Commit 1899cc9 • 1 Parent(s): 6700b86

Updated README

Added training data information

Files changed (1)

README.md +39 -2

@@ -3,6 +3,9 @@ language:
 - mn
 base_model: openai/whisper-medium
 library_name: transformers
+datasets:
+- mozilla-foundation/common_voice_17_0
+- google/fleurs
 tags:
 - audio
 - automatic-speech-recognition
@@ -37,7 +40,7 @@ should probably proofread and complete it, then remove this comment. -->
 # Whisper Medium Mn - Erkhembayar Gantulga
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 17.0 dataset.
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 17.0 and Google Fleurs datasets.
 It achieves the following results on the evaluation set:
 - Loss: 0.1083
 - Wer: 12.9580
@@ -52,7 +55,41 @@ More information needed
 ## Training and evaluation data
-More information needed
+Datasets used for training:
+- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
+- [Google Fleurs](https://huggingface.co/datasets/google/fleurs)
+For training, combined Common Voice 17.0 and Google Fleurs datasets:
+```
+from datasets import load_dataset, DatasetDict, concatenate_datasets
+from datasets import Audio
+common_voice = DatasetDict()
+common_voice["train"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="train+validation+validated", use_auth_token=True)
+common_voice["test"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="test", use_auth_token=True)
+common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))
+common_voice = common_voice.remove_columns(
+    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes", "variant"]
+)
+google_fleurs = DatasetDict()
+google_fleurs["train"] = load_dataset("google/fleurs", "mn_mn", split="train+validation", use_auth_token=True)
+google_fleurs["test"] = load_dataset("google/fleurs", "mn_mn", split="test", use_auth_token=True)
+google_fleurs = google_fleurs.remove_columns(
+    ["id", "num_samples", "path", "raw_transcription", "gender", "lang_id", "language", "lang_group_id"]
+)
+google_fleurs = google_fleurs.rename_column("transcription", "sentence")
+dataset = DatasetDict()
+dataset["train"] = concatenate_datasets([common_voice["train"], google_fleurs["train"]])
+dataset["test"] = concatenate_datasets([common_voice["test"], google_fleurs["test"]])
+```
 ## Training procedure
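
Note on the snippet added in this commit: `concatenate_datasets` expects both inputs to share the same columns, which is why the README drops the metadata columns and renames FLEURS's `transcription` to `sentence`. The sketch below is not part of the commit. It is a plain-Python sanity check of that column bookkeeping, and it needs no download. The full column lists (`COMMON_VOICE_COLUMNS`, `FLEURS_COLUMNS`) and the helper `remaining_columns` are assumptions for illustration, inferred from the columns the README removes plus `audio` and the text column. Check them against `dataset.column_names` on the datasets you actually load.

```python
# Assumed original columns (hypothetical; verify with `dataset.column_names`).
COMMON_VOICE_COLUMNS = [
    "client_id", "path", "audio", "sentence", "up_votes", "down_votes",
    "age", "gender", "accent", "locale", "segment", "variant",
]
FLEURS_COLUMNS = [
    "id", "num_samples", "path", "audio", "transcription",
    "raw_transcription", "gender", "lang_id", "language", "lang_group_id",
]


def remaining_columns(columns, drop, rename=None):
    """Return the sorted column names left after dropping and renaming.

    Raises ValueError if a column slated for removal does not exist,
    mirroring the failure you would hit in `remove_columns`.
    """
    rename = rename or {}
    missing = [name for name in drop if name not in columns]
    if missing:
        raise ValueError(f"cannot drop unknown columns: {missing}")
    return sorted(rename.get(name, name) for name in columns if name not in drop)


# Drop lists and rename copied from the README snippet in this commit.
cv_after = remaining_columns(
    COMMON_VOICE_COLUMNS,
    drop=["accent", "age", "client_id", "down_votes", "gender",
          "locale", "path", "segment", "up_votes", "variant"],
)
fleurs_after = remaining_columns(
    FLEURS_COLUMNS,
    drop=["id", "num_samples", "path", "raw_transcription",
          "gender", "lang_id", "language", "lang_group_id"],
    rename={"transcription": "sentence"},
)

# Both datasets must end up with identical columns before concatenation.
assert cv_after == fleurs_after, (cv_after, fleurs_after)
print(cv_after)
```

Under the assumed column lists, both datasets end up with only `audio` and `sentence`. Matching names are necessary but not sufficient: the feature types must also agree, for example the audio sampling rate that the README sets to 16000 for Common Voice.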