Erkhembayar Gantulga
committed on
Commit 1899cc9, 1 parent: 6700b86
Updated README
Added training data information
README.md
CHANGED
@@ -3,6 +3,9 @@ language:
 - mn
 base_model: openai/whisper-medium
 library_name: transformers
+datasets:
+- mozilla-foundation/common_voice_17_0
+- google/fleurs
 tags:
 - audio
 - automatic-speech-recognition
@@ -37,7 +40,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Whisper Medium Mn - Erkhembayar Gantulga
 
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 17.0
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 17.0 and Google Fleurs datasets.
 It achieves the following results on the evaluation set:
 - Loss: 0.1083
 - Wer: 12.9580
@@ -52,7 +55,41 @@ More information needed
 
 ## Training and evaluation data
 
-
+Datasets used for training:
+- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
+- [Google Fleurs](https://huggingface.co/datasets/google/fleurs)
+
+For training, combined Common Voice 17.0 and Google Fleurs datasets:
+
+```
+from datasets import load_dataset, DatasetDict, concatenate_datasets
+from datasets import Audio
+
+common_voice = DatasetDict()
+
+common_voice["train"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="train+validation+validated", use_auth_token=True)
+common_voice["test"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="test", use_auth_token=True)
+
+common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))
+
+common_voice = common_voice.remove_columns(
+    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes", "variant"]
+)
+
+google_fleurs = DatasetDict()
+
+google_fleurs["train"] = load_dataset("google/fleurs", "mn_mn", split="train+validation", use_auth_token=True)
+google_fleurs["test"] = load_dataset("google/fleurs", "mn_mn", split="test", use_auth_token=True)
+
+google_fleurs = google_fleurs.remove_columns(
+    ["id", "num_samples", "path", "raw_transcription", "gender", "lang_id", "language", "lang_group_id"]
+)
+google_fleurs = google_fleurs.rename_column("transcription", "sentence")
+
+dataset = DatasetDict()
+dataset["train"] = concatenate_datasets([common_voice["train"], google_fleurs["train"]])
+dataset["test"] = concatenate_datasets([common_voice["test"], google_fleurs["test"]])
+```
 
 ## Training procedure
 