Update README.md
README.md (changed)
@@ -17,11 +17,11 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value:
+value: 44.30
 ---
 # Wav2Vec2-Large-XLSR-53-Moroccan-Darija

-**wav2vec2-large-xlsr-53** fine-tuned on
+**wav2vec2-large-xlsr-53** fine-tuned on 8.5 hours of labeled Darija Audios

 I have also added 3 phonetic units to this model ڭ, ڤ and پ. For example: ڭال , ڤيديو , پودكاست

@@ -59,7 +59,17 @@ print(transcription)

 Here's the output: ڭالت ليا هاد السيد هادا ما كاينش بحالو

-## Evaluation
+## Evaluation & Previous works
+
+-v2 (fine-tuned on 8.5 hours of audio + replacing أ and ى and إ with ا + tried to standardize the Moroccan Darija)
+
+**Wer**: 44.30
+
+**Training Loss**: 12.99
+
+**Validation Loss**: 36.93
+
+-v1 (fine-tuned on 6 hours of audio)

 **Wer**: 49.68

@@ -67,10 +77,10 @@ Here's the output: ڭالت ليا هاد السيد هادا ما كاينش ب

 **Validation Loss**: 45.24

-This high validation loss value is mainly due to the fact that Darija can be written in many ways.
-
 ## Future Work

-
+I am currently working on improving this model. The new model will be available soon.
+
+email: souregh@gmail.com

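The README notes that three extra letters (ڭ, ڤ and پ) were added to the model's output units. The diff does not show how, so the following is only a minimal sketch of how a character-level CTC vocabulary could be built so those letters are always present. It assumes a `vocab.json`-style character-to-id mapping with `|` as the word delimiter; the `[UNK]`/`[PAD]` token names follow a common XLSR fine-tuning recipe and, like the helper name `build_vocab` and the sample sentences, are assumptions rather than this repository's code.

```python
import json

# Letters the README says were added as extra phonetic units.
EXTRA_LETTERS = ["ڭ", "ڤ", "پ"]


def build_vocab(transcripts):
    """Build a character -> id mapping for a CTC head (hypothetical helper)."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # Guarantee the extra letters get an id even if they are rare in the data.
    chars.update(EXTRA_LETTERS)
    # Use "|" instead of a space so word boundaries have their own token.
    chars.discard(" ")
    chars.add("|")
    vocab = {ch: i for i, ch in enumerate(sorted(chars))}
    # Special tokens go at the end (assumed names).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


if __name__ == "__main__":
    sample = ["ڭالت ليا", "ڤيديو", "پودكاست"]
    print(json.dumps(build_vocab(sample), ensure_ascii=False))
```

Forcing the letters into the set, rather than relying on them appearing in the transcripts, keeps the output layer size stable if a training subset happens to lack one of them.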
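The v2 entry says أ, ى and إ were replaced with ا before training. A minimal sketch of that single mapping using Python's built-in `str.maketrans`/`str.translate`; the helper name `normalize_alef` is hypothetical, and the broader effort to standardize Darija spelling mentioned in the same entry is not shown in the diff and is not reproduced here.

```python
# Map the three characters named in the v2 notes to bare alef (ا).
_ALEF_MAP = str.maketrans({"أ": "ا", "ى": "ا", "إ": "ا"})


def normalize_alef(text: str) -> str:
    """Replace أ, ى and إ with ا; all other characters are left unchanged."""
    return text.translate(_ALEF_MAP)


if __name__ == "__main__":
    print(normalize_alef("إلى"))  # الا
```

If the same mapping is applied to the reference transcripts at evaluation time, spelling variants that differ only in these characters are not counted as word errors.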
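The Test WER values (49.68 for v1, 44.30 for v2) appear to be word error rates expressed as percentages: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum edit alignment and N is the number of reference words. The README does not say which tool produced its numbers, so the helper `word_error_rate` below is an illustrative sketch of that standard definition via word-level Levenshtein distance, not the evaluation code used for this model. The hypothesis sentence in the usage example is invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds DP row i-1: edit distance between ref[:i-1] and hyp[:j].
    # Row 0 compares an empty reference, so it costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "ڭالت ليا هاد السيد هادا ما كاينش بحالو"
    hyp = "ڭالت ليا هاد السيد ما كاينش بحالو"  # invented: one word dropped
    print(f"{100 * word_error_rate(ref, hyp):.2f}")  # 12.50 (1 deletion / 8 words)
```

Because WER is computed on whitespace-separated words, a single spelling difference in a word counts as a full substitution, which is one reason inconsistent Darija spelling inflates the score.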