---
language:
- ar
- multilingual

license: apache-2.0
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- whisper-event
- generated_from_trainer
- Arabic
- multilingual
- STT
datasets:
- mozilla-foundation/common_voice_12_0
metrics:
- wer
model-index:

- name: Kalemat-Tech Arabic Speech Recognition Model (STT)
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: mozilla-foundation/common_voice_12_0
      name: mozilla-foundation/common_voice_12_0
      config: ar
      split: test
      args: ar
    metrics:
    - type: wer
      value: 58.5848
      name: wer
---

# KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small

**Kalemat-Tech Arabic Speech Recognition Model (STT)** by Mohamed Salama: a model for recognizing Modern Standard Arabic speech and transcribing it to text (نموذج كلماتك للتعرف على الأصوات العربية الفصحى وتحويلها إلى نصوص).

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on an augmented version of the Arabic subset of Common Voice 12.0 (Common_Voice_Arabic_12.0_Augmented).
It achieves the following results on the evaluation set:
- Loss: 0.5362
- Wer: 58.5848
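The WER above is a percentage: the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A minimal self-contained implementation of the metric for illustration (evaluation during training typically uses the `evaluate`/`jiwer` packages instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # delete a reference word
                      d[j - 1] + 1,      # insert a hypothesis word
                      prev + (r != h))   # substitute (free if words match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

print(wer("a b c", "a x c"))  # one substitution out of three words
```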

## Example usage
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
```
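For end-to-end transcription, the high-level `pipeline` API bundles feature extraction, generation, and decoding into one call. A sketch (the model is downloaded on first run, and the audio path is a placeholder):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small",
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

result = asr("path/to/arabic_audio.wav")  # placeholder path
print(result["text"])
```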
## Intended uses & limitations

This model is intended for automatic speech recognition (speech-to-text) of Modern Standard Arabic. As a Whisper-based model, it processes audio in 30-second windows, so longer recordings should be transcribed in chunks.

## Training and evaluation data
The training data is Common_Voice_Arabic_12.0, with the following augmentations applied:
- 25% of the data: TimeMasking
- 25% of the data: SpecAugment
- 25% of the data: waveform augmentation (AddGaussianNoise)

The final dataset is the original Common Voice data plus the augmented files.
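As an illustration of what these augmentations typically look like (a sketch, not the exact pipeline used here; the noise level and mask width are assumed parameters), AddGaussianNoise perturbs the raw waveform, while TimeMasking zeroes a random span of spectrogram frames:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(wave, snr_db=20.0):
    """Waveform-level augmentation: add Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)

def time_mask(spec, max_width=30):
    """Spectrogram-level augmentation: zero out a random span of time frames."""
    spec = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, max(1, spec.shape[1] - width)))
    spec[:, start:start + width] = 0.0
    return spec
```

SpecAugment combines such time masks with frequency masks over the mel-spectrogram; libraries such as `torchaudio` and `audiomentations` provide ready-made versions of all three.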
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP
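With the 🤗 Trainer, these settings roughly correspond to the following `Seq2SeqTrainingArguments` (a sketch; `output_dir` is a placeholder, Native AMP maps to `fp16=True` on CUDA, and the listed Adam betas/epsilon are the optimizer defaults):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ar",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=25,
    fp16=True,  # Native AMP mixed precision
)
```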
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer     |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|
| 0.2728        | 1.01  | 1000  | 0.3063          | 60.4733 |
| 0.1442        | 2.01  | 2000  | 0.2878          | 55.6935 |
| 0.0648        | 3.02  | 3000  | 0.3009          | 59.2568 |
| 0.0318        | 4.03  | 4000  | 0.3278          | 59.2993 |
| 0.0148        | 5.04  | 5000  | 0.3539          | 61.0364 |
| 0.0088        | 6.04  | 6000  | 0.3714          | 56.9154 |
| 0.0061        | 7.05  | 7000  | 0.3920          | 57.5515 |
| 0.0041        | 8.06  | 8000  | 0.4149          | 61.6328 |
| 0.0033        | 9.06  | 9000  | 0.4217          | 58.0310 |
| 0.0033        | 10.07 | 10000 | 0.4376          | 59.9594 |
| 0.0021        | 11.08 | 11000 | 0.4485          | 56.7812 |
| 0.0015        | 12.08 | 12000 | 0.4577          | 57.6936 |
| 0.0013        | 13.09 | 13000 | 0.4671          | 60.6606 |
| 0.0011        | 14.1  | 14000 | 0.4686          | 59.8159 |
| 0.0008        | 15.11 | 15000 | 0.4856          | 60.7111 |
| 0.0011        | 16.11 | 16000 | 0.4851          | 59.5198 |
| 0.0005        | 17.12 | 17000 | 0.4936          | 59.2608 |
| 0.0004        | 18.13 | 18000 | 0.4995          | 57.9619 |
| 0.0003        | 19.13 | 19000 | 0.5085          | 58.3630 |
| 0.0002        | 20.14 | 20000 | 0.5155          | 58.0987 |
| 0.0001        | 21.15 | 21000 | 0.5251          | 58.8504 |
| 0.0001        | 22.16 | 22000 | 0.5268          | 58.4228 |
| 0.0001        | 23.16 | 23000 | 0.5317          | 59.0881 |
| 0.0001        | 24.17 | 24000 | 0.5362          | 58.5848 |


### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2