Automatic Speech Recognition
Transformers
Safetensors
Welsh
English
wav2vec2
Inference Endpoints
DewiBrynJones committed on
Commit 8aaa6d9
1 Parent(s): 98fabba

Update README.md

Files changed (1)
  1. README.md +1 -66
README.md CHANGED
@@ -3,11 +3,9 @@ license: apache-2.0
  base_model: facebook/wav2vec2-large-xlsr-53
  metrics:
  - wer
- model-index:
- - name: wav2vec2-xlsr-53-ft-ccv-en-cy
-   results: []
  datasets:
  - techiaith/commonvoice_16_1_en_cy
+ - techiaith/banc-trawsgrifiadau-bangor
  language:
  - cy
  - en
@@ -62,66 +60,3 @@ def transcribe(audio):
 
  transcribe(<path/or/url/to/any/audiofile>)
  ```
-
-
- ## Evaluation
-
-
- According to a balanced English+Welsh test set derived from Common Voice version 16.1, the WER of techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm is **23.79%**
-
- However, when evaluated with language specific test sets, the model exhibits a bias to perform better with Welsh.
-
- | Common Voice Test Set Language | WER | CER |
- | -------- | --- | --- |
- | EN+CY | 23.79 | 9.68 |
- | EN | 34.47 | 14.83 |
- | CY | 12.34 | 3.55 |
-
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0003
- - train_batch_size: 32
- - eval_batch_size: 32
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 800
- - training_steps: 9000
- - mixed_precision_training: Native AMP
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:----:|:---------------:|:------:|
- | 6.0574 | 0.25 | 500 | 2.0297 | 0.9991 |
- | 1.224 | 0.5 | 1000 | 0.5368 | 0.4342 |
- | 0.434 | 0.75 | 1500 | 0.4861 | 0.3891 |
- | 0.3295 | 1.01 | 2000 | 0.4301 | 0.3411 |
- | 0.2739 | 1.26 | 2500 | 0.3818 | 0.3053 |
- | 0.2619 | 1.51 | 3000 | 0.3894 | 0.3060 |
- | 0.2517 | 1.76 | 3500 | 0.3497 | 0.2802 |
- | 0.2244 | 2.01 | 4000 | 0.3519 | 0.2792 |
- | 0.1854 | 2.26 | 4500 | 0.3376 | 0.2718 |
- | 0.1779 | 2.51 | 5000 | 0.3206 | 0.2520 |
- | 0.1749 | 2.77 | 5500 | 0.3169 | 0.2535 |
- | 0.1636 | 3.02 | 6000 | 0.3122 | 0.2465 |
- | 0.137 | 3.27 | 6500 | 0.3054 | 0.2382 |
- | 0.1311 | 3.52 | 7000 | 0.2956 | 0.2280 |
- | 0.1261 | 3.77 | 7500 | 0.2898 | 0.2236 |
- | 0.1187 | 4.02 | 8000 | 0.2847 | 0.2176 |
- | 0.1011 | 4.27 | 8500 | 0.2763 | 0.2124 |
- | 0.0981 | 4.52 | 9000 | 0.2754 | 0.2115 |
-
-
- ### Framework versions
-
- - Transformers 4.38.2
- - Pytorch 2.2.1+cu121
- - Datasets 2.18.0
- - Tokenizers 0.15.2
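For context on the unchanged usage snippet above (the second hunk keeps a call to a `transcribe()` helper defined earlier in the README), the sketch below shows one minimal way such a helper could be built with the `transformers` ASR pipeline. This is an assumed reconstruction, not the card's own code: the repo id `techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy` is inferred from the removed model-index name and may not match the actual model repository.

```python
# Hedged sketch of a transcribe() helper like the one the README calls.
# The model id below is an assumption taken from the removed model-index name.
from transformers import pipeline

MODEL_ID = "techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy"  # assumed repo id

# Build an automatic-speech-recognition pipeline around the fine-tuned checkpoint.
asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

def transcribe(audio):
    """Transcribe a local path or URL to an audio file and return the text."""
    return asr(audio)["text"]

# Mirrors the usage line kept in the README:
# print(transcribe("path/or/url/to/any/audiofile.wav"))
```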
 
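The hyperparameter bullet list removed in this commit reads like a standard `transformers` Trainer configuration. As a hedged sketch only (the training script itself is not part of the card), those values might map onto `TrainingArguments` roughly as follows; the field names and the `output_dir` are my mapping assumptions, and the Adam betas/epsilon listed in the README are the library defaults.

```python
# Sketch: possible TrainingArguments matching the removed hyperparameter list.
# Values come from the bullet list; argument names are an assumed mapping.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xlsr-53-ft-ccv-en-cy",  # assumed output path
    learning_rate=3e-4,                          # learning_rate: 0.0003
    per_device_train_batch_size=32,              # train_batch_size: 32
    per_device_eval_batch_size=32,               # eval_batch_size: 32
    seed=42,
    gradient_accumulation_steps=2,               # 32 * 2 = total batch size 64
    lr_scheduler_type="linear",
    warmup_steps=800,
    max_steps=9000,                              # training_steps: 9000
    fp16=True,                                   # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)
```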