Update README.md
README.md (CHANGED)
@@ -91,4 +91,63 @@ df[['transcription']]
| ./test_relecture_texte.wav | ʃapitʁ di də abɛse pəti kɔ̃t də ʒyl ləmɛtʁ ɑ̃ʁʒistʁe puʁ libʁivɔksɔʁɡ ibis dɑ̃ la bas kuʁ dœ̃ ʃato sə tʁuva paʁmi tut sɔʁt də volaj œ̃n ibis ʁɔz |
| ./10179_11051_000021.flac | kɛl dɔmaʒ kə sə nə swa pa dy sykʁ supiʁa se foʁaz ɑ̃ pasɑ̃ sa lɑ̃ɡ syʁ la vitʁ fɛ̃ dy ʃapitʁ kɛ̃z ɑ̃ʁʒistʁe paʁ sonjɛ̃ sɛt ɑ̃ʁʒistʁəmɑ̃ fɛ paʁti dy domɛn pyblik |
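For context, the transcriptions above come from the huggingsound route shown earlier in this README. A minimal sketch of that usage (assuming the `huggingsound` package and its `SpeechRecognitionModel` API; the file paths are the example files from the table):

```python
from huggingsound import SpeechRecognitionModel
import pandas as pd

# Load the fine-tuned phonemizer through huggingsound
model = SpeechRecognitionModel("Cnam-LMSSC/wav2vec2-french-phonemizer")

audio_paths = ["./test_relecture_texte.wav", "./10179_11051_000021.flac"]
transcriptions = model.transcribe(audio_paths)

# Each result is a dict; collect the phonetic transcriptions in a DataFrame
df = pd.DataFrame(transcriptions)
print(df[['transcription']])
```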
## Inference script (if you do not want to use huggingsound)
```python
import torch
import soundfile as sf  # or librosa, if you prefer
from transformers import AutoModelForCTC, Wav2Vec2Processor

MODEL_ID = "Cnam-LMSSC/wav2vec2-french-phonemizer"

# Load the fine-tuned CTC model and its matching processor
model = AutoModelForCTC.from_pretrained(MODEL_ID)
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)

# Make sure the audio file is mono and sampled at 16 kHz, or resample it first!
audio, sample_rate = sf.read('example.wav')

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print("Phonetic transcription:", transcription)
```
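If your file is not already sampled at 16 kHz, one convenient option is to resample on load (a sketch assuming `librosa` is installed; `librosa.load` resamples to the requested rate):

```python
import librosa

# librosa.load resamples to the target rate and returns a float32 mono signal
audio, sample_rate = librosa.load('example.wav', sr=16_000)
```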
**Output**:
'ʒə syi tʁɛ kɔ̃tɑ̃ də vu pʁezɑ̃te notʁ solysjɔ̃ puʁ fonomize dez odjo fasilmɑ̃ sa fɔ̃ksjɔn kɑ̃ mɛm tʁɛ bjɛ̃'
## Test Results
The table below reports the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I also ran the evaluation script described above on other models (on 2021-06-17). Note that the table may show results that differ from those reported elsewhere; this may be due to specificities of the other evaluation scripts used.
| Model | WER | CER |
| ------------- | ------------- | ------------- |
| jonatasgrosman/wav2vec2-large-xlsr-53-english | **18.98%** | **8.29%** |
| jonatasgrosman/wav2vec2-large-english | 21.53% | 9.66% |
| facebook/wav2vec2-large-960h-lv60-self | 22.03% | 10.39% |
| facebook/wav2vec2-large-960h-lv60 | 23.97% | 11.14% |
| boris/xlsr-en-punctuation | 29.10% | 10.75% |
| facebook/wav2vec2-large-960h | 32.79% | 16.03% |
| facebook/wav2vec2-base-960h | 39.86% | 19.89% |
| facebook/wav2vec2-base-100h | 51.06% | 25.06% |
| elgeish/wav2vec2-large-lv60-timit-asr | 59.96% | 34.28% |
| facebook/wav2vec2-base-10k-voxpopuli-ft-en | 66.41% | 36.76% |
| elgeish/wav2vec2-base-timit-asr | 68.78% | 36.81% |
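The evaluation script itself appears earlier in the README; purely as an illustration of how these two metrics are computed, here is a minimal sketch using the `jiwer` package (an assumption on my part; the original evaluation may use a different tool, and the strings below are made-up examples):

```python
from jiwer import wer, cer  # pip install jiwer

# Hypothetical reference/prediction pair, for illustration only
reference = "the cat sat on the mat"
prediction = "the cat sit on mat"

print(f"WER: {wer(reference, prediction):.2%}")  # word-level error rate
print(f"CER: {cer(reference, prediction):.2%}")  # character-level error rate
```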
## Citation
If you want to cite this model, you can use:
```bibtex
@misc{lmssc-wav2vec2-base-phonemizer-french,
  title={Fine-tuned wav2vec2 base model for speech to phoneme in {F}rench},
  author={Olivier, Malo and Hauret, Julien and Bavu, {\'E}ric},
  howpublished={\url{https://huggingface.co/Cnam-LMSSC/wav2vec2-french-phonemizer}},
  year={2023}
}
```