---
license: mit
language: fr
datasets:
- mozilla-foundation/common_voice_13_0
metrics:
- per
tags:
- audio
- automatic-speech-recognition
- speech
- phonemize
model-index:
- name: Wav2Vec2-base French finetuned for phonemes by LMSSC
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice v13
      type: mozilla-foundation/common_voice_13_0
      args: fr
    metrics:
    - name: Test PER on Common Voice FR 13.0 | Trained
      type: per
      value: 5.52
    - name: Test PER on Multilingual Librispeech FR | Trained
      type: per
      value: 4.36
    - name: Val PER on Common Voice FR 13.0 | Trained
      type: per
      value: 4.31
---
# Fine-tuned French Voxpopuli v2 wav2vec2-base model for speech-to-phoneme task in French
Fine-tuned [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fr-voxpopuli-v2) for **French speech-to-phoneme** transcription (without a language model), using the train and validation splits of [Common Voice v13](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0).
## Audio sample rate for usage
When using this model, make sure that your speech input is **sampled at 16 kHz**.
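To catch a wrong sample rate before transcription, a quick check on WAV files can be done with Python's standard `wave` module. This is a minimal sketch under stated assumptions: `wave` only reads uncompressed PCM WAV files (not FLAC), the helper name `check_wav_sample_rate` is hypothetical, and actually resampling a file would require an audio library.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def check_wav_sample_rate(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> int:
    """Hypothetical helper: return the sample rate of a PCM WAV file,
    raising ValueError if it differs from `expected`."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {expected} Hz first"
        )
    return rate
```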
## Output
As this model is specifically trained for a speech-to-phoneme task, its output is a sequence of [IPA-encoded](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) words, without punctuation.
If you don't read the phonetic alphabet fluently, you can use this excellent [IPA reader website](http://ipa-reader.xyz) to convert the transcript back to synthetic speech and check the quality of the phonetic transcription.
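The metadata above reports results as PER (phoneme error rate). A common definition is the Levenshtein distance between the reference and predicted phoneme sequences, divided by the number of reference phonemes. The sketch below assumes that a phoneme is one IPA base character plus any combining diacritics (such as the nasal tilde in ɑ̃) and that word-separating spaces are ignored. The evaluation script behind the reported numbers is not described in this card, so this is an illustration rather than a reproduction; all function names are hypothetical.

```python
import unicodedata


def split_phonemes(ipa: str) -> list[str]:
    """Split an IPA string into symbols, attaching combining marks
    (e.g. the nasal tilde U+0303 in 'ɑ̃') to the preceding base character.
    Word-separating spaces are dropped (assumption)."""
    phonemes: list[str] = []
    for ch in ipa:
        if ch.isspace():
            continue
        if unicodedata.combining(ch) and phonemes:
            phonemes[-1] += ch
        else:
            phonemes.append(ch)
    return phonemes


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions, deletions, insertions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(reference: str, hypothesis: str) -> float:
    """PER = edit distance / number of reference phonemes."""
    ref = split_phonemes(reference)
    hyp = split_phonemes(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref, hyp) / len(ref)
```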
## Training procedure
The model was fine-tuned on Common Voice v13 (FR) for 14 epochs on 4× 2080 Ti GPUs, using a DDP (Distributed Data Parallel) strategy and gradient accumulation: each update covers 256 audio clips, roughly 25 minutes of speech, which gives about 2k updates per epoch.
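The figures in the paragraph above imply a few other quantities, worked out below. The card does not say how the 256-clip effective batch is split between per-GPU batch size and accumulation steps, so the 8 × 8 split used here is hypothetical; the derived values are only as approximate as the "roughly 25 minutes" and "2k updates" they start from.

```python
# Hypothetical split of the effective batch: only the total (256) and the
# GPU count (4) come from the card; the other two values are assumptions.
N_GPUS = 4
PER_GPU_BATCH = 8        # assumption
ACCUMULATION_STEPS = 8   # assumption

# Under DDP with gradient accumulation, one optimizer update sees this many clips.
effective_batch = N_GPUS * PER_GPU_BATCH * ACCUMULATION_STEPS  # 256

# Approximate figures quoted in the card
SPEECH_PER_UPDATE_S = 25 * 60  # ~25 minutes of speech per update
UPDATES_PER_EPOCH = 2_000      # ~2k updates per epoch
EPOCHS = 14

mean_clip_duration_s = SPEECH_PER_UPDATE_S / effective_batch  # ~5.9 s per clip
clips_per_epoch = UPDATES_PER_EPOCH * effective_batch         # ~512k clips
total_updates = UPDATES_PER_EPOCH * EPOCHS                    # ~28k updates
```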
- Learning rate schedule: Double Tri-state schedule
  - Warmup from 1e-5 for 7% of total updates
  - Constant at 1e-4 for 28% of total updates
  - Linear decrease to 1e-6 for 36% of total updates
  - Second warmup boost to 3e-5 for 3% of total updates
  - Constant at 3e-5 for 12% of total updates
  - Linear decrease to 1e-7 for the remaining 14% of updates
- The hyperparameter set used for training is the same as the one detailed in Annex B and Table 6 of the [wav2vec2 paper](https://arxiv.org/pdf/2006.11477.pdf).
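The learning rate schedule above can be sketched as a piecewise-linear function of training progress. Linear interpolation inside every phase, a first warmup that ends at the 1e-4 plateau, and a second warmup that starts from the 1e-6 reached at the end of the first decay are assumptions read off the list, not details confirmed by the card; the name `double_tristate_lr` is hypothetical.

```python
# Phases of the schedule: (fraction of total updates, start LR, end LR).
# Linear interpolation within each phase is an assumption.
PHASES = [
    (0.07, 1e-5, 1e-4),  # warmup
    (0.28, 1e-4, 1e-4),  # constant
    (0.36, 1e-4, 1e-6),  # linear decrease
    (0.03, 1e-6, 3e-5),  # second warmup boost
    (0.12, 3e-5, 3e-5),  # constant
    (0.14, 3e-5, 1e-7),  # linear decrease until the end
]


def double_tristate_lr(step: int, total_steps: int) -> float:
    """Learning rate at update `step` (0-based) out of `total_steps` updates."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    progress = step / total_steps
    phase_start = 0.0
    for fraction, lr_start, lr_end in PHASES:
        phase_end = phase_start + fraction
        if progress < phase_end:
            t = (progress - phase_start) / fraction  # position within the phase
            return lr_start + t * (lr_end - lr_start)
        phase_start = phase_end
    # Floating-point rounding of the phase boundaries can leave the very
    # last steps just past the final boundary.
    return PHASES[-1][2]
```

With about 2k updates per epoch over 14 epochs, `total_steps` would be roughly 28,000. To drive a PyTorch optimizer with this function, note that `torch.optim.lr_scheduler.LambdaLR` expects a multiplicative factor, so the returned value would have to be divided by the optimizer's base learning rate.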
## Usage (with HuggingSound)
The model can be used directly with the [HuggingSound](https://github.com/jonatasgrosman/huggingsound) library:
```python
import pandas as pd
from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("Cnam-LMSSC/wav2vec2-french-phonemizer")
audio_paths = ["./test_relecture_texte.wav", "./10179_11051_000021.flac"]

# No need for the audio files to be sampled at 16 kHz here: HuggingSound resamples them automatically
transcriptions = model.transcribe(audio_paths)

# (Optional) Display the results in a table
df = pd.DataFrame(transcriptions)
df["Audio file"] = audio_paths
print(df[["Audio file", "transcription"]])
```
Output:
| | Audio file | transcription |
|---:|:---------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | ./test_relecture_texte.wav | ʃapitʁ di də abɛse pəti kɔ̃t də ʒyl ləmɛtʁ ɑ̃ʁʒistʁe puʁ libʁivɔksɔʁɡ ibis dɑ̃ la bas kuʁ dœ̃ ʃato sə tʁuva paʁmi tut sɔʁt də volaj œ̃n ibis ʁɔz |
| 1 | ./10179_11051_000021.flac | kɛl dɔmaʒ kə sə nə swa pa dy sykʁ supiʁa se foʁaz ɑ̃ pasɑ̃ sa lɑ̃ɡ syʁ la vitʁ fɛ̃ dy ʃapitʁ kɛ̃z ɑ̃ʁʒistʁe paʁ sonjɛ̃ sɛt ɑ̃ʁʒistʁəmɑ̃ fɛ paʁti dy domɛn pyblik |
## Usage