Cnam-LMSSC/wav2vec2-french-phonemizer

zinc75 committed on Nov 8, 2023

Commit 0e7f6a9 (1 parent: 801396b)

Update README.md

Files changed (1)

README.md +11 -1

@@ -34,5 +34,15 @@ model-index:
 # Fine-tuned French Voxpopuli v2 wav2vec2-base model for speech-to-phoneme task in French
-Fine-tuned [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fr-voxpopuli-v2) on French using the train and validation splits of [Common Voice v13](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0).
 When using this model, make sure that your speech input is sampled at 16kHz.

 # Fine-tuned French Voxpopuli v2 wav2vec2-base model for speech-to-phoneme task in French
+Fine-tuned [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fr-voxpopuli-v2) for **French speech-to-phoneme** using the train and validation splits of [Common Voice v13](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0).
+- The model has been trained for 14 epochs on 4 2080 Ti GPUs (256 audios per update, corresponding roughly to 25 minutes of speech per update -> 2k updates per epoch)
+- Learning rate schedule : Double Tri-state schedule
+    - Warmup from 1e-5 for 7% of total updates
+    - Constant at 1e-4 for 28% of total updates
+    - Linear decrease to 1e-6 for 36% of total updates
+    - Second warmup boost to 3e-5 for 3% of total updates
+    - Constant at 3e-5 for 12% of total updates
+    - Linear decrease to 1e-7 for remaining 14% of updates
 When using this model, make sure that your speech input is sampled at 16kHz.
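
The learning rate schedule added in this commit can be sketched as a piecewise-linear function of training progress. The snippet below is only an illustration, not the training code used for this model. It assumes that each warmup ramps linearly up to the next constant value (1e-5 to 1e-4, then 1e-6 to 3e-5) and that the run lasts roughly 14 × 2000 = 28000 updates, following the "2k updates per epoch" figure. The names `PHASES`, `learning_rate`, and `TOTAL_UPDATES` are hypothetical.

```python
# Each phase: (fraction of total updates, LR at phase start, LR at phase end).
# Fractions come from the README; linear interpolation inside each phase
# and the warmup end points are assumptions.
PHASES = [
    (0.07, 1e-5, 1e-4),  # first warmup
    (0.28, 1e-4, 1e-4),  # first constant plateau
    (0.36, 1e-4, 1e-6),  # first linear decrease
    (0.03, 1e-6, 3e-5),  # second warmup boost
    (0.12, 3e-5, 3e-5),  # second constant plateau
    (0.14, 3e-5, 1e-7),  # final linear decrease
]


def learning_rate(step, total_steps):
    """Return the learning rate at a given update, by linear interpolation."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    start_frac = 0.0
    for frac, lr_start, lr_end in PHASES:
        end_frac = start_frac + frac
        if progress <= end_frac:
            t = (progress - start_frac) / frac  # position inside the phase, 0..1
            return lr_start + t * (lr_end - lr_start)
        start_frac = end_frac
    # Guard against floating-point round-off in the summed fractions.
    return PHASES[-1][2]


# Assumed run length: 14 epochs at about 2000 updates per epoch.
TOTAL_UPDATES = 14 * 2000

if __name__ == "__main__":
    for step in (0, 1960, 5600, 19880, 20720, 22400, 28000):
        print(f"update {step:>5}: lr = {learning_rate(step, TOTAL_UPDATES):.2e}")
```

Because the function depends only on the update index, it could be adapted to a framework scheduler that accepts a per-step callable, for example by dividing its output by the optimizer's base learning rate to obtain a multiplicative factor.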