Clementapa/wav2vec2-base-960h-phoneme-reco-dutch

Clementapa committed on Oct 11, 2022

Commit 1e8d6bc (1 parent: e0048f9)

Update README.md

Files changed (1)

README.md +63 -7

@@ -1,7 +1,63 @@
-Wav2vec2 base fine tuned on phoneme recognition task for the dutch language common voice dataset 6.1 . From https://github.com/ASR-project/Multilingual-PR project.<br />
-Validation PER: 16.18 <br />
-Test PER: 20.83 <br />
-language: Dutch<br />
-tags: Phoneme recoginition, wav2vec2<br />
-datasets: common voice 6.1 <br />
-metrics: PER: Phoneme Error Rate<br />

+---
+language: nl
+datasets:
+- common_voice
+tags:
+- audio
+- automatic-speech-recognition
+- phoneme-recognition
+widget:
+- example_title: Librispeech sample 1
+  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
+- example_title: Librispeech sample 2
+  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
+model-index:
+- name: wav2vec2-base-960h-phoneme-reco-dutch
+  results:
+  - task:
+      name: Automatic Phoneme Recognition
+      type: automatic-phoneme-recognition
+    dataset:
+      name: Common Voice 6.1
+      type: common_voice
+      config: nl
+      split: test
+      args:
+        language: nl
+    metrics:
+    - name: Test PER
+      type: per
+      value: 20.83
+    - name: Val PER
+      type: per
+      value: 16.18
+---
+This is the Wav2Vec2 base model [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h), fine-tuned for phoneme recognition in Dutch.
+# Usage
+To transcribe audio files into phonemes, the model can be used as a standalone acoustic model as follows:
+```python
+from datasets import Audio, load_dataset
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+import torch
+# load the processor (feature extractor + tokenizer) and the model
+processor = Wav2Vec2Processor.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
+model = Wav2Vec2ForCTC.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
+# load the Dutch Common Voice validation split and decode the audio at 16 kHz,
+# the sampling rate wav2vec2-base-960h expects (Common Voice clips are stored at 48 kHz)
+ds = load_dataset("common_voice", "nl", split="validation")
+ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
+# extract input features
+input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values  # Batch size 1
+# retrieve logits without tracking gradients
+with torch.no_grad():
+    logits = model(input_values).logits
+# take argmax and decode to a phoneme string
+predicted_ids = torch.argmax(logits, dim=-1)
+transcription = processor.batch_decode(predicted_ids)
+```
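
The PER values in the model card metadata are phoneme error rates. As a rough illustration of how such a score can be computed, the sketch below takes the total edit distance between reference and predicted phoneme sequences and divides it by the number of reference phonemes. The function names `edit_distance` and `phoneme_error_rate` are hypothetical, the inputs are assumed to be lists of phoneme symbols that have already been split, and this is not necessarily how the Multilingual-PR project computes its scores.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references: List[Sequence[str]],
                       hypotheses: List[Sequence[str]]) -> float:
    """Corpus-level PER in percent (hypothetical helper, for illustration only)."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phonemes = sum(len(r) for r in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    return 100.0 * total_errors / total_phonemes


# one substitution in a four-phoneme reference
refs = [["h", "ɑ", "l", "oː"]]
hyps = [["h", "a", "l", "oː"]]
print(phoneme_error_rate(refs, hyps))  # 25.0
```

Summing errors over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would; either convention is plausible, so reported figures should state which one they use.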