Clementapa committed on
Commit
1e8d6bc
1 Parent(s): e0048f9

Update README.md

Files changed (1)
  1. README.md +63 -7
README.md CHANGED
@@ -1,7 +1,63 @@
- Wav2vec2 base fine tuned on phoneme recognition task for the dutch language common voice dataset 6.1 . From https://github.com/ASR-project/Multilingual-PR project.<br />
- Validation PER: 16.18 <br />
- Test PER: 20.83 <br />
- language: Dutch<br />
- tags: Phoneme recoginition, wav2vec2<br />
- datasets: common voice 6.1 <br />
- metrics: PER: Phoneme Error Rate<br />
+ ---
+ language: nl
+ datasets:
+ - common_voice
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - phoneme-recognition
+ widget:
+ - example_title: Librispeech sample 1
+   src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
+ - example_title: Librispeech sample 2
+   src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
+ model-index:
+ - name: wav2vec2-base-960h-phoneme-reco-dutch
+   results:
+   - task:
+       name: Automatic Phoneme Recognition
+       type: automatic-phoneme-recognition
+     dataset:
+       name: CommonVoice (clean)
+       type: librispeech_asr
+       config: clean
+       split: test
+       args:
+         language: nl
+     metrics:
+     - name: Test PER
+       type: per
+       value: 20.83
+     - name: Val PER
+       type: per
+       value: 16.18
+ ---
+
+ This is the Wav2Vec2 base model [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h), fine-tuned for phoneme recognition on Dutch speech.
+
+ # Usage
+
+ To transcribe audio files into phonemes, the model can be used as a standalone acoustic model as follows:
+
+ ```python
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+ from datasets import Audio, load_dataset
+ import torch
+
+ # load model and processor (feature extractor + phoneme tokenizer)
+ processor = Wav2Vec2Processor.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
+ model = Wav2Vec2ForCTC.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
+
+ # load the Dutch Common Voice validation split, resampled to the 16 kHz rate the model expects
+ ds = load_dataset("common_voice", "nl", split="validation").cast_column("audio", Audio(sampling_rate=16_000))
+
+ # extract input features from the first clip
+ input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values  # Batch size 1
+
+ # retrieve logits
+ logits = model(input_values).logits
+
+ # take argmax and decode into a phoneme string
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+ ```
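
Note (not part of the commit): the model card reports a Phoneme Error Rate (PER) but does not say how it is computed. The sketch below shows one common definition, assuming PER is the total Levenshtein distance between reference and predicted phoneme sequences divided by the total number of reference phonemes, expressed in percent. The function names `edit_distance` and `phoneme_error_rate` are hypothetical, and the project's own evaluation may tokenize or average differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER in percent (assumed definition, see the note above)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    return 100.0 * total_edits / total_ref


# Illustrative phoneme sequences, not taken from the dataset.
refs = [["h", "ɑ", "l", "oː"]]
hyps = [["h", "a", "l", "oː"]]
print(phoneme_error_rate(refs, hyps))  # one substitution out of four phonemes
```

Under this definition the reported test PER of 20.83 would correspond to roughly one phoneme error for every five reference phonemes.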