spktsagar committed on
Commit ce22495
1 Parent(s): e931718

update model card README.md

Files changed (1)
  1. README.md +15 -54
README.md CHANGED
@@ -1,36 +1,10 @@
 ---
-language:
-- ne
-- np
 license: apache-2.0
 tags:
 - generated_from_trainer
-- automatic-speech-recognition
-- speech
-- openslr
-- nepali
-datasets:
-- spktsagar/openslr-nepali-asr-cleaned
-metrics:
-- wer
 model-index:
 - name: wav2vec2-large-xls-r-300m-nepali-openslr
-  results:
-  - task:
-      type: automatic-speech-recognition
-      name: Nepali Speech Recognition
-    dataset:
-      type: spktsagar/openslr-nepali-asr-cleaned
-      name: OpenSLR Nepali ASR
-      config: original
-      split: train
-    metrics:
-    - type: were
-      value: 24.05
-      name: Test WER
-      verified: false
-
-
+  results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -38,42 +12,29 @@ should probably proofread and complete it, then remove this comment. -->
 
 # wav2vec2-large-xls-r-300m-nepali-openslr
 
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on an [OpenSLR Nepali ASR](https://huggingface.co/datasets/spktsagar/openslr-nepali-asr-cleaned) dataset.
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- eval_loss: 0.1913
-- eval_wer: 0.2405
-- eval_runtime: 586.4075
-- eval_samples_per_second: 36.829
-- eval_steps_per_second: 4.604
-- epoch: 4.6
-- step: 17600
+- eval_loss: 0.1848
+- eval_wer: 0.2352
+- eval_runtime: 585.979
+- eval_samples_per_second: 36.856
+- eval_steps_per_second: 4.608
+- epoch: 5.02
+- step: 19200
 
 ## Model description
 
-Wav2Vec2 is a pretrained model for Automatic Speech Recognition (ASR) and was released in September 2020 by Alexei Baevski, Michael Auli, and Alex Conneau. Soon after the superior performance of Wav2Vec2 was demonstrated on one of the most popular English datasets for ASR, called LibriSpeech, Facebook AI presented a multi-lingual version of Wav2Vec2, called XLSR. XLSR stands for cross-lingual speech representations and refers to model's ability to learn speech representations that are useful across multiple languages.
-
-## How to use?
-1. Install transformers and librosa
-```
-pip install librosa, transformers
-```
-2. Run the following code which loads your audio file, preprocessor, models, and returns your prediction
-```python
-import librosa
-from transformers import pipeline
-
-audio, sample_rate = librosa.load("<path to your audio file>", sr=16000)
-recognizer = pipeline("automatic-speech-recognition", model="spktsagar/wav2vec2-large-xls-r-300m-nepali-openslr")
-prediction = recognizer(audio)
-```
+More information needed
 
 ## Intended uses & limitations
 
-The model is trained on the OpenSLR Nepali ASR dataset, which in itself has some incorrect transcriptions, so it is obvious that the model will not have perfect predictions for your transcript. Similarly, due to colab's resource limit utterances longer than 5 sec are filtered out from the dataset during training and evaluation. Hence, the model might not perform as expected when given audio input longer than 5 sec.
+More information needed
+
+## Training and evaluation data
 
-## Training and evaluation data and Training procedure
+More information needed
 
-For dataset preparation and training code, please consult [my blog](https://sagar-spkt.github.io/posts/2022/08/finetune-xlsr-nepali/).
+## Training procedure
 
 ### Training hyperparameters
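The headline metric in this card is `eval_wer` (word error rate), the number the removed `model-index` block reported as 24.05 and which this commit's evaluation run puts at 0.2352. WER is the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. A minimal sketch of that computation (a hypothetical helper for illustration, not taken from the training code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, but over words instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of five reference words -> WER 0.2
print(wer("speech to text is fun", "speech text is fun"))
```

A reported `eval_wer` of 0.2352 therefore means roughly one word error per four to five reference words on the evaluation set.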