perisolb committed on
Commit e55e24e
1 Parent(s): 440f14f

Correct typos

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -30,7 +30,7 @@ model-index:
 ---
 
 # Norwegian Wav2Vec2 Model - 300M - VoxRex - Bokmål
-This model is finetuned on top of feature extractor [VoxRex-model](https://huggingface.co/KBLab/wav2vec2-large-voxrex) from the National Library of Sweeden. The finetuned model achieves the following results on the test set with a 5-gram KenLM. Numbers in parenthesis are without the language model:
+This model is finetuned on top of feature extractor [VoxRex-model](https://huggingface.co/KBLab/wav2vec2-large-voxrex) from the National Library of Sweden. The finetuned model achieves the following results on the test set with a 5-gram KenLM. The numbers in parentheses are the results without the language model:
 - **WER: 0.0703** (0.0979)
 - **CER: 0.0269** (0.0311)
 
@@ -43,10 +43,10 @@ This is one of several Wav2Vec-models our team created during the 🤗 hosted [R
 | NbAiLab/nb-wav2vec2-300m-bokmaal (this model) | 7.03 | |
 | [NbAiLab/nb-wav2vec2-300m-nynorsk](https://huggingface.co/NbAiLab/nb-wav2vec2-300m-nynorsk) | 12.22 | |
 ## Dataset
-In parallell with the event, the team also converted the [Norwegian Parliamentary Speech Corpus (NPSC)](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-58/) to the [NbAiLab/NPSC](https://huggingface.co/datasets/NbAiLab/NPSC) in 🤗 Dataset format and used that as the main source for training.
+In parallel with the event, the team also converted the [Norwegian Parliamentary Speech Corpus (NPSC)](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-58/) to the [NbAiLab/NPSC](https://huggingface.co/datasets/NbAiLab/NPSC) in 🤗 Dataset format and used that as the main source for training.
 
 ## Code
-We do release all code developed during the event so that the Norwegian NLP community can build upon this to develop even better Norwegian ASR models. The finetuning of these models are not very compute demanding. You should after following the instructions here, be able to train your own automatic speech recognition system in less than a day with an average GPU.
+We have released all the code developed during the event so that the Norwegian NLP community can build upon it when developing even better Norwegian ASR models. The finetuning of these models is not very computationally demanding. After following the instructions here, you should be able to train your own automatic speech recognition system in less than a day with an average GPU.
 
 ## Team
 The following people contributed to building this model: Rolv-Arild Braaten, Per Egil Kummervold, Andre Kåsen, Javier de la Rosa, Per Erik Solberg, and Freddy Wetjen.
@@ -54,10 +54,10 @@ The following people contributed to building this model: Rolv-Arild Braaten, Per
 ## Training procedure
 To reproduce these results, we strongly recommend that you follow the [instructions from 🤗](https://github.com/huggingface/transformers/tree/master/examples/research_projects/robust-speech-event#talks) to train a simple Swedish model.
 
-When you have verified that you are able to do this, create a fresh new repo. You can then start by copying the files ```run.sh``` and ```run_speech_recognition_ctc.py``` from our repo. Running this will create all the other necessary files, and should let you reproduce our results. With some tweaks to the hyperparameters, you might even be able to build an even better ASR. Good luck!
+When you have verified that you are able to do this, create a fresh new repo. You can then start by copying the files ```run.sh``` and ```run_speech_recognition_ctc.py``` from our repo. Running these will create all the other necessary files, and should let you reproduce our results. With some tweaks to the hyperparameters, you might even be able to build an even better ASR. Good luck!
 
 ### Language Model
-As you see from the results above, adding even a simple 5-gram language will improve the results. 🤗 has provided another [very nice blog](https://huggingface.co/blog/wav2vec2-with-ngram) about how to add a 5-gram language model to improve the ASR model. You can build this from your own corpus, for instance by extracting some suitable text from the [Norwegian Colossal Corpus](https://huggingface.co/datasets/NbAiLab/NCC). You can also skip some of the steps in the guide, and copy the [5-gram model from this repo](https://huggingface.co/NbAiLab/XLSR-300M-bokmaal/tree/main/language_model).
+As the scores indicate, adding even a simple 5-gram language will improve the results. 🤗 has provided another [very nice blog](https://huggingface.co/blog/wav2vec2-with-ngram) explaining how to add a 5-gram language model to improve the ASR model. You can build this from your own corpus, for instance by extracting some suitable text from the [Norwegian Colossal Corpus](https://huggingface.co/datasets/NbAiLab/NCC). You can also skip some of the steps in the guide, and copy the [5-gram model from this repo](https://huggingface.co/NbAiLab/XLSR-300M-bokmaal/tree/main/language_model).
 
 
 ### Parameters
@@ -103,7 +103,7 @@ The final model was run using these parameters:
 --preprocessing_num_workers="32"
 ```
 
-Following this settings, the training might take 3-4 days on an average GPU. You should however get a decent model and faster results by tweaking these parameters
+Using these settings, the training might take 3-4 days on an average GPU. You can, however, get a decent model and faster results by tweaking these parameters.
 
 | Parameter| Comment |
 |:-------------|:-----|