Titouan committed
Commit 948de53
Parent(s): 4c29221
update readme

README.md CHANGED
@@ -38,23 +38,18 @@ Pretrained wav2vec2 models are distributed under the apache-2.0 licence. Hence,
 As our wav2vec2 models were trained with Fairseq, they can be used with the different tools that Fairseq provides to fine-tune the model for ASR with CTC. The full procedure has been nicely summarized in [this blogpost](https://huggingface.co/blog/fine-tune-wav2vec2-english).
 
 Please note that due to the nature of CTC, speech-to-text results aren't expected to be state-of-the-art. Moreover, future features may appear depending on the involvement of Fairseq and HuggingFace in this area.
-
 
-
 ## Integrate with SpeechBrain for ASR, Speaker Recognition, Source Separation ...
 
-
-
 Pretrained wav2vec models have recently gained popularity. At the same time, the [SpeechBrain toolkit](https://speechbrain.github.io) came out, proposing a new and simpler way of working with state-of-the-art speech and deep-learning technologies.
 
 While it is currently in beta, SpeechBrain offers two different ways of neatly integrating wav2vec2 models trained with Fairseq, i.e. our LeBenchmark models!
 
-**Work In Progress**
-
-The integration of wav2vec2 models is currently under [Pull Request](https://github.com/speechbrain/speechbrain/pull/576). However, the feature can already be used to:
 1. Extract wav2vec2 features on-the-fly (with a frozen wav2vec2 encoder) to be combined with any speech-related architecture. Examples include E2E ASR with CTC+attention+language models, speaker recognition or verification, and source separation.
 2. *Experimental:* To fully benefit from wav2vec2, the best solution remains to fine-tune the model while training your downstream task. This is very simple within SpeechBrain, as only a flag needs to be turned on. Thus, our wav2vec2 models can be fine-tuned while training your favorite ASR pipeline or speaker recognizer.
 
+**If interested, simply follow this [tutorial](https://colab.research.google.com/drive/17Hu1pxqhfMisjkSgmM2CnZxfqDyn2hSY?usp=sharing)**
+
 ## Referencing LeBenchmark
 
 ```
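The blogpost linked in the updated section fine-tunes wav2vec2 for ASR with CTC using HuggingFace Transformers. Below is a minimal sketch of that setup; the checkpoint id `LeBenchmark/wav2vec2-FR-3K-large` and the `vocab.json` file are assumptions, since the character vocabulary has to be built from your own training transcripts, as the blogpost explains.

```python
# Minimal CTC fine-tuning setup in the style of the linked blogpost.
# Assumptions: checkpoint id "LeBenchmark/wav2vec2-FR-3K-large" and a
# vocab.json built beforehand from the training transcripts.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level tokenizer over the target-language vocabulary.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)

# 16 kHz mono input with normalization, as wav2vec2 models expect.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Pretrained encoder plus a freshly initialized CTC head sized to the vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "LeBenchmark/wav2vec2-FR-3K-large",  # assumption: one of the released checkpoints
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_extractor()  # keep the CNN front-end frozen, as in the blogpost
```

From here, training proceeds with a padding data collator and the `Trainer` API, as the blogpost describes.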
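For the first SpeechBrain integration path (a frozen wav2vec2 encoder used as an on-the-fly feature extractor), here is a minimal sketch assuming the `FairseqWav2Vec2` wrapper proposed in the pull request referenced in the previous revision; the module path, signature, and checkpoint URL are assumptions and may differ in the merged version.

```python
import torch

# Assumption: module path and class name as proposed in the SpeechBrain pull request.
from speechbrain.lobes.models.fairseq_wav2vec import FairseqWav2Vec2

# Placeholder URL: substitute the Fairseq checkpoint of the LeBenchmark model you use.
model_url = "https://example.org/LeBenchmark/wav2vec2-FR-3K-large.pt"
save_path = "model_checkpoints/wav2vec2.pt"

# freeze=True keeps the encoder fixed: it behaves as a pure feature extractor.
wav2vec2 = FairseqWav2Vec2(model_url, save_path, freeze=True)

signal = torch.rand([4, 16000])  # batch of four 1-second waveforms at 16 kHz
features = wav2vec2(signal)      # (batch, frames, feature_dim), usable by any downstream model
```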
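For the second, experimental path, the flag mentioned in item 2 plausibly maps to the same wrapper's `freeze` argument. A sketch under that assumption, with a hypothetical linear CTC head standing in for your downstream model:

```python
import torch
from speechbrain.lobes.models.fairseq_wav2vec import FairseqWav2Vec2  # assumption, as above

model_url = "https://example.org/LeBenchmark/wav2vec2-FR-3K-large.pt"  # placeholder
save_path = "model_checkpoints/wav2vec2.pt"

# freeze=False is the flag that turns fine-tuning on: the encoder is updated
# together with the downstream task instead of staying fixed.
wav2vec2 = FairseqWav2Vec2(model_url, save_path, freeze=False)
ctc_head = torch.nn.Linear(1024, 32)  # hypothetical head: 1024-dim features, 32 output characters

# A single optimizer covers both the encoder and the downstream parameters.
optimizer = torch.optim.Adam(
    list(wav2vec2.parameters()) + list(ctc_head.parameters()), lr=1e-4
)
```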