The following table shows WER computed over the reference and predicted translation.
See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-tech/kotoba-whisper) for the evaluation detail.

### Inference Speed

Due to the nature of the cascaded approach, the pipeline has additional complexity compared to the single end-to-end OpenAI Whisper models, in exchange for higher accuracy.
The following table shows the mean inference time in seconds, averaged over 10 trials, on audio samples of different durations.

| model | 10s audio | 30s audio | 60s audio |
|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     | 0.173 | 0.247 | 0.352 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B))                     | 0.173 | 0.240 | 0.348 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.170 | 0.245 | 0.348 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                                                                                 | 0.061 | 0.184 | 0.372 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                                                                                                                 | 0.062 | 0.199 | 0.415 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                                                                                                                       | 0.062 | 0.183 | 0.363 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                                                                                     | 0.045 | 0.132 | 0.266 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                                                                                       | 0.135 | 0.376 | 0.631 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                                                                                                                         | 0.054 | 0.108 | 0.231 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                                                                         | 0.045 | 0.124 | 0.208 |

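The averaging procedure behind these numbers (mean wall-clock time over 10 trials) can be sketched as below. This is a minimal sketch, not the benchmarking script used for the table: `dummy_translate` is a placeholder standing in for an actual pipeline call on a loaded audio sample.

```python
import time
import statistics


def mean_inference_time(fn, n_trials=10):
    """Run fn() n_trials times and return the mean elapsed wall-clock time in seconds."""
    times = []
    for _ in range(n_trials):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times)


def dummy_translate():
    # Placeholder for a real call such as pipe({"array": audio, "sampling_rate": 16000});
    # sleeps briefly so the benchmark has something to measure.
    time.sleep(0.01)


print(f"mean inference time: {mean_inference_time(dummy_translate):.3f}s")
```

In practice one would also run a warm-up call before timing, since the first inference typically includes model compilation and cache-allocation overhead.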
## Usage

Here is an example of translating Japanese speech into English text.
First, download a sample speech.