asahi417 commited on
Commit
40aca74
1 Parent(s): 84ca102

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +18 -0
README.md CHANGED
@@ -36,6 +36,24 @@ The folloiwng table shows WER computed over the reference and predicted translat
36
 
37
  See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-tech/kotoba-whisper) for the evaluation detail.
38
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
39
  ## Usage
40
  Here is an example to translate Japanese speech into English text translation.
41
  First, download a sample speech.
 
36
 
37
  See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-tech/kotoba-whisper) for the evaluation detail.
38
 
39
+ ### Inference Speed
40
+ Due to the nature of cascaded approach, the pipeline has additional complexity compared to the single end2end OpenAI whisper models for the sake of high accuracy.
41
+ Following table shows the mean inference time in second averaged over 10 trials on audio sample with different durations.
42
+
43
+ | model | 10 | 30 | 60 |
44
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|
45
+ | [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)) | 0.173 | 0.247 | 0.352 |
46
+ | [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)) | 0.173 | 0.24 | 0.348 |
47
+ | [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.17 | 0.245 | 0.348 |
48
+ | [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 |
49
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 0.061 | 0.184 | 0.372 |
50
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 0.062 | 0.199 | 0.415 |
51
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 0.062 | 0.183 | 0.363 |
52
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 0.045 | 0.132 | 0.266 |
53
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 0.135 | 0.376 | 0.631 |
54
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 0.054 | 0.108 | 0.231 |
55
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 0.045 | 0.124 | 0.208 |
56
+
57
  ## Usage
58
  Here is an example to translate Japanese speech into English text translation.
59
  First, download a sample speech.