The following table shows WER computed over the reference and predicted translation.
See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-tech/kotoba-whisper) for the evaluation detail.

### Inference Speed

Due to the nature of the cascaded approach, the pipeline has additional complexity compared to the single end-to-end OpenAI Whisper models, in exchange for higher accuracy.
The following table shows the mean inference time in seconds, averaged over 10 trials, on audio samples of different durations.

| model | 10s audio | 30s audio | 60s audio |
|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     | 0.173 | 0.247 | 0.352 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B))                     | 0.173 | 0.240 | 0.348 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.170 | 0.245 | 0.348 |
| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 |
| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                                                                                 | 0.061 | 0.184 | 0.372 |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                                                                                                                 | 0.062 | 0.199 | 0.415 |
| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                                                                                                                       | 0.062 | 0.183 | 0.363 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                                                                                     | 0.045 | 0.132 | 0.266 |
| [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                                                                                       | 0.135 | 0.376 | 0.631 |
| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                                                                                                                         | 0.054 | 0.108 | 0.231 |
| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                                                                         | 0.045 | 0.124 | 0.208 |

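The averaging procedure behind these numbers (mean wall-clock time over 10 trials) can be sketched as below. This is a minimal sketch, not the benchmarking script used for the table: `dummy_translate` is a placeholder standing in for an actual pipeline call on a loaded audio sample.

```python
import time
import statistics


def mean_inference_time(fn, n_trials=10):
    """Run fn() n_trials times and return the mean elapsed wall-clock time in seconds."""
    times = []
    for _ in range(n_trials):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times)


def dummy_translate():
    # Placeholder for a real call such as pipe({"array": audio, "sampling_rate": 16000});
    # sleeps briefly so the benchmark has something to measure.
    time.sleep(0.01)


print(f"mean inference time: {mean_inference_time(dummy_translate):.3f}s")
```

In practice one would also run a warm-up call before timing, since the first inference typically includes model compilation and cache-allocation overhead.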
## Usage

Here is an example of translating Japanese speech into English text.
First, download a sample speech.