MediaTek-Research/Breeze-7B-Instruct-v0_1

cllatMTK committed on Jan 11

Commit 6af287e, 1 Parent(s): 78f5d60

Update README.md

Files changed (1)

README.md +10 -10

@@ -88,17 +88,17 @@ and is comparable with Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench in English.
 In this test, we use the first 700 characters a [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as input and ask the model to rewrite the article.
 All models were inferenced with `vllm` on 2 A6000 (TP=2 ).
-| Models                                                             | Speed in char/sec (output token #) |Estimated Max Input Length (TC Char)|
 |--------------------------------------------------------------------|-------------------|--------------------------|
-| Yi-6B                                                        |   65.91 (631 token)  |    4.4k                  |
-| **Breeze-7B-Instruct-v0.1**                              |   65.17 (518 token)  |    10.1k                 |
-| **Breeze-7B-Instruct-64k-v0.1**                              | 65.17 (518 token)       |  80.8k            |
-| Qwen-7B                                                       |   64.45 (582 token)          |    9.7k                  |
-| Qwen-14B                                                      |   37.05 (582 token)  |    9.7k                  |
-| Mistral-7B-v0.1                                          |   34.17 (1110 token)   |    6.3k                 |
-| Taiwan-LLM-7B-v2.1-base                                 |   26.65 (1302 token)          |    2.6k                  |
-| Taiwan-LLM-13B-v2.0-base                                |   19.02 (1302 token)          |    2.6k                  |
-| Yi-34B                                                       |   16.01  (631 token)   |    4.4k                  |
 ## Examples

 In this test, we use the first 700 characters a [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as input and ask the model to rewrite the article.
 All models were inferenced with `vllm` on 2 A6000 (TP=2 ).
+| Models                                                             | Inference Time (sec)|Estimated Max Input Length (TC Char)|
 |--------------------------------------------------------------------|-------------------|--------------------------|
+| Yi-6B                                                        |   10.62  |   5.2k                |
+| **Breeze-7B-Instruct-v0.1**                              |  10.74  |    11.1k                 |
+| **Breeze-7B-Instruct-64k-v0.1**                              | 10.74       |  88.8k            |
+| Qwen-7B                                                       |   10.86         |    9.8k                  |
+| Qwen-14B                                                      |   18.89  |    9.8k                  |
+| Mistral-7B-v0.1                                          |  20.48   |    5.1k                 |
+| Taiwan-LLM-7B-v2.1-base                                 |   26.26          |    2.2k                  |
+| Taiwan-LLM-13B-v2.0-base                                |   36.8          |    2.2k                  |
+| Yi-34B                                                       |  43.71   |    4.5k                  |
 ## Examples
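
Note on the column change: the commit does not say how the new `Inference Time (sec)` values were obtained. From the numbers alone, they appear to be close to 700 divided by the old `Speed in char/sec` values, i.e. the time to produce 700 characters at the previously reported speed. The sketch below checks that reading. The 700-character target and the helper name `seconds_for` are assumptions made for illustration, not something the README confirms.

```python
# Old speed (char/sec) from the removed rows, new time (sec) from the added rows.
ROWS = {
    "Yi-6B": (65.91, 10.62),
    "Breeze-7B-Instruct-v0.1": (65.17, 10.74),
    "Breeze-7B-Instruct-64k-v0.1": (65.17, 10.74),
    "Qwen-7B": (64.45, 10.86),
    "Qwen-14B": (37.05, 18.89),
    "Mistral-7B-v0.1": (34.17, 20.48),
    "Taiwan-LLM-7B-v2.1-base": (26.65, 26.26),
    "Taiwan-LLM-13B-v2.0-base": (19.02, 36.8),
    "Yi-34B": (16.01, 43.71),
}

# Assumption: the new column is the time needed to emit 700 characters.
TARGET_CHARS = 700


def seconds_for(chars: int, chars_per_sec: float) -> float:
    """Time in seconds to produce `chars` characters at a given speed."""
    return chars / chars_per_sec


if __name__ == "__main__":
    for name, (speed, reported) in ROWS.items():
        derived = seconds_for(TARGET_CHARS, speed)
        print(f"{name:30s} derived={derived:6.2f}s  reported={reported:6.2f}s")
```

If that reading is right, the new column normalizes for output length, so models that emit more tokens per character (for example Mistral-7B-v0.1 at 1110 tokens in the old table) are compared on the same amount of generated text.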