---
pipeline_tag: text-generation
---
# Model Card for Breeze-7B-Instruct-v0.1
Breeze-7B-Instruct-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (TC).
It extends Mistral-7B's vocabulary with an extra 30k TC tokens to better adapt to TC, which roughly doubles inference speed on TC text compared with the original tokenizer.
To the best of our knowledge, this is the first work on vocabulary expansion for TC.
The model was continually pre-trained on 250GB of TC data and then supervised fine-tuned on over 1M instances.
Breeze-7B-Instruct-v0.1 performs well on both English (EN) and TC benchmarks.
It outperforms Taiwan-LLM-7B-v2.1-chat and Taiwan-LLM-13B-v2.0-chat on all TC benchmarks reported below, and Yi-6B-Chat on all of them except TMMLU+.
In English, it is comparable with Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench.

*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*
## Features
- Vocabulary expanded from 32k to 62k tokens to better cover Traditional Chinese
- Multi-turn dialogue (without special handling for harmfulness)
- 8k context length
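
To see why a larger TC vocabulary speeds up inference, here is a toy sketch that counts tokens for the same sentence under a small vocabulary and an expanded one. It is an illustration only, assuming a greedy longest-match tokenizer with UTF-8 byte fallback; the real Mistral and Breeze tokenizers are BPE models, and the vocabularies `BASE_VOCAB` and `EXPANDED_VOCAB` are hypothetical.

```python
def greedy_tokenize(text, vocab, max_piece_len=4):
    """Toy tokenizer: greedy longest match, falling back to UTF-8 bytes.

    Not BPE; it only illustrates how vocabulary coverage affects token count.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that starts at position i.
        for length in range(min(max_piece_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No piece matched: emit one byte token per UTF-8 byte of the character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


# Hypothetical vocabularies: a few single characters vs. added multi-character TC words.
BASE_VOCAB = {"天", "很"}
EXPANDED_VOCAB = BASE_VOCAB | {"今天", "天氣", "很好"}

sentence = "今天天氣很好"  # "The weather is nice today."
for name, vocab in [("base", BASE_VOCAB), ("expanded", EXPANDED_VOCAB)]:
    tokens = greedy_tokenize(sentence, vocab)
    print(f"{name}: {len(tokens)} tokens, {len(sentence) / len(tokens):.2f} chars/token")
# base: 12 tokens, 0.50 chars/token
# expanded: 3 tokens, 2.00 chars/token
```

Fewer tokens per character means fewer decoding steps for the same TC text. To compare the real tokenizers, load each with `AutoTokenizer.from_pretrained` (for example `mistralai/Mistral-7B-v0.1` and `MediaTek-Research/Breeze-7B-Instruct-v0.1`) and compare `len(tokenizer.encode(text))` on the same TC passage.
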
## Model Details
- **Finetuned from:** [MediaTek-Research/Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)
## Base Model Performance
| Models | Size | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MMLU (ACC) |
|----------------------------------------------|--------|--------------|-------------|-------------|------------|
| | |TC, Knowledge |TC, Reasoning|TC, Reasoning|EN, Knowledge|
| | | 5 shot | 3 shot | 5 shot | 5 shot |
| [Yi-34B](https://huggingface.co/01-ai/Yi-34B)| 34B | 63.10 | 84.57 | 49.31 | 77.42 |
| [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)| 14B | 51.30 | 16.95 * | 50.69 | 68.83 |
| [Yi-6B](https://huggingface.co/01-ai/Yi-6B) | 6B | 49.63 | 76.61 | 34.72 | 65.35 |
| [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)| 7B | 42.84 | 0.0 * | 39.58 | 61.00 |
| [**Breeze-7B-Base-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) | 7B | 40.35 | 81.13 | 28.47 | 61.63 |
| [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)| 7B | 36.93 | 79.27 | 27.78 | 64.89 |

\* Few-shot prompting did not effectively guide these models to generate properly formed answers.

| Category ACC of TMMLU+ (5 shot) | STEM | Social Science | Humanities | Other |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B | 56.03 | 73.06 | 61.12 | 62.19 |
| Qwen-14B | 46.51 | 58.20 | 51.12 | 49.38 |
| Yi-6B | 41.14 | 57.77 | 50.22 | 49.39 |
| Qwen-7B | 28.25 | 47.80 | 43.14 | 42.17 |
| **Breeze-7B-Base-v0.1** | 35.74 | 46.08 | 40.29 | 39.27 |
| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |
## Chat Model Performance
| Models | Size | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |
|--------------------------------------------|--------|--------------|--------------|-----------|-------------|--------|------------|------------|------------------|
| | |TC, Knowledge |TC, Knowledge |TC, Reasoning|TC, Reasoning|TC, Chat |EN, Knowledge|EN, Knowledge|EN, Chat |
| | | 0 shot | 5 shot | 3 shot | 0 shot | 0 shot | 0 shot | 5 shot | 0 shot |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 34B | 54.87 | | | 36.81 | 6.9 | 71.04 | | 7.6 |
| [Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) | 14B | 48.41 | | | 41.67 | 6.4 | 64.91 | | 7.2 |
| [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) | 6B | 44.79 | | | 25.69 | 5.0 | 59.45 | | 6.0 |
| [gpt-3.5-turbo](https://openai.com) | | 41.76 | | | | 7.1 | 70.00 | | 7.9 |
| [**Breeze-7B-Instruct-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1) | 7B | 41.61 | | | 45.83 | 5.7 | 63.26 | | 7.1 |
| [**Breeze-7B-Instruct-64k-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) | 7B | 40.99 | | | 36.11 | 5.5 | 63.68 | | 7.1 |
| [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) | 7B | 40.02 | | | 33.33 | 5.4 | 55.94 | | 6.2 |
| [Taiwan-LLM-13B-v2.0-chat](https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat) | 13B | 29.47 | | | 23.61 | 5.0 | 50.50 | | -* |
| [Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat) | 7B | 28.08 | | | 31.25 | 4.2 | 42.72 | | -* |

\* Taiwan-LLM models respond to English multi-turn questions in Traditional Chinese, so no English MT-Bench score is reported for them.

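
The head-to-head TC comparisons can be checked mechanically against the chat-model table. The sketch below transcribes the 0-shot TMMLU+, Table, and MT-Bench-tw columns for Breeze-7B-Instruct-v0.1 and three rivals (DRCD is left out because no chat-model scores are reported for it); the helper name `benchmarks_where_rival_leads` is hypothetical.

```python
# TC scores transcribed from the chat-model table (higher is better).
# DRCD is omitted because no chat-model results are reported for it.
TC_SCORES = {
    "Breeze-7B-Instruct-v0.1": {"TMMLU+ (0 shot)": 41.61, "Table": 45.83, "MT-Bench-tw": 5.7},
    "Yi-6B-Chat": {"TMMLU+ (0 shot)": 44.79, "Table": 25.69, "MT-Bench-tw": 5.0},
    "Taiwan-LLM-13B-v2.0-chat": {"TMMLU+ (0 shot)": 29.47, "Table": 23.61, "MT-Bench-tw": 5.0},
    "Taiwan-LLM-7B-v2.1-chat": {"TMMLU+ (0 shot)": 28.08, "Table": 31.25, "MT-Bench-tw": 4.2},
}


def benchmarks_where_rival_leads(model, rival, scores=TC_SCORES):
    """List the TC benchmarks on which `rival` scores strictly higher than `model`."""
    return [bench for bench, value in scores[model].items() if scores[rival][bench] > value]


for rival in ["Yi-6B-Chat", "Taiwan-LLM-13B-v2.0-chat", "Taiwan-LLM-7B-v2.1-chat"]:
    print(rival, benchmarks_where_rival_leads("Breeze-7B-Instruct-v0.1", rival))
```

Only Yi-6B-Chat leads on any of these columns, and only on 0-shot TMMLU+.
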
| Category ACC of TMMLU+ (0 shot) | STEM | Social Science | Humanities | Other |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 |
| Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 |
| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 |
| gpt-3.5-turbo | 41.56 | 46.72 | 36.73 | 42.03 |
| **Breeze-7B-Instruct-v0.1** | 37.41 | 46.81 | 42.06 | 40.16 |
| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35 | 40.31 | 39.40 |
| Qwen-7B-Chat | 35.44 | 46.22 | 38.35 | 40.06 |
| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 |
## Inference Performance
In this test, we use the first 700 characters of a [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as input and ask the model to rewrite the article.
All models were run with `vllm` on two A6000 GPUs with tensor parallelism of 2 (TP=2).
| Models | Inference Time (sec)|Estimated Max Input Length (TC Char)|
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B | 10.62 | 5.2k |
| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
| **Breeze-7B-Instruct-64k-v0.1** | 10.74 | 88.8k |
| Qwen-7B | 10.86 | 9.8k |
| Qwen-14B | 18.89 | 9.8k |
| Mistral-7B-v0.1 | 20.48 | 5.1k |
| Taiwan-LLM-7B-v2.1-base | 26.26 | 2.2k |
| Taiwan-LLM-13B-v2.0-base | 36.8 | 2.2k |
| Yi-34B | 43.71 | 4.5k |
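
The two Breeze rows are consistent with a simple rule: estimated max input length ≈ context length × average TC characters per token. The card does not state how the estimate was produced, so the sketch below is an assumption-driven reading of the table: it takes 8k and 64k to mean 8,192 and 65,536 tokens, back-computes the implied characters per token, and also derives the rewrite-task speedup over Mistral-7B-v0.1 from the inference times. The helper names are hypothetical.

```python
# Values transcribed from the inference table; context sizes are assumptions
# (8k = 8,192 tokens and 64k = 65,536 tokens).
MODELS = {
    # name: (context length in tokens, estimated max input in TC characters)
    "Breeze-7B-Instruct-v0.1": (8_192, 11_100),
    "Breeze-7B-Instruct-64k-v0.1": (65_536, 88_800),
}
INFERENCE_SEC = {"Breeze-7B-Instruct-v0.1": 10.74, "Mistral-7B-v0.1": 20.48}


def implied_chars_per_token(context_tokens, max_input_chars):
    """Average TC characters per token implied by a max-input-length estimate."""
    return max_input_chars / context_tokens


def estimate_max_input_chars(context_tokens, chars_per_token):
    """Assumed rule: context window times average TC characters per token."""
    return context_tokens * chars_per_token


for name, (ctx, chars) in MODELS.items():
    print(f"{name}: {implied_chars_per_token(ctx, chars):.3f} TC chars/token")

speedup = INFERENCE_SEC["Mistral-7B-v0.1"] / INFERENCE_SEC["Breeze-7B-Instruct-v0.1"]
print(f"Rewrite-task speedup over Mistral-7B-v0.1: {speedup:.2f}x")
```

Both Breeze variants imply about 1.355 TC characters per token, and the measured speedup over Mistral-7B-v0.1 is about 1.91x, in line with the roughly 2x claim.
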
## Use in Transformers
First install direct dependencies:
```bash
pip install transformers torch accelerate
```
For faster inference with FlashAttention-2, also install:
```bash
pip install packaging ninja
pip install flash-attn --no-build-isolation
```
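Then load the tokenizer and model with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,                                 # model name/path is the first positional argument
    device_map="auto",                        # spread weights over available devices (needs accelerate)
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional: needs flash-attn and transformers>=4.36
)
```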
The structure of the query template follows that of Mistral-7B-Instruct, as shown below.
```txt
<s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```
where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user.
The suggested default `SYS_PROMPT` is
```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
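
A minimal sketch of assembling a prompt string that follows the template above, with earlier turns passed as `(query, response)` pairs. The function name `build_prompt` and its parameters are hypothetical; only the template layout and default system prompt come from this card.

```python
DEFAULT_SYS_PROMPT = (
    "You are a helpful AI assistant built by MediaTek Research. "
    "The user you are helping speaks Traditional Chinese and comes from Taiwan."
)


def build_prompt(query, history=(), sys_prompt=DEFAULT_SYS_PROMPT):
    """Assemble '<s> SYS_PROMPT [INST] Q1 [/INST] R1 ... [INST] QUERY [/INST]'.

    history: sequence of (query, response) pairs from earlier turns.
    """
    parts = ["<s>", sys_prompt]
    for past_query, past_response in history:
        parts.append(f"[INST] {past_query} [/INST] {past_response}")
    # The final user query is left open for the model to answer.
    parts.append(f"[INST] {query} [/INST]")
    return " ".join(parts)


print(build_prompt("QUERY2", history=[("QUERY1", "RESPONSE1")], sys_prompt="SYS_PROMPT"))
# <s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```

Because the string already starts with `<s>`, the tokenizer may add a second BOS token when encoding it; calling `tokenizer(prompt, add_special_tokens=False)` avoids that duplication.
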