---
pipeline_tag: text-generation
---

# Model Card for Breeze-7B-Instruct-v0.1

Breeze-7B-Instruct-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (TC).
The model expands the TC vocabulary (an additional 30k TC tokens) on top of the original Mistral-7B tokenizer to better adapt to TC and improve inference speed,
roughly doubling inference speed on TC text relative to the original tokenizer.
To the best of our knowledge, this is the first work on vocabulary expansion for TC.
The model was further pre-trained on 250GB of TC data and then supervised fine-tuned on over 1M instances.
Breeze-7B-Instruct-v0.1 performs well on both English (EN) and TC benchmarks:
it outperforms Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Yi-6B-Chat on all TC benchmarks,
and is comparable with Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench in English.

*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*

## Features

- Expanded vocabulary for Traditional Chinese, from 32k to 62k tokens (see the tokenizer comparison sketch after this list)
- Multi-turn dialogue (without special handling for harmfulness)
- 8k-token context length
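
As a rough illustration of the expanded vocabulary, here is a minimal sketch comparing token counts for a short Traditional Chinese sentence under the original Mistral tokenizer and the Breeze tokenizer (the example sentence is ours; exact counts vary with the text):

```python
from transformers import AutoTokenizer

# "The weather is great today; let's take a walk in the park."
text = "今天天氣真好，我們去公園散步吧。"

mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")                 # 32k vocabulary
breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")  # 62k vocabulary

print("Mistral tokens:", len(mistral_tok.tokenize(text)))
print("Breeze tokens: ", len(breeze_tok.tokenize(text)))
# Fewer tokens per TC character means fewer decoding steps at inference time,
# which is where the speedup on TC text comes from.
```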

## Model Details
- **Finetuned from:** [MediaTek-Research/Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)

## Base Model Performance

| Models                                       |        | TMMLU+ (ACC) | DRCD (EM)   | Table (ACC) | MMLU (ACC) |
|----------------------------------------------|--------|--------------|-------------|-------------|------------|
|                                              |        |TC, Knowledge |TC, Reasoning|TC, Reasoning|EN, Knowledge|
|                                              |        | 5 shot       | 3 shot      | 5 shot      | 5 shot     |
| [Yi-34B](https://huggingface.co/01-ai/Yi-34B)| 34B    | 63.10        | 84.57       | 49.31  | 77.42      |
| [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)| 14B    | 51.30        | 16.95 *     | 50.69  | 68.83      |
| [Yi-6B](https://huggingface.co/01-ai/Yi-6B) | 6B     | 49.63        | 76.61       | 34.72  | 65.35      |
| [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)| 7B     | 42.84        | 0.0 *       | 39.58  | 61.00      |
| [**Breeze-7B-Base-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)       | 7B     | 40.35        | 81.13        | 28.47  | 61.63      |
| [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)| 7B     | 36.93        | 79.27        | 27.78 | 64.89      |


\* Few-shot learning cannot effectively guide the model to generate the proper answer.

| Category ACC of TMMLU+ (5 shot)                     | STEM         | Social Science | Humanities | Other      |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B                                        | 56.03        | 73.06          | 61.12      | 62.19      |
| Qwen-14B                                       | 46.51        | 58.20          | 51.12      | 49.38      |
| Yi-6B                                         | 41.14        | 57.77          | 50.22      | 49.39      |
| Qwen-7B                                        | 28.25        | 47.80          | 43.14      | 42.17      |
| **Breeze-7B-Base-v0.1**               | 35.74        | 46.08          | 40.29      | 39.27      |
| Mistral-7B-v0.1                           | 33.01        | 42.23          | 35.86      | 37.63      |


## Chat Model Performance

| Models                                     |        | TMMLU+ (ACC) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MT-Bench (Score) |
|--------------------------------------------|--------|--------------|-------------|---------------------|------------|------------------|
|                                            |        | TC, Knowledge| TC, Reasoning| TC, Chat           | EN, Knowledge | EN, Chat      |
|                                            |        | 0 shot       | 0 shot      | 0 shot              | 0 shot     | 0 shot           |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat)                                                 | 34B    | 54.87        | 36.81       | 6.9                 | 71.04      | 7.6              |
| [Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat)                                              | 14B    | 48.41        | 41.67       | 6.4                 | 64.91      | 7.2              |
| [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)                                                   | 6B     | 44.79        | 25.69       | 5.0                 | 59.45      | 6.0              |
| [gpt-3.5-turbo](https://openai.com)                                                                     |        | 41.76        | -           | 7.1                 | 70.00      | 7.9              |
| [**Breeze-7B-Instruct-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1)         | 7B     | 41.61        | 45.83       | 5.7                 | 63.26      | 7.1              |
| [**Breeze-7B-Instruct-64k-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) | 7B     | 40.99        | 36.11       | 5.5                 | 63.68      | 7.1              |
| [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)                                                | 7B     | 40.02        | 33.33       | 5.4                 | 55.94      | 6.2              |
| [Taiwan-LLM-13B-v2.0-chat](https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat)                  | 13B    | 29.47        | 23.61       | 5.0                 | 50.50      | -*               |
| [Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat)                    | 7B     | 28.08        | 31.25       | 4.2                 | 42.72      | -*               |


\* The Taiwan-LLM models respond to multi-turn questions (English) in Traditional Chinese, so their English MT-Bench scores are not reported.

| Category ACC of TMMLU+ (0 shot)                     | STEM         | Social Science | Humanities | Other      |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B-Chat                                         | 47.65        | 64.25          | 52.73      | 54.91      |
| Qwen-14B-Chat                                       | 43.83        | 55.00          | 48.55      | 46.22      |
| Yi-6B-Chat                                          | 37.80        | 51.74          | 45.36      | 44.25      |
| gpt-3.5-turbo                                       | 41.56        | 46.72          | 36.73      | 42.03      |
| **Breeze-7B-Instruct-v0.1**                             | 37.41        | 46.81          | 42.06      | 40.16      |
| **Breeze-7B-Instruct-64k-v0.1**                         | 37.88        | 46.35          | 40.31      | 39.40      |
| Qwen-7B-Chat                                        | 35.44        | 46.22          | 38.35      | 40.06      |
| Taiwan-LLM-13B-v2.0-chat                            | 27.74        | 33.69          | 27.03      | 29.43      |
| Taiwan-LLM-7B-v2.1-chat                             | 25.58        | 31.76          | 27.36      | 27.61      |


## Inference Performance
In this test, we use the first 700 characters of a [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as input and ask the model to rewrite the article.
All models were run with `vllm` on 2 A6000 GPUs (TP=2); a sketch of a comparable timing setup follows the table.

| Models                                                             | Inference Time (sec)|Estimated Max Input Length (TC Char)|
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B                                                        |   10.62  |   5.2k                |
| **Breeze-7B-Instruct-v0.1**                              |  10.74  |    11.1k                 |
| **Breeze-7B-Instruct-64k-v0.1**                              | 10.74       |  88.8k            |
| Qwen-7B                                                       |   10.86         |    9.8k                  |
| Qwen-14B                                                      |   18.89  |    9.8k                  |
| Mistral-7B-v0.1                                          |  20.48   |    5.1k                 |
| Taiwan-LLM-7B-v2.1-base                                 |   26.26          |    2.2k                  |
| Taiwan-LLM-13B-v2.0-base                                |   36.8          |    2.2k                  |
| Yi-34B                                                       |  43.71   |    4.5k                  |
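
For reference, below is a rough sketch of how a comparable timing run could be set up with `vllm`; the prompt is a placeholder for the 700-character article excerpt, and the sampling settings are our assumptions rather than the exact benchmark configuration:

```python
import time

from vllm import LLM, SamplingParams

# Two-GPU tensor parallelism, matching the TP=2 setting above
llm = LLM(model="MediaTek-Research/Breeze-7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.01, max_tokens=1024)

# Placeholder: "Rewrite the following article:" followed by the excerpt
prompt = "請重寫以下文章：..."

start = time.time()
llm.generate([prompt], params)
print(f"Inference time: {time.time() - start:.2f} sec")
```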

## Use in Transformers

First install direct dependencies:
```bash
pip install transformers torch accelerate
```
If you want faster inference with FlashAttention-2, install these additional dependencies:
```bash
pip install packaging ninja
pip install flash-attn
```
Then load the model in transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# `from_pretrained` takes the model name or path as its first positional argument
model = AutoModelForCausalLM.from_pretrained(
    "MediaTek-Research/Breeze-7B-Instruct-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional; requires flash-attn
)
tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
```

The structure of the query template follows that of Mistral-7B-Instruct, as shown below.
```txt
<s> SYS_PROMPT   [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```
where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user.
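
For instance, a small helper (our own illustration, not an official API) that assembles a prompt in this format might look like:

```python
def build_prompt(sys_prompt: str, turns: list) -> str:
    """Assemble a Mistral-style prompt for Breeze.

    `turns` is a list of (query, response) pairs; pass response=None
    for the final, unanswered query. The whitespace follows the
    template above, though the model may tolerate small variations.
    """
    prompt = f"<s> {sys_prompt}  "
    for query, response in turns:
        prompt += f" [INST] {query} [/INST]"
        if response is not None:
            prompt += f" {response}"
    return prompt
```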

The suggested default `SYS_PROMPT` is 
```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
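
Putting the pieces together, here is a minimal end-to-end sketch that reuses the `model` and `tokenizer` loaded above and the `build_prompt` helper; the query and generation settings are illustrative:

```python
sys_prompt = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")

# Single-turn query: "Please recommend three attractions in Taipei."
prompt = build_prompt(sys_prompt, [("請推薦三個台北的景點。", None)])

# `<s>` is already written into the prompt string, so don't add special tokens again
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```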