---
pipeline_tag: text-generation
---
# Model Card for Breeze-7B-Instruct-v0.1
Breeze-7B-Instruct-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (TC). The model adds 30k TC tokens to the vocabulary to better adapt to TC and to speed up inference, roughly doubling the inference speed of the original tokenizer on TC text. Breeze-7B-Instruct-v0.1 performs well on both English and TC benchmarks: it outperforms Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Yi-6B-Chat on the major TC benchmarks we tested, and is comparable with Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench in English.
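The speedup comes from the expanded vocabulary encoding TC text in fewer tokens, so each generated character costs fewer decoding steps. A minimal sketch of the effect, assuming both tokenizers are available from the Hugging Face Hub (the sample sentence is illustrative):

```python
from transformers import AutoTokenizer

# Compare how many tokens the same Traditional Chinese sentence needs under
# the extended 62k Breeze vocabulary vs. the original 32k Mistral vocabulary.
breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

text = "今天天氣真好，我們一起去爬山吧！"
print(len(breeze_tok(text)["input_ids"]))   # fewer tokens per character
print(len(mistral_tok(text)["input_ids"]))  # more tokens, hence more decoding steps
```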
A project by the following members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Po-Chun Hsu 許博竣, Feng-Ting Liao 廖峰挺, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.
## Features
- Expanded vocabulary: the tokenizer dictionary grows from 32k to 62k tokens for Traditional Chinese (the first successful vocabulary expansion for Traditional Chinese)
- Multi-turn dialogue (without special handling for harmfulness)
- 8k context length
- Grouped-query and sliding-window attention (see the config sketch after this list)
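These architectural choices are visible directly in the model configuration. A minimal sketch, assuming the repository ships a standard Mistral-style config (attribute names follow `MistralConfig`):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
print(cfg.vocab_size)                # expanded vocabulary (~62k)
print(cfg.max_position_embeddings)   # context length
# Grouped-query attention: fewer key/value heads than query heads.
print(cfg.num_attention_heads, cfg.num_key_value_heads)
print(cfg.sliding_window)            # sliding-window attention size
```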
## Model Details
- Finetuned from: MediaTek-Research/Breeze-7B-Base-v0.1
- Model type: Causal decoder-only transformer language model
- Language: English and Traditional Chinese (zh-tw)
## Performance
**Traditional Chinese Benchmarks**

| Models | TMMLU+ (ACC) | DRCD (EM) | MT-Bench-tw (Score) |
|---|---|---|---|
| Breeze-7B-Base-v0.1 | | | |
| Breeze-7B-Instruct-v0.1 | | | |
| mistralai/Mistral-7B-v0.1 | | | |
| mistralai/Mistral-7B-Instruct-v0.1 | | | |
| yentinglin/Taiwan-LLM-7B-v2.1-base | | | |
| yentinglin/Taiwan-LLM-7B-v2.1-chat | | | |
| yentinglin/Taiwan-LLM-13B-v2.0-base | | | |
| yentinglin/Taiwan-LLM-13B-v2.0-chat | | | |
| 01-ai/Yi-6B-Base | | | |
| 01-ai/Yi-6B-Chat | | | |
| 01-ai/Yi-34B-Base | | | |
| 01-ai/Yi-34B-Chat | | | |
| Qwen/Qwen-7B | | | |
| Qwen/Qwen-7B-Chat | | | |
| Qwen/Qwen-14B | | | |
| Qwen/Qwen-14B-Chat | | | |
| gpt-3.5-turbo-0613 | | | |
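For readers unfamiliar with the DRCD column: exact match (EM) counts a prediction as correct only if it equals a gold answer string exactly. A minimal sketch of the metric; the whitespace normalization here is an assumption, not the official DRCD scoring script:

```python
def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    # Strip whitespace before comparing; real scoring scripts may normalize differently.
    norm = lambda s: "".join(s.split())
    return any(norm(prediction) == norm(gold) for gold in gold_answers)

print(exact_match("玉山", ["玉山", "玉山主峰"]))  # True: matches the first gold answer
```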
**English Benchmarks**

| Models | MMLU (ACC) | MT-Bench (Score) |
|---|---|---|
| Breeze-7B-Base-v0.1 | | |
| Breeze-7B-Instruct-v0.1 | | |
| mistralai/Mistral-7B-v0.1 | | |
| mistralai/Mistral-7B-Instruct-v0.1 | | |
| yentinglin/Taiwan-LLM-7B-v2.1-base | | |
| yentinglin/Taiwan-LLM-7B-v2.1-chat | | |
| yentinglin/Taiwan-LLM-13B-v2.0-base | | |
| yentinglin/Taiwan-LLM-13B-v2.0-chat | | |
| 01-ai/Yi-6B-Base | | |
| 01-ai/Yi-6B-Chat | | |
| 01-ai/Yi-34B-Base | | |
| 01-ai/Yi-34B-Chat | | |
| Qwen/Qwen-7B | | |
| Qwen/Qwen-7B-Chat | | |
| Qwen/Qwen-14B | | |
| Qwen/Qwen-14B-Chat | | |
| gpt-3.5-turbo-0613 | | |
**Inference Speed Test**

| Models | Speed (char/sec) |
|---|---|
| Breeze-7B-Instruct-v0.1 | |
| mistralai/Mistral-7B-Instruct-v0.1 | |
| yentinglin/Taiwan-LLM-7B-v2.1-chat | |
| yentinglin/Taiwan-LLM-13B-v2.0-chat | |
| 01-ai/Yi-6B-Chat | |
| 01-ai/Yi-34B-Chat | |
| Qwen/Qwen-7B-Chat | |
| Qwen/Qwen-14B-Chat | |
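A minimal sketch of how char/sec throughput might be measured; the prompt and generation length are illustrative and do not reproduce the exact benchmark settings behind the table above:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "請簡單介紹台灣的夜市文化。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(**inputs, max_new_tokens=512)
elapsed = time.time() - start

# Count only the newly generated characters, not the prompt.
generated = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"{len(generated) / elapsed:.1f} chars/sec")
```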
## Use in Transformers
First install direct dependencies:
```bash
pip install transformers torch accelerate
```
If you want faster inference using FlashAttention-2, you also need to install these dependencies:

```bash
pip install packaging ninja
pip install flash-attn
```
Then load the model in transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "MediaTek-Research/Breeze-7B-Instruct-v0.1",  # passed positionally; from_pretrained has no `model=` keyword
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional; requires flash-attn to be installed
)
```
The structure of the query template follows that of Mistral-7B-Instruct, as shown below.
```
<s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```
where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user. The suggested default `SYS_PROMPT` is

```
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
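A minimal sketch of filling in the template by hand and generating a reply, continuing from the `model` and `tokenizer` loaded above; the query string is illustrative. Note that the tokenizer adds the leading `<s>` (BOS) token automatically, so it is not written into the prompt string:

```python
sys_prompt = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")
query = "請問台灣最高的山是哪一座？"

# <s> is added by the tokenizer as the BOS token, so the string starts with SYS_PROMPT.
prompt = f"{sys_prompt} [INST] {query} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```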