
Model Card for Breeze-7B-Instruct-v0.1

Breeze-7B-Instruct-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (TC). The model extends the vocabulary with an additional 30k TC tokens to better adapt to TC and to speed up inference, roughly doubling the inference speed of the original tokenizer on TC text. Breeze-7B-Instruct-v0.1 performs well on both English and TC benchmarks: it outperforms Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Yi-6B-Chat on the major TC benchmarks we tested, and is comparable to Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench in English.
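
As a rough illustration of the tokenizer gain, the sketch below (with a sample sentence of our own) counts the tokens each tokenizer produces for the same Traditional Chinese text; fewer tokens per character means fewer decoding steps:

from transformers import AutoTokenizer

# Sample Traditional Chinese sentence (our own example).
text = "今天天氣很好，我們一起去公園散步吧。"

breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Fewer tokens for the same text means fewer forward passes at generation time.
print("Breeze tokens: ", len(breeze_tok.tokenize(text)))
print("Mistral tokens:", len(mistral_tok.tokenize(text)))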

A project by the following members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Po-Chun Hsu 許博竣, Feng-Ting Liao 廖峰挺, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.

Features

  • Expanded vocabulary for Traditional Chinese: the tokenizer grows from 32k to 62k tokens (the first successful vocabulary expansion of this kind for Traditional Chinese)
  • Multi-turn dialogue support (without special handling for harmfulness)
  • 8k context length
  • Grouped-query and sliding-window attention (see the config check below)
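
These attention settings can be read straight from the published model config; the following is a minimal sketch, assuming the config follows the standard Mistral-style schema in transformers:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")

# Grouped-query attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-window size and maximum context length.
print("sliding window: ", config.sliding_window)
print("context length: ", config.max_position_embeddings)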

Model Details

Performance

Traditional Chinese Benchmarks: TMMLU+ (ACC), DRCD (EM), MT-Bench-tw (Score)

Models compared:
  • Breeze-7B-Base-v0.1
  • Breeze-7B-Instruct-v0.1
  • mistralai/Mistral-7B-v0.1
  • mistralai/Mistral-7B-Instruct-v0.1
  • yentinglin/Taiwan-LLM-7B-v2.1-base
  • yentinglin/Taiwan-LLM-7B-v2.1-chat
  • yentinglin/Taiwan-LLM-13B-v2.0-base
  • yentinglin/Taiwan-LLM-13B-v2.0-chat
  • 01-ai/Yi-6B-Base
  • 01-ai/Yi-6B-Chat
  • 01-ai/Yi-34B-Base
  • 01-ai/Yi-34B-Chat
  • Qwen/Qwen-7B
  • Qwen/Qwen-7B-Chat
  • Qwen/Qwen-14B
  • Qwen/Qwen-14B-Chat
  • gpt-3.5-turbo-0613

English Benchmarks: MMLU (ACC), MT-Bench (Score)

Models compared:
  • Breeze-7B-Base-v0.1
  • Breeze-7B-Instruct-v0.1
  • mistralai/Mistral-7B-v0.1
  • mistralai/Mistral-7B-Instruct-v0.1
  • yentinglin/Taiwan-LLM-7B-v2.1-base
  • yentinglin/Taiwan-LLM-7B-v2.1-chat
  • yentinglin/Taiwan-LLM-13B-v2.0-base
  • yentinglin/Taiwan-LLM-13B-v2.0-chat
  • 01-ai/Yi-6B-Base
  • 01-ai/Yi-6B-Chat
  • 01-ai/Yi-34B-Base
  • 01-ai/Yi-34B-Chat
  • Qwen/Qwen-7B
  • Qwen/Qwen-7B-Chat
  • Qwen/Qwen-14B
  • Qwen/Qwen-14B-Chat
  • gpt-3.5-turbo-0613

Inference Speed Test: Speed (char/sec)

Models compared:
  • Breeze-7B-Instruct-v0.1
  • mistralai/Mistral-7B-Instruct-v0.1
  • yentinglin/Taiwan-LLM-7B-v2.1-chat
  • yentinglin/Taiwan-LLM-13B-v2.0-chat
  • 01-ai/Yi-6B-Chat
  • 01-ai/Yi-34B-Chat
  • Qwen/Qwen-7B-Chat
  • Qwen/Qwen-14B-Chat
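
For reference, characters per second can be measured with a simple harness like the sketch below; this is our own illustrative setup, not necessarily the script used to produce the reported numbers:

import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", torch_dtype=torch.bfloat16)

# A Traditional Chinese prompt of our own choosing.
inputs = tokenizer("請介紹台灣的夜市文化。", return_tensors="pt").to(model.device)

start = time.time()
outputs = model.generate(**inputs, max_new_tokens=256)
elapsed = time.time() - start

# Count only the newly generated characters.
new_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"{len(new_text) / elapsed:.1f} char/sec")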

Use in Transformers

First, install the direct dependencies:

pip install transformers torch accelerate

For faster inference with FlashAttention-2, also install these dependencies:

pip install packaging ninja
pip install flash-attn

Then load the model in transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "MediaTek-Research/Breeze-7B-Instruct-v0.1",  # first argument is the model name or path
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional; requires the flash-attn package
)

The structure of the query template follows that of Mistral-7B-Instruct, as shown below.

<s>SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]

where SYS_PROMPT, QUERY1, RESPONSE1, and QUERY2 can be provided by the user.

The suggested default SYS_PROMPT is

You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
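
Putting it together, here is a minimal end-to-end sketch that formats a single-turn prompt according to the template above and generates a response (the query string is our own example):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

name = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", torch_dtype=torch.bfloat16)

sys_prompt = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")
query = "請簡單介紹台北101。"  # our own example query

# Format the first turn following the template above; the tokenizer
# typically prepends the <s> token itself, so it is not written here.
prompt = f"{sys_prompt} [INST] {query} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))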