Update README.md
README.md
CHANGED
@@ -23,8 +23,39 @@ This model outperforms Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Yi

- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)

## Performance

## Use in Transformers

First, install the direct dependencies:
```bash
pip install transformers torch accelerate
```

If you want faster inference using flash-attention 2, you also need to install these dependencies:
```bash
pip install packaging ninja
pip install flash-attn
```
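
Optionally, you can sanity-check that flash-attn built and imports correctly before loading the model (this check is an illustrative suggestion):
```bash
python -c "import flash_attn; print(flash_attn.__version__)"
```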

Then load the model in transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bfloat16 and let accelerate place it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "MediaTek-Research/Breeze-7B-Instruct-v0.1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # optional, requires flash-attn
)
```
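
The tokenizer for the same checkpoint is loaded analogously; a minimal sketch, reusing the `AutoTokenizer` import above:
```python
# Load the tokenizer that matches the checkpoint loaded above.
tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0.1")
```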

The structure of the prompt template follows that of Mistral-7B-Instruct, as shown below.
```txt
<s> SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST] RESPONSE2</s>
```

The suggested default `SYS_PROMPT` is:
```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
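
Putting the pieces together, the sketch below assembles a single-turn prompt from the template and the default `SYS_PROMPT`, then runs generation with the `model` and `tokenizer` loaded above. The example query, `max_new_tokens`, and the assumption that the tokenizer prepends the `<s>` (BOS) token itself are illustrative choices, not recommendations from the model card.
```python
# Default system prompt from the card; the query is an illustrative placeholder.
sys_prompt = ("You are a helpful AI assistant built by MediaTek Research. "
              "The user you are helping speaks Traditional Chinese and comes from Taiwan.")
query = "請簡單介紹一下台灣的夜市文化。"  # "Please briefly introduce Taiwan's night-market culture."

# Single-turn prompt following the template above; the leading <s> is omitted
# here on the assumption that the tokenizer adds the BOS token automatically.
prompt = f"{sys_prompt} [INST] {query} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```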