---
language:
- zh
- en
pipeline_tag: text-generation
inference: false

---

# Baichuan-7B-Instruction

![](./alpachino.png)
## Introduction

Baichuan-7B-Instruction is an instruction-tuned version of the Baichuan-7B family of models. The pretrained base model is available at [Baichuan-7B-Base](https://huggingface.co/baichuan-inc/Baichuan-7B).


## Demo

The following is a model demo built with Gradio:

```python
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True, use_fast=False
)
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True
).half()
model.cuda()


def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95, temperature=0.35, repetition_penalty=1.1):
    # Concatenate every (user, assistant) turn. The latest turn has an empty
    # assistant slot, so the prompt ends with "\n\nAssistant:" and the model
    # continues from there.
    prompt = ""
    for history in histories:
        prompt += "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        top_p=top_p,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
    )
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Strip the echoed prompt so only the newly generated reply is returned.
    generate_text = rets[0].replace(prompt, "")
    return generate_text


with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        # Append the new user turn with an empty assistant slot.
        return "", history + [[user_message, ""]]

    def bot(history):
        # Fill the assistant slot of the latest turn with the model's reply.
        history[-1][1] = generate(history)
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")
```
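
With `server_name="0.0.0.0"` the demo binds to all network interfaces, so it is reachable from other machines on the host; Gradio serves on its default port 7860 unless a `server_port` is passed to `launch`.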

## Quantized Deployment

Baichuan-7B supports int8 and int4 quantization, which only requires changing two lines of the inference code. Note that if you quantize to save GPU memory, you should load the original-precision model onto the CPU before quantizing; avoid passing `device_map='auto'` to `from_pretrained`, or any other argument that would place the original-precision model directly on the GPU.

To use int8 quantization:

```python
import torch
from transformers import AutoModelForCausalLM

# Load in fp16 on the CPU first, then quantize to int8 and move to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True
)
model = model.quantize(8).cuda()
```

Similarly, to use int4 quantization:

```python
# As above: load in fp16 on the CPU, quantize to int4, then move to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True
)
model = model.quantize(4).cuda()
```
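
Putting quantization and generation together, the sketch below loads the int8 model and runs a single turn using the same `Human`/`Assistant` prompt format as the Gradio demo above. The example prompt and sampling settings are illustrative, not prescribed by the model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", trust_remote_code=True, use_fast=False
)
# Load in fp16 on the CPU, quantize to int8, then move to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True
)
model = model.quantize(8).cuda()

# Illustrative single-turn prompt in the demo's Human/Assistant format.
prompt = "\nHuman:你好,请介绍一下你自己\n\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.95,
    temperature=0.35,
    repetition_penalty=1.1,
)
# Strip the echoed prompt to keep only the model's reply.
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].replace(prompt, ""))
```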

## Training Details

Dataset: https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k

Hardware: 8 × A40 GPUs
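
As a sketch, the dataset can typically be pulled with the `datasets` library; the exact configuration and split names depend on the repository layout, so this is an assumption rather than a documented recipe.

```python
from datasets import load_dataset

# Repository id as listed above; splits and field names depend on the repo layout.
ds = load_dataset("shareAI/ShareGPT-Chinese-English-90k")
print(ds)
```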

## Evaluation Results

### [CMMLU](https://github.com/haonan-li/CMMLU)

| Model (zero-shot)                                            |   STEM    | Humanities | Social Sciences |  Others   | China Specific |  Average  |
| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |
| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b)      |   41.28   |   52.85    |      53.37      |   52.24   |     50.58      |   49.95   |
| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)   |   32.79   |   44.43    |      46.78      |   44.79   |     43.11      |   42.33   |
| [ChatGLM-6B](https://github.com/THUDM/GLM-130B)              |   32.22   |   42.91    |      44.81      |   42.60   |     41.93      |   40.79   |
| [BatGPT-15B](https://arxiv.org/abs/2307.00360)               |   33.72   |   36.53    |      38.07      |   46.94   |     38.32      |   38.51   |
| [Chinese-LLaMA-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) |   26.76   |   26.57    |      27.42      |   28.33   |     26.73      |   27.34   |
| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS)            |   25.68   |   26.35    |      27.21      |   27.92   |     26.70      |   26.88   |
| [Chinese-GLM-10B](https://github.com/THUDM/GLM)              |   25.57   |   25.01    |      26.33      |   25.94   |     25.81      |   25.80   |
| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) |   42.04   |   60.49    |      59.55      |   56.60   |     55.72      |   54.63   |
| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) |   37.32   |   56.24    |      54.79      |   54.07   |     52.23      |   50.48   |
| **Baichuan-13B-Instruction**                                 | **42.56** | **62.09**  |    **60.41**    | **58.97** |   **56.95**    | **55.88** |
| **Baichuan-7B-Instruction**                                  | **33.94** | **46.31**  |    **47.73**    | **45.84** |   **44.88**    | **43.53** |

> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to measure a language model's knowledge and reasoning ability in Chinese contexts. We evaluated the models directly with the official [evaluation script](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) was obtained by running the official CMMLU evaluation script ourselves; the scores of the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results.