Files changed (1)
  1. README.md +84 -3
README.md CHANGED
@@ -1,3 +1,84 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ This is a model quantized with the AutoAWQ framework. The source model is HuatuoGPT2-7B, a fine-tuned model whose base model is Baichuan-7B.
+
+ Model size: 16GB before quantization, 5GB after quantization.
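As a rough check on the stated sizes, the reduction from 16GB to 5GB works out to about 3.2x. The snippet below is a minimal sketch of that arithmetic; `compression_summary` is a hypothetical helper introduced here for illustration and is not part of any library.

```python
def compression_summary(original_gb: float, quantized_gb: float) -> dict:
    """Return the size ratio and the fraction of storage saved."""
    ratio = original_gb / quantized_gb
    saved_fraction = 1 - quantized_gb / original_gb
    return {"ratio": ratio, "saved_fraction": saved_fraction}


# Sizes taken from the README: 16GB original, 5GB quantized.
summary = compression_summary(16, 5)
print(f"{summary['ratio']:.1f}x smaller, {summary['saved_fraction']:.0%} saved")
# prints: 3.2x smaller, 69% saved
```

Assuming the original checkpoint stores 16-bit weights (the README does not say), a pure 4-bit conversion would give 4x on the quantized weights alone. A lower overall ratio is plausible if some tensors stay unquantized and quantization metadata adds overhead, but the README does not break the 5GB figure down.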
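+
+ Inference accuracy has not been tested yet; use with caution.
+
+ The calibration data used during quantization comes from the fine-tuning training set Medical Fine-tuning Instruction (GPT-4).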
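The README does not show how the calibration samples were prepared. The following is only a sketch of one possible approach, under several assumptions: the record fields `question` and `answer`, the helper `build_calibration_texts`, the sample records, and the choice to format samples with the `<问>:`/`<答>:` markers are all hypothetical. It also assumes that AutoAWQ's `quantize` method accepts a list of strings through its `calib_data` argument.

```python
from typing import Dict, List


def build_calibration_texts(
    records: List[Dict[str, str]], max_samples: int = 128
) -> List[str]:
    """Turn instruction records into plain-text calibration samples.

    Hypothetical helper: the field names and the prompt format are
    assumptions, not the author's documented procedure.
    """
    texts = []
    for record in records[:max_samples]:
        question = record["question"].strip()
        answer = record["answer"].strip()
        # Skip incomplete records so every sample has both turns.
        if question and answer:
            texts.append(f"<问>:{question}\n<答>:{answer}\n")
    return texts


# Made-up records standing in for the real fine-tuning set.
sample_records = [
    {"question": "What are common symptoms of anemia?",
     "answer": "Fatigue, pale skin, and shortness of breath."},
    {"question": "   ", "answer": "Skipped because the question is empty."},
]
calibration_texts = build_calibration_texts(sample_records)
print(len(calibration_texts))
```

A list built this way could then be handed to the quantization step in place of a default calibration set, if the assumption about `calib_data` holds for the installed AutoAWQ version.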
+
+ Usage example (currently only AutoAWQ is supported; integration with transformers is still being investigated):
+
+ Before you start, be sure to specify the GPU:
+
+ ```
+ import os
+ # Expose only GPU 0; set this before any CUDA library is initialized
+ os.environ["CUDA_VISIBLE_DEVICES"] = "0"
+ ```
+ Make sure AutoAWQ is installed:
+ ```
+ # Run in a notebook cell: install AutoAWQ from source
+ !git clone https://github.com/casper-hansen/AutoAWQ
+ %cd AutoAWQ
+ !pip install -e .
+ ```
+ ```
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer, TextStreamer
+
+ quant_path = "jiangchengchengNLP/huatuo_AutoAWQ_7B4bits"
+
+ # Load the quantized model
+ model = AutoAWQForCausalLM.from_quantized(quant_path,device="cuda",fuse_layers=False)
+
+ tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
+ # Stream generated text to stdout, hiding the prompt and special tokens
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+ prompt = (
+     "You're standing on the surface of the Earth. "
+     "You walk one mile south, one mile west and one mile north. "
+     "You end up exactly where you started. Where are you?"
+ )
+
+ chat = [
+ {"role": "user", "content": prompt},
+ ]
+
+ # "<|eot_id|>" is a Llama 3 special token and may not exist in this tokenizer's vocabulary
+ terminators = [
+ tokenizer.eos_token_id,
+ tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+ # Wrap user turns in '<问>:' and assistant turns in '<答>:'
+ tokenizer.chat_template="""
+ {%- for message in messages -%}
+ {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+ {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
+ {%- endif -%}
+
+ {%- if message['role'] == 'user' -%}
+ {{ '<问>:' + message['content'] + '\n' }}
+
+ {%- elif message['role'] == 'assistant' -%}
+ {{ '<答>:' + message['content'] + '\n' }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ {{- '<答>:' -}}
+ {% endif %}
+
+ """
+ tokens = tokenizer.apply_chat_template(
+ chat,
+ add_generation_prompt=True,  # append '<答>:' so the model starts its answer
+ return_tensors="pt"
+ )
+
+ tokens = tokens.to("cuda:0")
+ # max_length is omitted: when both are set, max_new_tokens takes precedence
+ generation_output = model.generate(
+ tokens,
+ streamer=streamer,
+ max_new_tokens=1000,
+ eos_token_id=terminators,
+ )
+
+ ```
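The chat template in the usage example wraps user turns in `<问>:` and assistant turns in `<答>:`, and optionally ends with an open `<答>:` for the model to complete. As a reading aid, here is a plain-Python sketch of the same logic. `build_prompt` is a hypothetical helper, not a transformers API, and it ignores the whitespace-trimming details of the Jinja version, so the exact string may differ slightly from what `apply_chat_template` produces.

```python
from typing import Dict, List


def build_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render a conversation in the '<问>:' / '<答>:' format."""
    parts = []
    for index, message in enumerate(messages):
        # Turns must alternate, starting with the user at index 0.
        expects_user = index % 2 == 0
        if (message["role"] == "user") != expects_user:
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            parts.append("<问>:" + message["content"] + "\n")
        elif message["role"] == "assistant":
            parts.append("<答>:" + message["content"] + "\n")
    if add_generation_prompt:
        # Leave the answer marker open so the model continues from here.
        parts.append("<答>:")
    return "".join(parts)


rendered = build_prompt([{"role": "user", "content": "Where are you?"}])
print(repr(rendered))
```

Printing the rendered prompt before generation is a quick way to confirm that the conversation ends with the answer marker.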