---
license: apache-2.0
---

This is a 4-bit quantized model built with the AutoAWQ framework. The quantized model is HuatuoGPT2-7B, a fine-tuned model whose base model is Baichuan-7B.

Model size: the original model is 16 GB; the quantized model is 5 GB.

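As a rough sanity check on those figures (a back-of-envelope sketch only; real checkpoints also carry embeddings, per-group scales and zero-points, and tokenizer files):

```
# Back-of-envelope size estimate for a 7B-parameter model (illustrative only)
params = 7e9
fp16_gb = params * 2 / 1e9    # FP16 stores 2 bytes per weight   -> ~14 GB
int4_gb = params * 0.5 / 1e9  # 4-bit stores 0.5 bytes per weight -> ~3.5 GB
print(f"FP16: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.1f} GB")
# The published 16 GB / 5 GB figures sit above these floors because of
# quantization metadata (scales, zero-points) and other checkpoint files.
```
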
Inference accuracy has not yet been tested, so please use this model with caution.

During quantization, the calibration data was taken from the fine-tuning training set, Medical Fine-tuning Instruction (GPT-4).

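The quantization script itself is not included in this card. For reference, here is a minimal sketch of how AutoAWQ accepts custom calibration text; the source checkpoint name, quantization config, and calibration sample below are illustrative assumptions, not the exact settings used:

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "FreedomIntelligence/HuatuoGPT2-7B"  # assumed source checkpoint
quant_path = "huatuo_AutoAWQ_7B4bits"             # output directory

# A common 4-bit AWQ configuration; the exact values used are not documented
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# calib_data accepts a list of raw text samples; instructions from the
# medical fine-tuning set would be passed here (placeholder example shown)
calib_data = ["<问>:高血压患者日常饮食需要注意什么？\n<答>:……"]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
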
Usage example (currently only AWQ inference is supported; Transformers integration is still being investigated):

Be sure to select a GPU before you start:

```
import os

# Pin the process to a single GPU before any CUDA-using library is loaded
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Make sure you have AutoAWQ installed:

```
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .
```

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "jiangchengchengNLP/huatuo_AutoAWQ_7B4bits"

# Load the quantized model onto the GPU; layer fusion is left off
model = AutoAWQForCausalLM.from_quantized(quant_path, device="cuda", fuse_layers=False)

tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "You're standing on the surface of the Earth. "\
         "You walk one mile south, one mile west and one mile north. "\
         "You end up exactly where you started. Where are you?"

chat = [
    {"role": "user", "content": prompt},
]

# Stop generation at the end-of-sequence token. (The original snippet also
# looked up "<|eot_id|>", a Llama-3 token that does not exist in this
# Baichuan-based tokenizer, so it is dropped here.)
terminators = [
    tokenizer.eos_token_id,
]

# HuatuoGPT2 marks turns with <问>: (question) and <答>: (answer), so
# set an explicit chat template that emits those markers
tokenizer.chat_template = """
{%- for message in messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif -%}
    {%- if message['role'] == 'user' -%}
        {{ '<问>:' + message['content'] + '\n' }}
    {%- elif message['role'] == 'assistant' -%}
        {{ '<答>:' + message['content'] + '\n' }}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{- '<答>:' -}}
{% endif %}
"""

tokens = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,  # append the trailing '<答>:' the template defines
    return_tensors="pt",
)
tokens = tokens.to("cuda:0")

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=1000,  # the original also set max_length, which conflicts; dropped
    eos_token_id=terminators,
)
```

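With the streamer attached, the reply is printed to stdout as it is generated. `generation_output` still holds the full token-id tensor (prompt included), so the text can also be recovered afterwards with `tokenizer.decode(generation_output[0], skip_special_tokens=True)`.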