study-hjt/Qwen1.5-32B-Chat-GPTQ-Int8


study-hjt committed on Apr 26

Commit f02496e • 1 Parent(s): 7710fe0

Update README.md

Files changed (1)

README.md +3 -3

@@ -59,15 +59,15 @@ KeyError: 'qwen2'
 Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
 ```python
-from modelscope import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # the device to load the model onto
 model = AutoModelForCausalLM.from_pretrained(
-    "huangjintao/Qwen1.5-32B-Chat-GPTQ-Int8",
+    "study-hjt/Qwen1.5-32B-Chat-GPTQ-Int8",
     torch_dtype="auto",
     device_map="auto"
 )
-tokenizer = AutoTokenizer.from_pretrained("huangjintao/Qwen1.5-32B-Chat-GPTQ-Int8")
+tokenizer = AutoTokenizer.from_pretrained("study-hjt/Qwen1.5-32B-Chat-GPTQ-Int8")
 prompt = "Give me a short introduction to large language model."
 messages = [