alvarobartt (HF staff) committed
Commit 4266f7f
1 Parent(s): 9292676

Update README.md

Files changed (1):
  1. README.md +21 -5

README.md CHANGED
@@ -41,6 +41,8 @@ In order to run the inference with Llama 3.1 8B Instruct AWQ in INT4, both `torc
 pip install "torch>=2.2.0,<2.3.0" autoawq --upgrade
 ```
 
+Otherwise, running the model inference may fail, since the AutoAWQ kernels are built with PyTorch 2.2.1, meaning that those will break with PyTorch 2.3.0.
+
 Then, the latest version of `transformers` need to be installed, being 4.43.0 or higher, as:
 
 ```bash
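
The caveat added in this hunk explains the `torch>=2.2.0,<2.3.0` pin: the prebuilt AutoAWQ kernels target PyTorch 2.2.x and can break on 2.3.0. A quick way to catch a mismatched environment before running the snippets below is a version check; this is only an illustrative sketch based on the constraint stated in the README, not part of the committed file:

```python
# Illustrative sketch: warn if the installed torch falls outside the
# "torch>=2.2.0,<2.3.0" range that the README pins for the AutoAWQ kernels.
import torch
from packaging.version import Version

installed = Version(torch.__version__.split("+")[0])  # drop local build tags such as "+cu121"
if not (Version("2.2.0") <= installed < Version("2.3.0")):
    print(
        f"Warning: torch {installed} detected; the AWQ kernels are built against "
        "PyTorch 2.2.1 and may fail to import or run with this version."
    )
```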
@@ -61,7 +63,13 @@ prompt = [
 
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 
-inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt").cuda()
+inputs = tokenizer.apply_chat_template(
+    prompt,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    return_dict=True,
+).to("cuda")
 
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
@@ -70,7 +78,7 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto",
 )
 
-outputs = model.generate(inputs, do_sample=True, max_new_tokens=256)
+outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
 
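For context on the two changes above: with `return_dict=True`, `apply_chat_template` returns a `BatchEncoding` containing both `input_ids` and `attention_mask` instead of a bare tensor of token ids, which is why the call now ends in `.to("cuda")` rather than `.cuda()` and why `generate` receives `**inputs`. A minimal sketch of the difference, assuming a placeholder chat model and prompt (neither is taken from the committed file):

```python
# Sketch contrasting the old and new tokenization calls; the checkpoint and
# prompt below are placeholders, not values from the committed README.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
prompt = [{"role": "user", "content": "What's Deep Learning?"}]

# Old call: returns a single tensor of input ids, so generate() never sees an attention mask.
input_ids = tokenizer.apply_chat_template(
    prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

# New call: returns a BatchEncoding with `input_ids` and `attention_mask`, which can be
# moved as a whole with .to("cuda") and unpacked into model.generate(**inputs, ...).
inputs = tokenizer.apply_chat_template(
    prompt,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
print(type(input_ids).__name__, list(inputs.keys()))  # Tensor ['input_ids', 'attention_mask']
```
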
@@ -82,6 +90,8 @@ In order to run the inference with Llama 3.1 8B Instruct AWQ in INT4, both `torc
 pip install "torch>=2.2.0,<2.3.0" autoawq --upgrade
 ```
 
+Otherwise, running the model inference may fail, since the AutoAWQ kernels are built with PyTorch 2.2.1, meaning that those will break with PyTorch 2.3.0.
+
 Then, the latest version of `transformers` need to be installed, being 4.43.0 or higher, as:
 
 ```bash
@@ -103,7 +113,13 @@ prompt = [
 
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 
-inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt").cuda()
+inputs = tokenizer.apply_chat_template(
+    prompt,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    return_dict=True,
+).to("cuda")
 
 model = AutoAWQForCausalLM.from_pretrained(
     model_id,
@@ -112,11 +128,11 @@ model = AutoAWQForCausalLM.from_pretrained(
     device_map="auto",
 )
 
-outputs = model.generate(inputs, do_sample=True, max_new_tokens=256)
+outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
 
-The AutoAWQ script has been adapted from [AutoAWQ/examples/generate.py](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py).
+The AutoAWQ script has been adapted from [`AutoAWQ/examples/generate.py`](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py).
 
 ### 🤗 Text Generation Inference (TGI)
 