hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

@@ -69,7 +69,7 @@ inputs = tokenizer.apply_chat_template(
 ).to("cuda")
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 ```
 ### AutoAWQ
@@ -109,7 +109,7 @@ inputs = tokenizer.apply_chat_template(
 ).to("cuda")
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 ```
 The AutoAWQ script has been adapted from [`AutoAWQ/examples/generate.py`](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py).
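
Both hunks make the same change: for a decoder-only model, `generate` returns the prompt tokens followed by the continuation, so slicing from `inputs['input_ids'].shape[1]` onward drops the echoed prompt before decoding. A minimal sketch of that slicing, assuming plain Python lists in place of tensors; `strip_prompt` and the token IDs are hypothetical and only illustrate the indexing:

```python
def strip_prompt(outputs, prompt_len):
    """Drop the first prompt_len token IDs from every generated row."""
    return [row[prompt_len:] for row in outputs]


# Hypothetical token IDs: a 3-token prompt followed by 2 generated tokens.
prompt_ids = [[101, 7592, 102]]
outputs = [prompt_ids[0] + [2023, 2003]]

# Equivalent in spirit to outputs[:, inputs['input_ids'].shape[1]:] on a tensor.
new_tokens = strip_prompt(outputs, len(prompt_ids[0]))
print(new_tokens)
```

The trailing `[0]` in the diff then selects the single decoded string from the batch of one.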