Update README.md
README.md CHANGED
@@ -11,9 +11,7 @@ license: apache-2.0
 model_creator: Mistral AI_
 model_name: Mixtral 8X7B Instruct v0.1
 model_type: mixtral
-prompt_template: '
-
-'
+prompt_template: '[INST] {prompt} [/INST] '
 quantized_by: TheBloke
 ---
 <!-- markdownlint-disable MD041 -->
@@ -68,7 +66,7 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 ## Prompt template: Mistral
 
 ```
-
+[INST] {prompt} [/INST]
 ```
 <!-- prompt-template end -->
 
@@ -201,64 +199,6 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 
 <!-- README_GPTQ.md-text-generation-webui end -->
 
-<!-- README_GPTQ.md-use-from-python start -->
-## Python code example: inference from this GPTQ model
-
-### Install the necessary packages
-
-Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
-
-```shell
-pip3 install --upgrade transformers optimum
-# If using PyTorch 2.1 + CUDA 12.x:
-pip3 install --upgrade auto-gptq
-# or, if using PyTorch 2.1 + CUDA 11.x:
-pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
-```
-
-If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise if you have problems with the pre-built wheels, you should try building from source:
-
-```shell
-pip3 uninstall -y auto-gptq
-git clone https://github.com/PanQiWei/AutoGPTQ
-cd AutoGPTQ
-git checkout v0.5.1
-pip3 install .
-```
-
-### Example Python code
-
-```python
-model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
-from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, GPTQConfig
-from auto_gptq import AutoGPTQForCausalLM
-
-model_name_or_path = args.model_dir
-# To use a different branch, change revision
-# For example: revision="gptq-4bit-32g-actorder_True"
-model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-        model_basename="model",
-        use_safetensors=True,
-        trust_remote_code=False,
-        device="cuda:0",
-        use_triton=False,
-        disable_exllama=False,
-        disable_exllamav2=True,
-        quantize_config=None)
-
-tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=False)
-
-prompt = "Tell me about AI"
-prompt_template=f'''<s>[INST] {prompt} [/INST]
-'''
-
-print("\n\n*** Generate:")
-
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
-output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
-print(tokenizer.decode(output[0]))
-```
-<!-- README_GPTQ.md-use-from-python end -->
 
 <!-- footer start -->
 <!-- 200823 -->
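The substance of the change is that the README's `prompt_template` metadata goes from an empty string to the Mistral instruction format, `[INST] {prompt} [/INST]`. As a rough sketch of how a downstream user would fill in that template before tokenizing, the snippet below reuses the repo id and sample prompt from the removed Python example; it is illustrative only and not part of the README being edited, and filling the `{prompt}` placeholder with plain `str.format` is an assumption about how the template is meant to be used.

```python
from transformers import AutoTokenizer

# Repo id and sample prompt reused from the removed Python example above.
model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# New prompt_template value from the README metadata.
prompt_template = "[INST] {prompt} [/INST] "
prompt = "Tell me about AI"

# Assumption: the {prompt} placeholder is filled with plain string formatting,
# and the resulting string is tokenized as usual before generation.
input_text = prompt_template.format(prompt=prompt)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
print(tokenizer.decode(input_ids[0]))
```

The decoded string should correspond to what the updated `## Prompt template: Mistral` block displays, with the leading `<s>` (which the removed example added by hand) supplied automatically by the tokenizer.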