Update README.md
README.md CHANGED
@@ -11,9 +11,7 @@ license: apache-2.0
 model_creator: Mistral AI_
 model_name: Mixtral 8X7B Instruct v0.1
 model_type: mixtral
-prompt_template: '
-
-'
+prompt_template: '[INST] {prompt} [/INST] '
 quantized_by: TheBloke
 ---
 <!-- markdownlint-disable MD041 -->
@@ -68,7 +66,7 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 ## Prompt template: Mistral
 
 ```
-
+[INST] {prompt} [/INST]
 ```
 <!-- prompt-template end -->
 
@@ -201,64 +199,6 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 
 <!-- README_GPTQ.md-text-generation-webui end -->
 
-<!-- README_GPTQ.md-use-from-python start -->
-## Python code example: inference from this GPTQ model
-
-### Install the necessary packages
-
-Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
-
-```shell
-pip3 install --upgrade transformers optimum
-# If using PyTorch 2.1 + CUDA 12.x:
-pip3 install --upgrade auto-gptq
-# or, if using PyTorch 2.1 + CUDA 11.x:
-pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
-```
-
-If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise if you have problems with the pre-built wheels, you should try building from source:
-
-```shell
-pip3 uninstall -y auto-gptq
-git clone https://github.com/PanQiWei/AutoGPTQ
-cd AutoGPTQ
-git checkout v0.5.1
-pip3 install .
-```
-
-### Example Python code
-
-```python
-model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
-from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, GPTQConfig
-from auto_gptq import AutoGPTQForCausalLM
-
-model_name_or_path = args.model_dir
-# To use a different branch, change revision
-# For example: revision="gptq-4bit-32g-actorder_True"
-model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-        model_basename="model",
-        use_safetensors=True,
-        trust_remote_code=False,
-        device="cuda:0",
-        use_triton=False,
-        disable_exllama=False,
-        disable_exllamav2=True,
-        quantize_config=None)
-
-tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=False)
-
-prompt = "Tell me about AI"
-prompt_template=f'''<s>[INST] {prompt} [/INST]
-'''
-
-print("\n\n*** Generate:")
-
-input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
-output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
-print(tokenizer.decode(output[0]))
-```
-<!-- README_GPTQ.md-use-from-python end -->
 
 <!-- footer start -->
 <!-- 200823 -->
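The substance of the change is that the README's `prompt_template` metadata goes from an empty string to the Mistral instruction format, `[INST] {prompt} [/INST]`. As a rough sketch of how a downstream user would fill in that template before tokenizing, the snippet below reuses the repo id and sample prompt from the removed Python example; it is illustrative only and not part of the README being edited, and filling the `{prompt}` placeholder with plain `str.format` is an assumption about how the template is meant to be used.

```python
from transformers import AutoTokenizer

# Repo id and sample prompt reused from the removed Python example above.
model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# New prompt_template value from the README metadata.
prompt_template = "[INST] {prompt} [/INST] "
prompt = "Tell me about AI"

# Assumption: the {prompt} placeholder is filled with plain string formatting,
# and the resulting string is tokenized as usual before generation.
input_text = prompt_template.format(prompt=prompt)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
print(tokenizer.decode(input_ids[0]))
```

The decoded string should correspond to what the updated `## Prompt template: Mistral` block displays, with the leading `<s>` (which the removed example added by hand) supplied automatically by the tokenizer.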