Update README.md
Browse files
README.md
CHANGED
@@ -176,7 +176,7 @@ In LM-Studio, simply select the ChatML Prefix on the settings side pane:
|
|
176 |
|
177 |
# Inference Code
|
178 |
|
179 |
-
Here is example code using HuggingFace Transformers to inference the model (note: even in 4bit, it will require more than 24GB of VRAM)
|
180 |
|
181 |
```python
|
182 |
# Code to inference Hermes with HF Transformers
|
@@ -187,9 +187,9 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
187 |
from transformers import LlamaTokenizer, MixtralForCausalLM
|
188 |
import bitsandbytes, flash_attn
|
189 |
|
190 |
-
tokenizer = LlamaTokenizer.from_pretrained('NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO', trust_remote_code=True)
|
191 |
model = MixtralForCausalLM.from_pretrained(
|
192 |
-
"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
|
193 |
torch_dtype=torch.float16,
|
194 |
device_map="auto",
|
195 |
load_in_8bit=False,
|
|
|
176 |
|
177 |
# Inference Code
|
178 |
|
179 |
+
Here is example code using HuggingFace Transformers to inference the model (note: in 4bit, it will require around 5GB of VRAM)
|
180 |
|
181 |
```python
|
182 |
# Code to inference Hermes with HF Transformers
|
|
|
187 |
from transformers import LlamaTokenizer, MixtralForCausalLM
|
188 |
import bitsandbytes, flash_attn
|
189 |
|
190 |
+
tokenizer = LlamaTokenizer.from_pretrained('NousResearch/Nous-Hermes-2-Mistral-7B-DPO', trust_remote_code=True)
|
191 |
model = MixtralForCausalLM.from_pretrained(
|
192 |
+
"NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
|
193 |
torch_dtype=torch.float16,
|
194 |
device_map="auto",
|
195 |
load_in_8bit=False,
|