strange errors with Mixtral-8x22B-Instruct-v0.1.Q5_K_M

#37
by cyrilAub - opened

I am trying to use the model to select functions to call from a list and answer in valid JSON. When I use Mixtral-8x22B-Instruct-v0.1.Q5_K_M with llama.cpp as follows, the model's answers contain mistakes or invalid JSON:
from llama_cpp import Llama

llm = Llama(
    model_path=model_name,
    verbose=True,
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=14096,
)
output = llm(
    prompt,            # prompt
    max_tokens=-1,     # generate until the context is exhausted
    stop=[""],
    echo=False,
    temperature=0.01,
)
return output["choices"][0]["text"]

Example of a bad answer:
{"thought": "I will call the search_flights tool to find a later flight for the user","action": "searcch_flights","action_input": {"arrival_airport": "BS", "depature_airport": "CDG"}}

The right function is search_flights, not searcch_flights, and the arrival airport is BSL, not BS.
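
For context, this is roughly the kind of check that catches these answers: parse the output as JSON and verify that the chosen tool is in the allowed list. The tool set and helper below are illustrative, not my actual code:

import json

ALLOWED_TOOLS = {"search_flights"}  # illustrative subset of the real tool list

def check_answer(text):
    """Return an error message if the model answer is unusable, else None."""
    try:
        answer = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc}"
    if answer.get("action") not in ALLOWED_TOOLS:
        return f"unknown tool: {answer.get('action')!r}"
    return None

# The bad answer above fails with: unknown tool: 'searcch_flights'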

With PyTorch and transformers.AutoModelForCausalLM, the same model works perfectly when loaded from Hugging Face (the original Mixtral-8x22B-Instruct-v0.1, not the GGUF).

Here is an example of the code I use:
import torch
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_fast=True)
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, attn_implementation="flash_attention_2",
    quantization_config=bnb_config, device_map="auto", pad_token_id=0,
)
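
Generation can then be run along these lines (a minimal sketch, reusing the prompt variable from the llama.cpp example; max_new_tokens and the greedy-decoding settings are illustrative):

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,               # illustrative limit
        do_sample=False,                  # greedy, comparable to temperature=0.01
        pad_token_id=tokenizer.eos_token_id,
    )
# decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))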

I'm going to try the 8-bit GGUF and the 16-bit version from this repo to see if they make the same kind of strange mistakes.
