Poor Performance - Always seems to return the label 0 (Entailment)
Hi there, I love the idea of this, but I am not seeing good performance: it returns entailment (0) for everything. Am I doing something wrong, and if not, what accuracy do you get when you evaluate this adapter on the MNLI test set?
from transformers import AutoModelForCausalLM, AutoTokenizer, MistralForCausalLM, LlamaTokenizerFast
import torch

device = torch.device('cuda')

# Load the base model and attach the LoRA adapter.
model_id = "mistralai/Mistral-7B-v0.1"
peft_model_id = "predibase/glue_mnli"
model: MistralForCausalLM = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, torch_dtype=torch.bfloat16)
model.load_adapter(peft_model_id)

# Prompt in the MNLI instruction format (fictitious example).
input_string = """You are given a premise and a hypothesis below. If the premise entails the hypothesis, return 0. If the premise contradicts the hypothesis, return 2. Otherwise, if the premise does neither, return 1.
### Premise: The customer's feedback was "I went to Bank of America and while I think their personal loans are absolute garbage, I was very impressed with their customer service. I was also impressed with the cleanliness of the branch. I will definitely be returning to ANZ in the future. Might get a credit card too, they had good rates."
### Hypothesis: The customer's feedback includes mention of a white unicorn.
### Label:"""

tokenizer: LlamaTokenizerFast = AutoTokenizer.from_pretrained(model_id)
input_tokens = tokenizer(input_string, return_tensors="pt").to(device)

# Generate the label and decode the full sequence (prompt + completion).
generation_output = model.generate(input_tokens['input_ids'], max_length=4000, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(generation_output[0])
print(output)
Output is:
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.55s/it]
<s> You are given a premise and a hypothesis below. If the premise entails the hypothesis, return 0. If the premise contradicts the hypothesis, return 2. Otherwise, if the premise does neither, return 1.
### Premise: The customer's feedback was "I went to Bank of America and while I think their personal loans are absolute garbage, I was very impressed with their customer service. I was also impressed with the cleanliness of the branch. I will definitely be returning to ANZ in the future. Might get a credit card too, they had good rates."
### Hypothesis: The customer's feedback includes mention of a white unicorn.
### Label:<s> 0</s>
This is not an abnormal example. I have tried this with multiple (fictitious) tests, and every time it returns 0.
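For reference, here is a rough sketch of how I would measure accuracy on the MNLI validation_matched split, reusing the model and tokenizer loaded above and the same prompt format (this assumes the datasets library is installed and that this prompt matches the adapter's training template, which I have not confirmed):

from datasets import load_dataset

# Same instruction template as in my example above.
prompt_template = """You are given a premise and a hypothesis below. If the premise entails the hypothesis, return 0. If the premise contradicts the hypothesis, return 2. Otherwise, if the premise does neither, return 1.
### Premise: {premise}
### Hypothesis: {hypothesis}
### Label:"""

# Small slice of the validation_matched split for a quick check.
dataset = load_dataset("glue", "mnli", split="validation_matched[:100]")

correct = 0
for example in dataset:
    prompt = prompt_template.format(premise=example["premise"], hypothesis=example["hypothesis"])
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=2, pad_token_id=tokenizer.eos_token_id)
    # Only look at the newly generated tokens after the prompt.
    completion = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
    if completion and completion[0].isdigit() and int(completion[0]) == example["label"]:
        correct += 1

print(f"Accuracy on {len(dataset)} validation_matched examples: {correct / len(dataset):.2%}")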
Hi! I can definitely take a look at this -- there might be an issue with how the weights were uploaded. The accuracy we received on this dataset was about 87%.
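In the meantime, one thing that might be worth trying on your end (just a sketch, not a confirmed fix) is attaching the adapter through PEFT's PeftModel instead of model.load_adapter, to rule out a loading-path issue on the transformers side:

# Hedged alternative, assuming the peft library is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model, then wrap it with the LoRA adapter explicitly.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", device_map="auto", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "predibase/glue_mnli")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

If the adapter still predicts 0 for everything when loaded this way, that would point to the uploaded weights rather than your code.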
Please feel free to check out the adapter's performance here: https://predibase.com/lora-land
Also, if you join our Discord, we are likely to respond more quickly to comments there: https://discord.gg/CBgdrGnZjy