Bug with example code:
Thanks, it looks like an excellent model, but I ran into a couple of difficulties running it.
(1) The name/path needs to be changed to the Nexusflow organization, and "use_auth_token=True" has to be added since the repo is private:
reward_model = LlamaForSequenceClassification.from_pretrained("berkeley-nest/Starling-RM-34B",torch_dtype=torch.bfloat16)
->
reward_model = LlamaForSequenceClassification.from_pretrained("Nexusflow/Starling-RM-34B", torch_dtype=torch.bfloat16, use_auth_token=True)
(2) But after loading, it still runs into the next bug, complaining about a mismatch in the model weights:
"RuntimeError: Error(s) in loading state_dict for LlamaForSequenceClassification:
size mismatch for transformer.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 7168]) from checkpoint, the shape in current model is torch.Size([7168, 7168])."
...
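For reference, the shapes in the error look like a grouped-query-attention mismatch: the checkpoint's k_proj maps 7168 hidden dims down to 1024 (8 KV heads x head dim 128), while the model being instantiated expects a full 7168x7168 projection, as if num_key_value_heads were being ignored (e.g. by an older transformers version). A quick way to inspect what the checkpoint expects, assuming its config.json is a standard Llama-style config:

from transformers import AutoConfig

# Hedged sanity check: print the attention layout the checkpoint declares
config = AutoConfig.from_pretrained("Nexusflow/Starling-RM-34B", use_auth_token=True)
print(config.hidden_size, config.num_attention_heads, config.num_key_value_heads)
# With grouped-query attention the last two differ, and k_proj/v_proj are
# (num_key_value_heads * head_dim) x hidden_size, i.e. 1024 x 7168 here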
I await your response,
Best,
Asaf
Hi @Asaf-Yehudai ,
Thanks for the bug catch on (1).
Are you still experiencing the issue in (2)? I have not seen this on my end.
Sure.
I'm not sure what went wrong before, but I eventually managed to run it on GPU with some small modifications:
import math
import torch
from transformers import AutoTokenizer

# LlamaForSequenceClassification is the custom reward-model class defined in the Starling-RM-34B model card's example code
reward_model = LlamaForSequenceClassification.from_pretrained("Nexusflow/Starling-RM-34B", torch_dtype=torch.bfloat16, device_map='auto', use_auth_token=True)
reward_tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B-Chat")
reward_tokenizer.truncation_side = "left"
reward_model.eval().requires_grad_(False)
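As an optional sanity check (assuming device_map='auto' dispatched the model through accelerate), you can print where the layers ended up:

# hf_device_map is populated by transformers when a device_map is used
print(reward_model.hf_device_map)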
# Define the reward function
reward_batch_size = 1

def get_reward(samples):
    """samples: List[str]"""
    # Query the device the model's parameters live on (PreTrainedModel exposes a .device property)
    model_device = reward_model.device
    encodings_dict = reward_tokenizer(
        samples,
        truncation=True,
        max_length=2048,
        padding="max_length",
        return_tensors="pt",
    )
    # Move the tensors to the same device as the model
    input_ids = encodings_dict["input_ids"].to(model_device)
    attention_masks = encodings_dict["attention_mask"].to(model_device)
    mbs = reward_batch_size
    out = []
    # Score the samples in micro-batches to keep memory usage low
    for i in range(math.ceil(len(samples) / mbs)):
        rewards = reward_model(
            input_ids=input_ids[i * mbs : (i + 1) * mbs],
            attention_mask=attention_masks[i * mbs : (i + 1) * mbs],
        )
        out.extend(rewards["scores"])
    return torch.hstack(out)
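In case it helps anyone else, a minimal usage sketch follows. The chat formatting is an assumption based on the Yi-34B-Chat <|im_start|>/<|im_end|> template; check the model card for the exact format the reward model expects.

# Hypothetical single-turn conversation (illustrative, not from this thread)
sample = (
    "<|im_start|>user\nHello, how are you?<|im_end|>\n"
    "<|im_start|>assistant\nI'm doing great, thanks for asking!<|im_end|>"
)
scores = get_reward([sample])
print(scores)  # one scalar reward per input string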