sapphia-410m-RM

super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, meant to work as a reward model.

why?

nexusflow achieved good results with traditional reward model finetuning! why not meeeeeee :3
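
for anyone wondering what "traditional reward model finetuning" usually means: the common recipe is a pairwise (Bradley-Terry) loss over chosen/rejected pairs, where the model gives each response a scalar score and is pushed to score the chosen one higher. this card doesn't say which loss was actually used here, so treat the below as a minimal sketch of the general technique and not this model's training code. the function names (`pairwise_rm_loss`, `batch_loss`) are made up for illustration, and plain floats stand in for the model's reward outputs.

```python
import math


def pairwise_rm_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper; the rewards are assumed to be scalar scores
    that a reward model assigned to the chosen and rejected responses.
    """
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # when the margin is very large in either direction.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean pairwise loss over (chosen_reward, rejected_reward) pairs."""
    return sum(pairwise_rm_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    example_pairs = [(1.2, -0.3), (0.1, 0.4), (2.0, 2.0)]
    print(f"mean loss: {batch_loss(example_pairs):.4f}")
```

the loss is small when the chosen response outscores the rejected one by a wide margin, equals log(2) when the two scores tie, and grows roughly linearly when the ranking is wrong.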

