---
library_name: peft
base_model: EleutherAI/pythia-410m-deduped
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
tags:
- RLHF
- RLAIF
- PPO
- RM
- reward-model
- reward_model
---
# sapphia-410m-RM
super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, intended to be a reward model.
## why? | |
Nexusflow achieved good results with traditional reward-model finetuning! why not meeeeeee :3
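For context, "traditional" reward-model finetuning on pairwise preference data like argilla/dpo-mix-7k usually means training the model to score the chosen response above the rejected one with a Bradley-Terry loss. Below is a minimal, framework-free sketch of that loss (the function name and values are illustrative, not from this repo's training code):

```python
import math

def rm_pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in standard reward-model training:
    -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the chosen response is scored well above the
    rejected one, and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the model ranks the chosen response higher, the loss is lower:
print(rm_pairwise_loss(2.0, -1.0) < rm_pairwise_loss(-1.0, 2.0))  # True
```

In actual training the two scalar rewards would come from a sequence-classification head on the base model (one forward pass per response), with the LoRA adapter providing the trainable parameters.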