---
library_name: peft
base_model: EleutherAI/pythia-410m-deduped
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
tags:
- RLHF
- RLAIF
- PPO
- RM
- reward-model
- reward_model
---
# sapphia-410m-RM
super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, trained as a reward model.
## why?
nexusflow achieved good results with traditional reward model finetuning! why not meeeeeee :3
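
## how does reward model finetuning work?

a minimal sketch of the pairwise (Bradley-Terry) loss that traditional reward model finetuning typically optimizes on preference pairs like those in argilla/dpo-mix-7k: the model scores a chosen and a rejected response, and the loss pushes the chosen score above the rejected one. this is a generic illustration, not this repo's exact training code; the function name is made up for the example.

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    r_chosen / r_rejected are the scalar rewards the model assigns to the
    preferred and dispreferred completions of the same prompt.
    """
    margin = r_chosen - r_rejected
    # numerically stable form: -log(sigmoid(m)) == log(1 + exp(-m))
    return math.log1p(math.exp(-margin))
```

the loss is log(2) when the two scores tie, shrinks toward zero as the chosen score pulls ahead, and grows without bound when the model prefers the rejected response.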