Issue when finetuning the reward model on custom dataset
#2
by yguooo - opened
Currently, I am benchmarking the performance of different reward models on a custom dataset, and I encountered a problem when using the standard pipeline from trl, similar to https://huggingface.co/docs/trl/en/reward_trainer (a rough sketch of the setup is at the end of this post).
I am wondering what I should do to fix this issue.
Thank you!
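For context, my training script follows the linked docs roughly like this; the base model and dataset names below are placeholders from the docs, not my actual custom data:

```python
# Minimal sketch of the trl RewardTrainer pipeline from the docs linked above.
# Model and dataset names are placeholders, not the ones used on my custom data.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id  # sequence-classification head needs a pad token

# Preference dataset with "chosen" / "rejected" pairs, as expected by RewardTrainer.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="reward-model", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older trl versions take `tokenizer=` instead
    train_dataset=dataset,
)
trainer.train()
```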
The model expects data to be prepared in a specific format - see https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1#demo-code
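Roughly, the demo applies the chat template to a (prompt, response) conversation and reads the score from the model's custom outputs; something along these lines (the prompt/response here are placeholders, and the model card has the authoritative snippet):

```python
# Rough outline of the scoring flow from the ArmoRM demo code; see the model card
# for the exact, up-to-date version.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
device = "cuda"
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map=device, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path)

# Each example is a chat-formatted conversation, not a raw text pair.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

with torch.no_grad():
    output = model(input_ids)
    # The remote-code model returns a scalar preference score (plus per-objective rewards).
    preference_score = output.score.float().item()
print(preference_score)
```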
Haoxiang-Wang changed discussion status to closed