Issue when finetuning the reward model on custom dataset
#2
by yguooo - opened
Currently, I am benchmarking the performance of different reward models on a custom dataset, and I encountered a problem when using the standard pipeline from trl, similar to https://huggingface.co/docs/trl/en/reward_trainer (a rough sketch of the setup is at the end of this post).
I am wondering what I should do to fix this issue.
Thank you!
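For context, my training script follows the linked docs roughly like this; the base model and dataset names below are placeholders from the docs, not my actual custom data:

```python
# Minimal sketch of the trl RewardTrainer pipeline from the docs linked above.
# Model and dataset names are placeholders, not the ones used on my custom data.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id  # sequence-classification head needs a pad token

# Preference dataset with "chosen" / "rejected" pairs, as expected by RewardTrainer.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="reward-model", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older trl versions take `tokenizer=` instead
    train_dataset=dataset,
)
trainer.train()
```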
The model expects data to be prepared in a specific format - see https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1#demo-code
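Roughly, the demo applies the chat template to a (prompt, response) conversation and reads the score from the model's custom outputs; something along these lines (the prompt/response here are placeholders, and the model card has the authoritative snippet):

```python
# Rough outline of the scoring flow from the ArmoRM demo code; see the model card
# for the exact, up-to-date version.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
device = "cuda"
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map=device, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path)

# Each example is a chat-formatted conversation, not a raw text pair.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

with torch.no_grad():
    output = model(input_ids)
    # The remote-code model returns a scalar preference score (plus per-objective rewards).
    preference_score = output.score.float().item()
print(preference_score)
```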
Haoxiang-Wang changed discussion status to closed