Great reward model! What dataset did you use to train it?
#1, opened by zolicsaki
Specifically, I was wondering whether you trained it on the lmsys chatbot arena conversations, because your model performs so well when evaluated on those preferences. Thanks for the help!
https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
Sorry for the late reply. We did use a portion of this dataset. We performed data cleaning and filtering, including removing toxic and unsafe data, to ensure quality and safety.