Generalizable Reward Models
Ray2333/GRM-llama3-8B-sftreg
Text Classification • Updated • 111 • 5
Ray2333/GRM-llama3-8B-distill
Text Classification • Updated • 80 • 6
Ray2333/GRM-Gemma-2B-sftreg
Text Classification • Updated • 108 • 4
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Paper • 2406.10216 • Published • 2