OpenRLHF/Mistral-7b-PRM-Math-Shepherd

Process Reward Model trained by OpenRLHF

Dataset: Math-Shepherd (https://huggingface.co/datasets/peiyi9979/Math-Shepherd)
Learning Rate: 1e-6
Training Accuracy: 0.922
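
The card does not say how the training accuracy is computed. A minimal sketch, assuming it is step-level classification accuracy (the fraction of reasoning steps whose predicted label matches the gold label) and assuming Math-Shepherd-style `+` / `-` step labels. The function name `step_accuracy` and the toy data are hypothetical and are not part of OpenRLHF:

```python
from typing import Sequence


def step_accuracy(
    predicted: Sequence[Sequence[str]],
    gold: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of steps whose predicted label equals the gold label.

    Each inner sequence holds one label per reasoning step of a solution.
    Assumption: "+" marks a good step and "-" marks a bad step.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same solutions")

    correct = 0
    total = 0
    for pred_steps, gold_steps in zip(predicted, gold):
        if len(pred_steps) != len(gold_steps):
            raise ValueError("each solution needs one prediction per labeled step")
        for pred_label, gold_label in zip(pred_steps, gold_steps):
            correct += int(pred_label == gold_label)
            total += 1

    if total == 0:
        raise ValueError("no labeled steps to score")
    return correct / total


# Hypothetical toy data: two solutions with three and two labeled steps.
gold = [["+", "+", "-"], ["+", "-"]]
predicted = [["+", "+", "+"], ["+", "-"]]

accuracy = step_accuracy(predicted, gold)
print(f"step accuracy: {accuracy:.3f}")
```

In this toy example four of the five step predictions match, so the score is 0.8. Under the same assumption, the reported 0.922 would mean roughly 92% of labeled steps in the training data were classified correctly.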