---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: apache-2.0
datasets:
- openai/webgpt_comparisons
metrics:
- accuracy
---
# Reward Model pretrained on openai/webgpt_comparisons

Reward model finetuned from an existing pretrained model.
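A minimal usage sketch, assuming the checkpoint loads as a `transformers` sequence-classification model with a single scalar head; the repo id below is a placeholder, not the actual checkpoint name:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repo id -- substitute the actual checkpoint path.
model_name = "your-username/webgpt-reward-model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

question = "Why is the sky blue?"
answer = "Sunlight scatters off air molecules, and blue light scatters the most."

# Encode the (question, answer) pair; the single logit is the scalar reward.
inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(reward)
```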
## Things that aligned with the original papers

- Overfits easily when trained with rank loss (a sketch of the loss follows this list)
- A small learning rate is required
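For reference, a minimal sketch of the pairwise rank loss as I understand it from the papers, not the exact training code:

```python
import torch
import torch.nn.functional as F

def rank_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise rank loss: -log(sigmoid(r_chosen - r_rejected)),
    averaged over a batch of comparisons."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```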
## Different from the papers

- Small models perform poorly due to a lack of world knowledge: validation accuracy doesn't even reach 60%, whereas OpenAI's reward model had 6B parameters.
- Trained with an 80-20 train-validation split under torch AMP (see the sketch after this list).
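A sketch of the split and of one AMP training step, under the assumption that the model returns one scalar logit per sequence; the seed and batch layout are illustrative:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset

# 80-20 train-validation split of the comparison data.
dataset = load_dataset("openai/webgpt_comparisons", split="train")
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = splits["train"], splits["test"]

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, chosen_batch, rejected_batch):
    """One AMP step on a batch of tokenized (chosen, rejected) pairs."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward passes in mixed precision
        chosen_rewards = model(**chosen_batch).logits.squeeze(-1)
        rejected_rewards = model(**rejected_batch).logits.squeeze(-1)
        loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```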
## Other models I tried

- bloomz-560m: the large multilingual embedding isn't worth the training cost, since this dataset contains only English prompts
- gpt2-large: training was not stable
- gpt2-base: training was not stable
## Performance on validation split

| model | val acc (%) | val loss (rank loss) |
|---|---|---|
| roberta-base | 56.21 | 0.71 |
| roberta-large | 57.89 | 0.67 |
| electra-base | 57.02 | 0.70 |
| electra-large | 58.75 | 0.69 |
TensorBoard logs are located under `runs/`.
## Note

- You will have to shift this model's outputs so that the mean reward equals 0 (a sketch of that re-centering follows).
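One way to do that re-centering, as a sketch: estimate the mean reward on a reference set of answers once, then subtract that constant at inference time. The sample values below are illustrative:

```python
import torch

def center_rewards(rewards: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Shift rewards so their mean over a reference sample is 0.

    Estimate the offset once on a held-out batch, then subtract the
    same constant from every reward at inference time."""
    offset = rewards.mean().item()
    return rewards - offset, offset

# Illustrative rewards scored on a reference batch of answers.
rewards = torch.tensor([1.3, -0.2, 0.7, 2.1])
centered, offset = center_rewards(rewards)
print(centered.mean().item())  # ~0.0
```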