---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: apache-2.0
datasets:
- openai/webgpt_comparisons
metrics:
- accuracy
---
# Reward Model pretrained on openai/webgpt_comparisons

Reward model finetuned from an existing pretrained model.
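A minimal usage sketch, assuming the checkpoint loads as a `transformers` sequence-classification model with a single scalar head; the repo id below is a placeholder, not the actual checkpoint name:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repo id -- substitute the actual checkpoint path.
model_name = "your-username/webgpt-reward-model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

question = "Why is the sky blue?"
answer = "Sunlight scatters off air molecules, and blue light scatters the most."

# Encode the (question, answer) pair; the single logit is the scalar reward.
inputs = tokenizer(question, answer, return_tensors="pt", truncation=True)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(reward)
```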
## Things that aligned with the original papers

- Overfits easily when trained with rank loss (a sketch of the loss follows this list)
- A small learning rate is required
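For reference, a minimal sketch of the pairwise rank loss as I understand it from the papers, not the exact training code:

```python
import torch
import torch.nn.functional as F

def rank_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise rank loss: -log(sigmoid(r_chosen - r_rejected)),
    averaged over a batch of comparisons."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```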
## Different from the papers

- Small models perform poorly due to a lack of world knowledge: validation accuracy doesn't even reach 60%, whereas OpenAI's reward model had 6B parameters.
- Trained with an 80-20 train-validation split under torch AMP (see the sketch after this list).
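A sketch of the split and of one AMP training step, under the assumption that the model returns one scalar logit per sequence; the seed and batch layout are illustrative:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset

# 80-20 train-validation split of the comparison data.
dataset = load_dataset("openai/webgpt_comparisons", split="train")
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = splits["train"], splits["test"]

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, chosen_batch, rejected_batch):
    """One AMP step on a batch of tokenized (chosen, rejected) pairs."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward passes in mixed precision
        chosen_rewards = model(**chosen_batch).logits.squeeze(-1)
        rejected_rewards = model(**rejected_batch).logits.squeeze(-1)
        loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```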
## Other models I tried

- bloomz-560m: the large multilingual embedding isn't worth the training cost, since this dataset contains only English prompts
- gpt2-large: training was not stable
- gpt2-base: training was not stable
## Performance on validation split

| model | val acc (%) | val loss (rank loss) |
|---|---|---|
| roberta-base | 56.21 | 0.71 |
| roberta-large | 57.89 | 0.67 |
| electra-base | 57.02 | 0.70 |
| electra-large | 58.75 | 0.69 |
TensorBoard logs are located under `runs/`.
## Note

- You will have to shift this model's outputs so that the mean reward equals 0 (a sketch of that re-centering follows).
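One way to do that re-centering, as a sketch: estimate the mean reward on a reference set of answers once, then subtract that constant at inference time. The sample values below are illustrative:

```python
import torch

def center_rewards(rewards: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Shift rewards so their mean over a reference sample is 0.

    Estimate the offset once on a held-out batch, then subtract the
    same constant from every reward at inference time."""
    offset = rewards.mean().item()
    return rewards - offset, offset

# Illustrative rewards scored on a reference batch of answers.
rewards = torch.tensor([1.3, -0.2, 0.7, 2.1])
centered, offset = center_rewards(rewards)
print(centered.mean().item())  # ~0.0
```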