---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: training
  results: []
---

# training

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.4649
- Rewards/chosen: 1.1097
- Rewards/rejected: 0.3323
- Rewards/accuracies: 0.8186
- Rewards/margins: 0.7774
- Logps/rejected: -143.4800
- Logps/chosen: -175.0714
- Logits/rejected: -35.2043
- Logits/chosen: -32.7114
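
The reward and logp columns match the metrics logged by TRL's `DPOTrainer`, so a brief gloss may help (a hedged reading; the trainer and its β are not recorded in this card). The implicit reward of a completion is the β-scaled log-probability ratio between the policy and the frozen reference model:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right).
$$

On that reading, Rewards/chosen and Rewards/rejected are the mean implicit rewards of the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one.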

## Model description

More information needed

## Intended uses & limitations

More information needed
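
Pending a fuller description, the checkpoint can be loaded like any GPT-2-family causal LM. A minimal sketch, assuming the weights are published under a Hub repo id (the id below is a placeholder, not a confirmed one):

```python
# Minimal usage sketch. "your-username/training" is a placeholder repo id;
# substitute the actual Hub id or a local path to this model's weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/training"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```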

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
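
For reference, these settings map directly onto `transformers.TrainingArguments`. A minimal sketch that sets only the values recorded above and leaves everything else at its default (the output directory is a placeholder):

```python
# Hedged reconstruction of the recorded hyperparameters; unlisted
# arguments keep their transformers defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="training",            # placeholder; not recorded in this card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```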

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| No log        | 0.55  | 400  | 0.6593          | 1.0074         | 0.5904           | 0.7357             | 0.4170          | -140.8990      | -176.0949    | -35.9356        | -33.1922      |
| 0.7974        | 1.11  | 800  | 0.5807          | 1.1511         | 0.5902           | 0.7634             | 0.5610          | -140.9016      | -174.6575    | -35.9192        | -33.2655      |
| 0.5983        | 1.66  | 1200 | 0.5200          | 1.0697         | 0.4300           | 0.7979             | 0.6397          | -142.5030      | -175.4720    | -35.5696        | -33.0300      |
| 0.4982        | 2.21  | 1600 | 0.4807          | 1.1128         | 0.3733           | 0.8158             | 0.7395          | -143.0704      | -175.0409    | -35.2967        | -32.7791      |
| 0.4663        | 2.77  | 2000 | 0.4649          | 1.1097         | 0.3323           | 0.8186             | 0.7774          | -143.4800      | -175.0714    | -35.2043        | -32.7114      |

### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0