library_name: transformers
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-8-reward
    results: []

OpenELM-1_1B-DPO-full-max-8-reward

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7740
  • Rewards/chosen: -15.6875
  • Rewards/rejected: -17.875
  • Rewards/accuracies: 0.6172
  • Rewards/margins: 2.2031
  • Logps/rejected: -2080.0
  • Logps/chosen: -1888.0
  • Logits/rejected: 0.8320
  • Logits/chosen: -0.9922
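
The reward figures above follow the usual DPO bookkeeping: the margin is the chosen reward minus the rejected reward. A minimal sketch that recomputes the margin from the two reported means; it assumes the listed values are evaluation-set averages. The small gap against the reported margin would be consistent with reduced-precision logging, but that is an assumption, not something the card states.

```python
# Values copied from the evaluation summary above.
rewards_chosen = -15.6875
rewards_rejected = -17.875
reported_margin = 2.2031

# Margin recomputed as the difference of the two reported means.
recomputed_margin = rewards_chosen - rewards_rejected

# Absolute gap between the recomputed and the reported margin.
gap = abs(recomputed_margin - reported_margin)

print(f"recomputed margin: {recomputed_margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
print(f"gap:               {gap:.4f}")
```

A positive margin only says that chosen responses score higher than rejected ones on average; the accuracy of 0.6172 shows that this holds for roughly 62% of evaluation pairs.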

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
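
The batch-size and scheduler entries above can be checked with a few lines of arithmetic. The sketch below derives the effective batch size and evaluates a linear-warmup, cosine-decay learning rate. The function `lr_at` is a hypothetical helper written for illustration, and `total_steps` is an assumed value, since the card does not state the total number of optimizer steps.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 8             # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_ratio = 0.1

# Examples consumed per optimizer step across all devices.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed step count for illustration only; not stated in the card.
total_steps = 2865

print("effective batch size:", total_train_batch_size)
for step in (0, 100, 287, 1500, 2865):
    print(step, f"{lr_at(step, total_steps):.3e}")
```

With these settings the learning rate peaks at 5e-05 once warmup ends and decays to zero at the final step.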

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5077 | 0.1047 | 100 | 0.6684 | -1.2422 | -1.4922 | 0.6191 | 0.2490 | -438.0 | -442.0 | -10.4375 | -10.8125 |
| 0.436 | 0.2094 | 200 | 0.7756 | -2.9219 | -3.3281 | 0.6191 | 0.4141 | -620.0 | -608.0 | -9.6875 | -10.1875 |
| 0.4375 | 0.3141 | 300 | 0.7544 | -4.0 | -4.625 | 0.6426 | 0.6328 | -752.0 | -720.0 | -8.9375 | -9.9375 |
| 0.4641 | 0.4188 | 400 | 0.7598 | -3.5938 | -4.2188 | 0.6270 | 0.6094 | -708.0 | -680.0 | -9.8125 | -10.6875 |
| 0.3819 | 0.5236 | 500 | 0.8648 | -5.0938 | -5.8438 | 0.6074 | 0.7383 | -872.0 | -828.0 | -7.8438 | -9.125 |
| 0.4052 | 0.6283 | 600 | 0.8811 | -5.1875 | -5.9375 | 0.6016 | 0.7461 | -880.0 | -836.0 | -9.3125 | -10.625 |
| 0.397 | 0.7330 | 700 | 0.7826 | -4.5938 | -5.3438 | 0.6445 | 0.7578 | -824.0 | -780.0 | -7.5 | -9.125 |
| 0.3853 | 0.8377 | 800 | 0.8263 | -5.8438 | -6.5938 | 0.6328 | 0.7461 | -948.0 | -904.0 | -5.9688 | -7.3125 |
| 0.3438 | 0.9424 | 900 | 1.0278 | -7.5938 | -8.8125 | 0.6230 | 1.2344 | -1168.0 | -1080.0 | -2.5 | -4.2188 |
| 0.0879 | 1.0471 | 1000 | 1.2819 | -9.375 | -10.8125 | 0.6055 | 1.4375 | -1368.0 | -1256.0 | -6.625 | -8.5 |
| 0.0875 | 1.1518 | 1100 | 1.2599 | -10.3125 | -11.75 | 0.6152 | 1.4609 | -1464.0 | -1352.0 | -3.6406 | -5.25 |
| 0.1119 | 1.2565 | 1200 | 1.0713 | -7.9688 | -9.125 | 0.6230 | 1.1562 | -1200.0 | -1112.0 | -4.375 | -6.2188 |
| 0.1083 | 1.3613 | 1300 | 1.1731 | -10.1875 | -11.5 | 0.5918 | 1.2969 | -1440.0 | -1336.0 | -3.7188 | -5.4375 |
| 0.0827 | 1.4660 | 1400 | 1.0477 | -9.25 | -10.5 | 0.6152 | 1.25 | -1336.0 | -1240.0 | -2.6094 | -4.5 |
| 0.0913 | 1.5707 | 1500 | 1.0557 | -9.25 | -10.625 | 0.6270 | 1.3828 | -1352.0 | -1248.0 | -2.9688 | -4.7812 |
| 0.0813 | 1.6754 | 1600 | 1.2081 | -11.4375 | -13.0 | 0.6230 | 1.5625 | -1584.0 | -1456.0 | -1.0156 | -2.7812 |
| 0.0882 | 1.7801 | 1700 | 1.1652 | -11.5625 | -13.0 | 0.6348 | 1.4531 | -1592.0 | -1472.0 | -3.0469 | -4.7812 |
| 0.0991 | 1.8848 | 1800 | 1.0546 | -9.6875 | -11.0 | 0.6211 | 1.3203 | -1392.0 | -1288.0 | -0.2773 | -2.0469 |
| 0.0663 | 1.9895 | 1900 | 1.1602 | -11.0625 | -12.625 | 0.6348 | 1.5312 | -1552.0 | -1424.0 | -1.9766 | -3.7344 |
| 0.0132 | 2.0942 | 2000 | 1.6895 | -15.4375 | -17.5 | 0.6191 | 2.0625 | -2040.0 | -1856.0 | 0.3359 | -1.5391 |
| 0.0613 | 2.1990 | 2100 | 1.7890 | -15.8125 | -18.0 | 0.6191 | 2.2031 | -2096.0 | -1896.0 | 0.7539 | -1.0625 |
| 0.0101 | 2.3037 | 2200 | 1.7495 | -16.125 | -18.375 | 0.6211 | 2.2031 | -2128.0 | -1928.0 | 1.25 | -0.4414 |
| 0.0138 | 2.4084 | 2300 | 1.7596 | -15.625 | -17.75 | 0.6133 | 2.2031 | -2064.0 | -1880.0 | 1.0234 | -0.7891 |
| 0.0121 | 2.5131 | 2400 | 1.7912 | -15.625 | -17.875 | 0.6152 | 2.2188 | -2080.0 | -1880.0 | 0.6641 | -1.1797 |
| 0.0107 | 2.6178 | 2500 | 1.7927 | -15.75 | -18.0 | 0.6133 | 2.1875 | -2080.0 | -1896.0 | 0.8281 | -0.9883 |
| 0.0145 | 2.7225 | 2600 | 1.7578 | -15.5 | -17.625 | 0.6191 | 2.2031 | -2048.0 | -1864.0 | 0.7031 | -1.1328 |
| 0.0133 | 2.8272 | 2700 | 1.7674 | -15.625 | -17.875 | 0.6152 | 2.2031 | -2080.0 | -1880.0 | 0.8281 | -0.9961 |
| 0.0114 | 2.9319 | 2800 | 1.7740 | -15.6875 | -17.875 | 0.6172 | 2.2031 | -2080.0 | -1888.0 | 0.8320 | -0.9922 |
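
The table shows training loss falling throughout while validation loss rises after the first checkpoint, a pattern that often points to overfitting. The sketch below makes the comparison explicit by locating the checkpoint with the lowest validation loss, and it estimates the steps per epoch from the first row. The tuples are copied from the Step, Training Loss, and Validation Loss columns; the variable names are illustrative.

```python
# (step, training_loss, validation_loss) copied from the table above.
rows = [
    (100, 0.5077, 0.6684), (200, 0.436, 0.7756), (300, 0.4375, 0.7544),
    (400, 0.4641, 0.7598), (500, 0.3819, 0.8648), (600, 0.4052, 0.8811),
    (700, 0.397, 0.7826), (800, 0.3853, 0.8263), (900, 0.3438, 1.0278),
    (1000, 0.0879, 1.2819), (1100, 0.0875, 1.2599), (1200, 0.1119, 1.0713),
    (1300, 0.1083, 1.1731), (1400, 0.0827, 1.0477), (1500, 0.0913, 1.0557),
    (1600, 0.0813, 1.2081), (1700, 0.0882, 1.1652), (1800, 0.0991, 1.0546),
    (1900, 0.0663, 1.1602), (2000, 0.0132, 1.6895), (2100, 0.0613, 1.7890),
    (2200, 0.0101, 1.7495), (2300, 0.0138, 1.7596), (2400, 0.0121, 1.7912),
    (2500, 0.0107, 1.7927), (2600, 0.0145, 1.7578), (2700, 0.0133, 1.7674),
    (2800, 0.0114, 1.7740),
]

# Checkpoint with the lowest validation loss.
best_step, best_train, best_val = min(rows, key=lambda r: r[2])

# Final logged checkpoint, for comparison.
last_step, last_train, last_val = rows[-1]

# Step 100 corresponds to epoch 0.1047 in the table.
steps_per_epoch = 100 / 0.1047

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"final validation loss:  {last_val} at step {last_step}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
```

If held-out loss is the selection criterion, an early checkpoint looks preferable to the final one. Reward accuracy stays between about 0.59 and 0.64 across all checkpoints, so it does not separate them clearly.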

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
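
To reproduce the environment, it can help to compare the installed packages against the versions listed above. A minimal sketch using only the standard library; the distribution names in `PINNED` are assumed to be the usual PyPI names, and `installed_versions` and `mismatches` are hypothetical helpers.

```python
from importlib import metadata

# Versions from the list above; PyPI distribution names are an assumption.
PINNED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pinned, installed):
    """Return {name: (pinned, installed)} for versions that differ.

    Local build suffixes such as "+cu121" are ignored in the comparison.
    """
    result = {}
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None or have.split("+")[0] != want:
            result[name] = (want, have)
    return result


for name, (want, have) in mismatches(PINNED, installed_versions(PINNED)).items():
    print(f"{name}: expected {want}, found {have}")
```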