
OpenELM-1_1B-DPO-full-max-8-reward

This model appears to be a DPO fine-tune in the OpenELM-1.1B family; the base checkpoint and training dataset are not documented here. It achieves the following results on the evaluation set:

  • Loss: 1.8053
  • Rewards/chosen: -14.375
  • Rewards/rejected: -16.625
  • Rewards/accuracies: 0.6094
  • Rewards/margins: 2.2656
  • Logps/rejected: -1952.0
  • Logps/chosen: -1752.0
  • Logits/rejected: 2.8281
  • Logits/chosen: 1.2578
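
The reward columns above follow the usual DPO convention: each reward is the β-scaled log-probability ratio between the policy and the reference model, the margin is chosen minus rejected, and the accuracy is the fraction of preference pairs where the chosen response receives the higher implicit reward. The sketch below is a minimal illustration of that standard formulation; the β value and the exact loss variant used for this run are not stated on the card, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard sigmoid-DPO loss and the reward metrics reported above.

    beta=0.1 is only a common default; the value used for this run is not documented.
    """
    # Implicit rewards: beta-scaled log-probability ratio of policy vs. reference.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    # Sigmoid DPO objective: -log sigmoid(margin), averaged over the batch.
    loss = -F.logsigmoid(margins).mean()

    # Rewards/accuracies: fraction of pairs where chosen beats rejected.
    accuracy = (rewards_chosen > rewards_rejected).float().mean()

    return {
        "loss": loss,
        "rewards/chosen": rewards_chosen.mean(),
        "rewards/rejected": rewards_rejected.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": accuracy,
    }
```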

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
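
The effective train batch size of 64 is 8 samples per device × 4 GPUs × 2 gradient accumulation steps. As a rough sketch of how these values could be wired together, the snippet below uses TRL's DPOTrainer; the card does not state which framework or script actually produced this model, the `<base-model>` and `<preference-dataset>` identifiers are placeholders, and the mixed-precision flag is an assumption.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hyperparameters copied from the list above; everything else is assumed.
config = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-8-reward",
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation steps = 64 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; precision is not listed above
)

model = AutoModelForCausalLM.from_pretrained("<base-model>")
tokenizer = AutoTokenizer.from_pretrained("<base-model>")
dataset = load_dataset("<preference-dataset>")

trainer = DPOTrainer(
    model=model,                     # reference model defaults to a frozen copy of `model`
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

To reproduce the distributed setup implied by num_devices: 4, the script would be launched with four processes (e.g. via accelerate launch or torchrun).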

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5081 | 0.1047 | 100 | 0.6820 | -1.1406 | -1.3438 | 0.5977 | 0.2031 | -424.0 | -432.0 | -9.75 | -10.1875 |
| 0.4521 | 0.2094 | 200 | 0.7579 | -2.8906 | -3.4219 | 0.6465 | 0.5312 | -632.0 | -608.0 | -7.25 | -8.0625 |
| 0.4267 | 0.3141 | 300 | 0.7476 | -2.4531 | -2.9688 | 0.5859 | 0.5156 | -584.0 | -564.0 | -13.875 | -14.0 |
| 0.4431 | 0.4188 | 400 | 0.7253 | -2.8125 | -3.2812 | 0.625 | 0.4785 | -616.0 | -600.0 | -12.5 | -13.25 |
| 0.3779 | 0.5236 | 500 | 0.9411 | -6.125 | -6.9375 | 0.6113 | 0.8164 | -984.0 | -932.0 | -5.0312 | -6.1875 |
| 0.404 | 0.6283 | 600 | 0.8501 | -5.2188 | -6.0625 | 0.6152 | 0.8359 | -892.0 | -840.0 | -11.6875 | -12.875 |
| 0.3834 | 0.7330 | 700 | 0.8127 | -4.25 | -5.2188 | 0.6270 | 0.9609 | -808.0 | -744.0 | -7.1562 | -8.625 |
| 0.3763 | 0.8377 | 800 | 0.8067 | -4.5625 | -5.4062 | 0.6660 | 0.8516 | -828.0 | -772.0 | -5.1562 | -6.8438 |
| 0.3462 | 0.9424 | 900 | 0.8712 | -5.0625 | -6.0312 | 0.6172 | 0.9805 | -892.0 | -824.0 | -3.7031 | -5.4375 |
| 0.0835 | 1.0471 | 1000 | 1.1523 | -7.9375 | -9.3125 | 0.6074 | 1.3828 | -1216.0 | -1112.0 | -3.2188 | -5.125 |
| 0.0895 | 1.1518 | 1100 | 1.3590 | -8.6875 | -10.0 | 0.5898 | 1.3281 | -1288.0 | -1184.0 | -4.375 | -6.375 |
| 0.1146 | 1.2565 | 1200 | 1.2844 | -8.3125 | -9.5625 | 0.6055 | 1.2734 | -1248.0 | -1144.0 | 0.7070 | -1.0859 |
| 0.102 | 1.3613 | 1300 | 1.1325 | -8.1875 | -9.4375 | 0.6074 | 1.2812 | -1232.0 | -1136.0 | -2.0938 | -4.0625 |
| 0.0893 | 1.4660 | 1400 | 1.0034 | -7.75 | -9.0 | 0.6309 | 1.2344 | -1184.0 | -1096.0 | -1.7656 | -3.7344 |
| 0.0968 | 1.5707 | 1500 | 1.2212 | -9.625 | -11.25 | 0.5957 | 1.625 | -1416.0 | -1280.0 | -1.1016 | -3.0625 |
| 0.082 | 1.6754 | 1600 | 1.2544 | -10.875 | -12.375 | 0.6055 | 1.5156 | -1528.0 | -1408.0 | 1.4609 | -0.2871 |
| 0.0972 | 1.7801 | 1700 | 1.1152 | -8.875 | -10.3125 | 0.6055 | 1.4219 | -1320.0 | -1208.0 | -0.2949 | -1.9844 |
| 0.1053 | 1.8848 | 1800 | 1.0634 | -7.6875 | -9.125 | 0.6230 | 1.4375 | -1200.0 | -1088.0 | 0.6055 | -1.2656 |
| 0.0543 | 1.9895 | 1900 | 1.3391 | -10.625 | -12.25 | 0.6328 | 1.625 | -1512.0 | -1384.0 | 1.1953 | -0.5391 |
| 0.0139 | 2.0942 | 2000 | 1.6984 | -12.8125 | -14.8125 | 0.6055 | 1.9922 | -1768.0 | -1600.0 | 1.6562 | -0.1348 |
| 0.0268 | 2.1990 | 2100 | 1.6732 | -12.8125 | -14.875 | 0.5996 | 2.0469 | -1776.0 | -1600.0 | 2.0938 | 0.4277 |
| 0.0119 | 2.3037 | 2200 | 1.7792 | -13.875 | -16.125 | 0.6191 | 2.2656 | -1904.0 | -1712.0 | 2.6875 | 1.1172 |
| 0.0139 | 2.4084 | 2300 | 1.7628 | -13.5 | -15.75 | 0.6055 | 2.2188 | -1864.0 | -1672.0 | 2.6719 | 1.0625 |
| 0.0106 | 2.5131 | 2400 | 1.8904 | -14.5625 | -16.875 | 0.6035 | 2.3281 | -1976.0 | -1776.0 | 2.8438 | 1.2656 |
| 0.0133 | 2.6178 | 2500 | 1.7945 | -14.125 | -16.375 | 0.6172 | 2.2188 | -1920.0 | -1728.0 | 2.9062 | 1.3438 |
| 0.0065 | 2.7225 | 2600 | 1.7784 | -14.125 | -16.375 | 0.6152 | 2.2344 | -1928.0 | -1728.0 | 2.8125 | 1.2578 |
| 0.0126 | 2.8272 | 2700 | 1.7995 | -14.3125 | -16.625 | 0.6094 | 2.25 | -1944.0 | -1752.0 | 2.8281 | 1.2734 |
| 0.0068 | 2.9319 | 2800 | 1.8053 | -14.375 | -16.625 | 0.6094 | 2.2656 | -1952.0 | -1752.0 | 2.8281 | 1.2578 |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1