---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - openai/summarize_from_feedback
model-index:
  - name: tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_2epochs_old
    results: []
---

tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_2epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6891
  • Rewards/chosen: -0.0201
  • Rewards/rejected: -0.0288
  • Rewards/accuracies: 0.5911
  • Rewards/margins: 0.0087
  • Logps/rejected: -66.0638
  • Logps/chosen: -60.7225
  • Logits/rejected: -3.0949
  • Logits/chosen: -3.1006
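
For context on these metrics: in DPO, the logged "rewards" are β-scaled log-probability ratios between the policy and the frozen SFT reference model, and Rewards/accuracies is the fraction of preference pairs where the chosen reward exceeds the rejected one. A standard formulation (β itself is not reported in this card) is:

```latex
% Implicit DPO reward of a completion y for prompt x:
% \beta is the DPO temperature, \pi_\theta the policy, \pi_{ref} the SFT reference.
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% "Rewards/margins" is the mean gap between chosen and rejected rewards,
% and the DPO loss is the negative log-sigmoid of that gap:
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```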

Model description

More information needed

Intended uses & limitations

More information needed
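
Pending details from the author, here is a minimal inference sketch. The TL;DR-style prompt is an assumption based on the summarize_from_feedback data; the exact template used during training is not documented in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed prompt template; the actual format used during SFT/DPO may differ.
post = "..."  # the post or article to summarize
prompt = f"{post}\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```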

Training and evaluation data

More information needed
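
For reference, the preference data can be loaded from the Hub as below. This is a sketch of the dataset's comparisons schema, not the exact preprocessing used for this run.

```python
from datasets import load_dataset

# The "comparisons" config holds human preference pairs over candidate summaries.
ds = load_dataset("openai/summarize_from_feedback", "comparisons")

example = ds["train"][0]
post = example["info"]["post"]                                  # text being summarized
chosen = example["summaries"][example["choice"]]["text"]        # preferred summary
rejected = example["summaries"][1 - example["choice"]]["text"]  # dispreferred summary
```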

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
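
Putting the list above together, a hedged reproduction sketch using TRL's DPOTrainer follows. The DPO β, maximum lengths, and prompt formatting are assumptions (the card does not report them), and the DPOConfig API shown here may differ across trl versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Build prompt/chosen/rejected columns in the format DPOTrainer expects.
# The "\nTL;DR:" template is an assumption, as in the dataset sketch above.
def to_pairs(ex):
    return {
        "prompt": ex["info"]["post"] + "\nTL;DR:",
        "chosen": " " + ex["summaries"][ex["choice"]]["text"],
        "rejected": " " + ex["summaries"][1 - ex["choice"]]["text"],
    }

raw = load_dataset("openai/summarize_from_feedback", "comparisons")
train_dataset = raw["train"].map(to_pairs)
eval_dataset = raw["validation"].map(to_pairs)

# Hyperparameters from the list above; beta=0.1 is an assumed default.
args = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_2epochs",
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 per-device x 8 accumulation = 64 total
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # not reported in the card
)

trainer = DPOTrainer(
    model=model,  # ref_model defaults to a frozen copy of the policy
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```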

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931        | 0.0689 | 100  | 0.6931          | 0.0001         | 0.0001           | 0.5023             | 0.0000          | -63.1703       | -58.7007     | -3.1577         | -3.1633       |
| 0.6931        | 0.1378 | 200  | 0.6932          | 0.0001         | 0.0002           | 0.4875             | -0.0001         | -63.1621       | -58.7010     | -3.1575         | -3.1632       |
| 0.6929        | 0.2068 | 300  | 0.6931          | 0.0004         | 0.0003           | 0.5149             | 0.0001          | -63.1505       | -58.6712     | -3.1569         | -3.1625       |
| 0.6927        | 0.2757 | 400  | 0.6930          | 0.0007         | 0.0005           | 0.5258             | 0.0003          | -63.1350       | -58.6397     | -3.1555         | -3.1611       |
| 0.692         | 0.3446 | 500  | 0.6929          | 0.0012         | 0.0007           | 0.5246             | 0.0005          | -63.1102       | -58.5951     | -3.1536         | -3.1592       |
| 0.6915        | 0.4135 | 600  | 0.6927          | 0.0016         | 0.0007           | 0.5504             | 0.0009          | -63.1105       | -58.5481     | -3.1508         | -3.1564       |
| 0.6912        | 0.4824 | 700  | 0.6924          | 0.0019         | 0.0004           | 0.5671             | 0.0015          | -63.1424       | -58.5229     | -3.1481         | -3.1538       |
| 0.69          | 0.5513 | 800  | 0.6922          | 0.0019         | -0.0000          | 0.5760             | 0.0019          | -63.1839       | -58.5249     | -3.1444         | -3.1500       |
| 0.6893        | 0.6203 | 900  | 0.6919          | 0.0017         | -0.0008          | 0.5709             | 0.0025          | -63.2630       | -58.5425     | -3.1403         | -3.1459       |
| 0.6892        | 0.6892 | 1000 | 0.6917          | 0.0011         | -0.0020          | 0.5725             | 0.0030          | -63.3758       | -58.6063     | -3.1361         | -3.1418       |
| 0.6892        | 0.7581 | 1100 | 0.6914          | 0.0002         | -0.0034          | 0.5809             | 0.0036          | -63.5250       | -58.6939     | -3.1313         | -3.1369       |
| 0.6885        | 0.8270 | 1200 | 0.6911          | -0.0007        | -0.0050          | 0.5755             | 0.0043          | -63.6802       | -58.7853     | -3.1282         | -3.1338       |
| 0.6877        | 0.8959 | 1300 | 0.6908          | -0.0024        | -0.0073          | 0.5781             | 0.0048          | -63.9072       | -58.9567     | -3.1223         | -3.1280       |
| 0.6874        | 0.9649 | 1400 | 0.6907          | -0.0040        | -0.0092          | 0.5771             | 0.0053          | -64.1026       | -59.1085     | -3.1205         | -3.1262       |
| 0.6871        | 1.0338 | 1500 | 0.6904          | -0.0055        | -0.0113          | 0.5825             | 0.0058          | -64.3106       | -59.2603     | -3.1153         | -3.1210       |
| 0.6863        | 1.1027 | 1600 | 0.6902          | -0.0075        | -0.0138          | 0.5888             | 0.0063          | -64.5576       | -59.4592     | -3.1122         | -3.1179       |
| 0.6854        | 1.1716 | 1700 | 0.6900          | -0.0096        | -0.0163          | 0.5867             | 0.0067          | -64.8090       | -59.6681     | -3.1086         | -3.1143       |
| 0.6855        | 1.2405 | 1800 | 0.6898          | -0.0120        | -0.0192          | 0.5827             | 0.0072          | -65.0974       | -59.9114     | -3.1070         | -3.1126       |
| 0.6824        | 1.3094 | 1900 | 0.6897          | -0.0139        | -0.0213          | 0.5825             | 0.0074          | -65.3089       | -60.1001     | -3.1034         | -3.1091       |
| 0.6851        | 1.3784 | 2000 | 0.6895          | -0.0155        | -0.0234          | 0.5906             | 0.0079          | -65.5166       | -60.2616     | -3.1014         | -3.1071       |
| 0.6834        | 1.4473 | 2100 | 0.6895          | -0.0167        | -0.0247          | 0.5862             | 0.0080          | -65.6501       | -60.3842     | -3.0998         | -3.1055       |
| 0.6828        | 1.5162 | 2200 | 0.6894          | -0.0179        | -0.0261          | 0.5874             | 0.0082          | -65.7914       | -60.5049     | -3.0984         | -3.1041       |
| 0.6833        | 1.5851 | 2300 | 0.6892          | -0.0188        | -0.0273          | 0.5901             | 0.0085          | -65.9073       | -60.5933     | -3.0973         | -3.1030       |
| 0.6835        | 1.6540 | 2400 | 0.6892          | -0.0193        | -0.0279          | 0.5862             | 0.0086          | -65.9739       | -60.6469     | -3.0961         | -3.1018       |
| 0.6826        | 1.7229 | 2500 | 0.6892          | -0.0197        | -0.0283          | 0.5850             | 0.0086          | -66.0099       | -60.6819     | -3.0956         | -3.1013       |
| 0.6825        | 1.7919 | 2600 | 0.6891          | -0.0198        | -0.0285          | 0.5890             | 0.0088          | -66.0344       | -60.6882     | -3.0949         | -3.1007       |
| 0.6823        | 1.8608 | 2700 | 0.6891          | -0.0200        | -0.0287          | 0.5890             | 0.0087          | -66.0526       | -60.7165     | -3.0949         | -3.1006       |
| 0.6816        | 1.9297 | 2800 | 0.6891          | -0.0201        | -0.0289          | 0.5841             | 0.0088          | -66.0728       | -60.7263     | -3.0951         | -3.1008       |
| 0.6836        | 1.9986 | 2900 | 0.6891          | -0.0201        | -0.0288          | 0.5911             | 0.0087          | -66.0638       | -60.7225     | -3.0949         | -3.1006       |

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1