---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: tinyllama-1.1b-sum-simpo_beta1.0_gamma0.8_LR5e-8_3epochs
    results: []
---

# tinyllama-1.1b-sum-simpo_beta1.0_gamma0.8_LR5e-8_3epochs

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0879
- Rewards/chosen: -1.9033
- Rewards/rejected: -2.0977
- Rewards/accuracies: 0.6229
- Rewards/margins: 0.1944
- Logps/rejected: -2.0977
- Logps/chosen: -1.9033
- Logits/rejected: -3.4251
- Logits/chosen: -3.4288
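
The model name encodes a SimPO objective with β = 1.0 and γ = 0.8, and the Logps/* columns above are per-token (length-normalized) log-probabilities; note that Rewards/chosen equals Logps/chosen here, which is consistent with the reward being β times the average log-probability at β = 1.0. As a rough illustration (this is a sketch of the SimPO loss for a single preference pair, not the training code), plugging the final evaluation logps into the pairwise loss gives roughly 1.04; the reported evaluation loss (1.0879) is an average over all evaluation pairs, so the two need not match:

```python
import math

def simpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
               beta: float = 1.0, gamma: float = 0.8) -> float:
    """SimPO loss for one preference pair (illustrative sketch).

    avg_logp_* are per-token (length-normalized) log-probabilities,
    i.e. the quantities reported in the Logps/* columns.
    """
    margin = beta * avg_logp_chosen - beta * avg_logp_rejected - gamma
    # -log(sigmoid(margin)), computed via a numerically stable softplus(-margin)
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Final eval logps from the list above (beta=1.0, gamma=0.8):
# margin = -1.9033 - (-2.0977) - 0.8 = -0.6056
loss = simpo_loss(-1.9033, -2.0977)  # ≈ 1.04
```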

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1055 | 0.0172 | 100 | 1.1173 | -1.4808 | -1.6083 | 0.5955 | 0.1275 | -1.6083 | -1.4808 | -3.6326 | -3.6367 |
| 1.1204 | 0.0345 | 200 | 1.1173 | -1.4808 | -1.6083 | 0.5948 | 0.1275 | -1.6083 | -1.4808 | -3.6390 | -3.6431 |
| 1.0875 | 0.0517 | 300 | 1.1174 | -1.4810 | -1.6084 | 0.5955 | 0.1273 | -1.6084 | -1.4810 | -3.6290 | -3.6332 |
| 1.1119 | 0.0689 | 400 | 1.1173 | -1.4808 | -1.6082 | 0.5943 | 0.1274 | -1.6082 | -1.4808 | -3.6564 | -3.6603 |
| 1.1439 | 0.0861 | 500 | 1.1174 | -1.4810 | -1.6083 | 0.5955 | 0.1273 | -1.6083 | -1.4810 | -3.6435 | -3.6476 |
| 1.0556 | 0.1034 | 600 | 1.1175 | -1.4811 | -1.6082 | 0.5941 | 0.1271 | -1.6082 | -1.4811 | -3.6387 | -3.6428 |
| 1.1686 | 0.1206 | 700 | 1.1172 | -1.4806 | -1.6083 | 0.5946 | 0.1277 | -1.6083 | -1.4806 | -3.6496 | -3.6536 |
| 1.1046 | 0.1378 | 800 | 1.1172 | -1.4807 | -1.6084 | 0.5941 | 0.1277 | -1.6084 | -1.4807 | -3.6461 | -3.6501 |
| 1.1817 | 0.1551 | 900 | 1.1169 | -1.4805 | -1.6086 | 0.5950 | 0.1281 | -1.6086 | -1.4805 | -3.6312 | -3.6353 |
| 1.1844 | 0.1723 | 1000 | 1.1170 | -1.4806 | -1.6086 | 0.5967 | 0.1280 | -1.6086 | -1.4806 | -3.6534 | -3.6574 |
| 1.1161 | 0.1895 | 1100 | 1.1168 | -1.4805 | -1.6086 | 0.5962 | 0.1282 | -1.6086 | -1.4805 | -3.6328 | -3.6369 |
| 1.1305 | 0.2068 | 1200 | 1.1165 | -1.4802 | -1.6089 | 0.5957 | 0.1287 | -1.6089 | -1.4802 | -3.6268 | -3.6309 |
| 1.0977 | 0.2240 | 1300 | 1.1163 | -1.4801 | -1.6092 | 0.5967 | 0.1291 | -1.6092 | -1.4801 | -3.6315 | -3.6356 |
| 1.1432 | 0.2412 | 1400 | 1.1161 | -1.4805 | -1.6099 | 0.5960 | 0.1295 | -1.6099 | -1.4805 | -3.6389 | -3.6429 |
| 1.1427 | 0.2584 | 1500 | 1.1160 | -1.4809 | -1.6106 | 0.5955 | 0.1297 | -1.6106 | -1.4809 | -3.6223 | -3.6264 |
| 1.1065 | 0.2757 | 1600 | 1.1155 | -1.4808 | -1.6113 | 0.5939 | 0.1305 | -1.6113 | -1.4808 | -3.6324 | -3.6364 |
| 1.1183 | 0.2929 | 1700 | 1.1153 | -1.4820 | -1.6129 | 0.5962 | 0.1309 | -1.6129 | -1.4820 | -3.6176 | -3.6217 |
| 1.0866 | 0.3101 | 1800 | 1.1149 | -1.4823 | -1.6138 | 0.5971 | 0.1315 | -1.6138 | -1.4823 | -3.6203 | -3.6243 |
| 1.1745 | 0.3274 | 1900 | 1.1147 | -1.4835 | -1.6155 | 0.6001 | 0.1320 | -1.6155 | -1.4835 | -3.6214 | -3.6255 |
| 1.1004 | 0.3446 | 2000 | 1.1142 | -1.4847 | -1.6175 | 0.6004 | 0.1328 | -1.6175 | -1.4847 | -3.6214 | -3.6254 |
| 1.1671 | 0.3618 | 2100 | 1.1139 | -1.4861 | -1.6194 | 0.6034 | 0.1333 | -1.6194 | -1.4861 | -3.6193 | -3.6233 |
| 1.0747 | 0.3790 | 2200 | 1.1135 | -1.4871 | -1.6211 | 0.6020 | 0.1340 | -1.6211 | -1.4871 | -3.5959 | -3.6000 |
| 1.1048 | 0.3963 | 2300 | 1.1131 | -1.4895 | -1.6242 | 0.6050 | 0.1347 | -1.6242 | -1.4895 | -3.6181 | -3.6220 |
| 1.0478 | 0.4135 | 2400 | 1.1126 | -1.4916 | -1.6271 | 0.6059 | 0.1355 | -1.6271 | -1.4916 | -3.6027 | -3.6067 |
| 1.1417 | 0.4307 | 2500 | 1.1120 | -1.4940 | -1.6306 | 0.6064 | 0.1366 | -1.6306 | -1.4940 | -3.6005 | -3.6044 |
| 1.1261 | 0.4480 | 2600 | 1.1116 | -1.4970 | -1.6342 | 0.6078 | 0.1373 | -1.6342 | -1.4970 | -3.5878 | -3.5918 |
| 1.0752 | 0.4652 | 2700 | 1.1109 | -1.5008 | -1.6394 | 0.6127 | 0.1386 | -1.6394 | -1.5008 | -3.5903 | -3.5943 |
| 1.1623 | 0.4824 | 2800 | 1.1105 | -1.5047 | -1.6440 | 0.6113 | 0.1393 | -1.6440 | -1.5047 | -3.6001 | -3.6040 |
| 1.1744 | 0.4997 | 2900 | 1.1100 | -1.5102 | -1.6505 | 0.6129 | 0.1403 | -1.6505 | -1.5102 | -3.5956 | -3.5995 |
| 1.1373 | 0.5169 | 3000 | 1.1094 | -1.5156 | -1.6570 | 0.6143 | 0.1414 | -1.6570 | -1.5156 | -3.5820 | -3.5859 |
| 1.0913 | 0.5341 | 3100 | 1.1089 | -1.5184 | -1.6608 | 0.6155 | 0.1423 | -1.6608 | -1.5184 | -3.5832 | -3.5872 |
| 1.1645 | 0.5513 | 3200 | 1.1084 | -1.5258 | -1.6691 | 0.6090 | 0.1433 | -1.6691 | -1.5258 | -3.5695 | -3.5734 |
| 1.1238 | 0.5686 | 3300 | 1.1078 | -1.5336 | -1.6783 | 0.6080 | 0.1447 | -1.6783 | -1.5336 | -3.5825 | -3.5864 |
| 1.0437 | 0.5858 | 3400 | 1.1070 | -1.5437 | -1.6900 | 0.6080 | 0.1463 | -1.6900 | -1.5437 | -3.5807 | -3.5846 |
| 1.099 | 0.6030 | 3500 | 1.1067 | -1.5524 | -1.6996 | 0.6106 | 0.1472 | -1.6996 | -1.5524 | -3.5762 | -3.5801 |
| 1.1365 | 0.6203 | 3600 | 1.1062 | -1.5626 | -1.7112 | 0.6099 | 0.1486 | -1.7112 | -1.5626 | -3.5711 | -3.5750 |
| 1.0205 | 0.6375 | 3700 | 1.1058 | -1.5728 | -1.7227 | 0.6094 | 0.1499 | -1.7227 | -1.5728 | -3.5510 | -3.5549 |
| 1.1328 | 0.6547 | 3800 | 1.1049 | -1.5860 | -1.7379 | 0.6127 | 0.1518 | -1.7379 | -1.5860 | -3.5589 | -3.5628 |
| 1.0318 | 0.6720 | 3900 | 1.1039 | -1.5995 | -1.7533 | 0.6127 | 0.1538 | -1.7533 | -1.5995 | -3.5582 | -3.5620 |
| 1.1154 | 0.6892 | 4000 | 1.1030 | -1.6156 | -1.7712 | 0.6166 | 0.1556 | -1.7712 | -1.6156 | -3.5573 | -3.5611 |
| 1.0646 | 0.7064 | 4100 | 1.1023 | -1.6234 | -1.7804 | 0.6178 | 0.1570 | -1.7804 | -1.6234 | -3.5444 | -3.5483 |
| 1.1369 | 0.7236 | 4200 | 1.1017 | -1.6360 | -1.7944 | 0.6171 | 0.1584 | -1.7944 | -1.6360 | -3.5433 | -3.5471 |
| 1.0954 | 0.7409 | 4300 | 1.1013 | -1.6440 | -1.8033 | 0.6183 | 0.1592 | -1.8033 | -1.6440 | -3.5205 | -3.5244 |
| 1.1088 | 0.7581 | 4400 | 1.1008 | -1.6539 | -1.8143 | 0.6176 | 0.1604 | -1.8143 | -1.6539 | -3.5270 | -3.5309 |
| 1.1572 | 0.7753 | 4500 | 1.0999 | -1.6681 | -1.8301 | 0.6206 | 0.1620 | -1.8301 | -1.6681 | -3.5356 | -3.5394 |
| 1.0346 | 0.7926 | 4600 | 1.0990 | -1.6779 | -1.8419 | 0.6241 | 0.1639 | -1.8419 | -1.6779 | -3.5304 | -3.5342 |
| 1.0589 | 0.8098 | 4700 | 1.0985 | -1.6892 | -1.8544 | 0.6248 | 0.1652 | -1.8544 | -1.6892 | -3.5181 | -3.5220 |
| 1.1169 | 0.8270 | 4800 | 1.0978 | -1.7043 | -1.8709 | 0.625 | 0.1665 | -1.8709 | -1.7043 | -3.5202 | -3.5240 |
| 1.0477 | 0.8442 | 4900 | 1.0972 | -1.7175 | -1.8854 | 0.6259 | 0.1679 | -1.8854 | -1.7175 | -3.5196 | -3.5234 |
| 1.1388 | 0.8615 | 5000 | 1.0969 | -1.7191 | -1.8875 | 0.6241 | 0.1684 | -1.8875 | -1.7191 | -3.5124 | -3.5162 |
| 1.0556 | 0.8787 | 5100 | 1.0962 | -1.7341 | -1.9040 | 0.6236 | 0.1699 | -1.9040 | -1.7341 | -3.5062 | -3.5100 |
| 1.0387 | 0.8959 | 5200 | 1.0953 | -1.7483 | -1.9201 | 0.6241 | 0.1718 | -1.9201 | -1.7483 | -3.5064 | -3.5102 |
| 1.066 | 0.9132 | 5300 | 1.0952 | -1.7533 | -1.9256 | 0.6241 | 0.1723 | -1.9256 | -1.7533 | -3.5057 | -3.5094 |
| 1.0191 | 0.9304 | 5400 | 1.0946 | -1.7615 | -1.9351 | 0.6259 | 0.1735 | -1.9351 | -1.7615 | -3.4954 | -3.4992 |
| 1.0353 | 0.9476 | 5500 | 1.0947 | -1.7636 | -1.9374 | 0.625 | 0.1737 | -1.9374 | -1.7636 | -3.5003 | -3.5041 |
| 1.0994 | 0.9649 | 5600 | 1.0942 | -1.7649 | -1.9397 | 0.6255 | 0.1748 | -1.9397 | -1.7649 | -3.4823 | -3.4862 |
| 1.1142 | 0.9821 | 5700 | 1.0939 | -1.7705 | -1.9460 | 0.6252 | 0.1755 | -1.9460 | -1.7705 | -3.5005 | -3.5042 |
| 1.0105 | 0.9993 | 5800 | 1.0934 | -1.7804 | -1.9571 | 0.6245 | 0.1766 | -1.9571 | -1.7804 | -3.4910 | -3.4947 |
| 1.0585 | 1.0165 | 5900 | 1.0932 | -1.7831 | -1.9606 | 0.6231 | 0.1774 | -1.9606 | -1.7831 | -3.4851 | -3.4888 |
| 1.05 | 1.0338 | 6000 | 1.0930 | -1.7849 | -1.9627 | 0.6231 | 0.1778 | -1.9627 | -1.7849 | -3.4856 | -3.4893 |
| 1.1418 | 1.0510 | 6100 | 1.0926 | -1.7910 | -1.9699 | 0.625 | 0.1788 | -1.9699 | -1.7910 | -3.4842 | -3.4879 |
| 1.052 | 1.0682 | 6200 | 1.0923 | -1.7986 | -1.9784 | 0.6229 | 0.1797 | -1.9784 | -1.7986 | -3.4783 | -3.4820 |
| 1.0504 | 1.0855 | 6300 | 1.0920 | -1.8029 | -1.9833 | 0.6243 | 0.1804 | -1.9833 | -1.8029 | -3.4718 | -3.4755 |
| 1.0798 | 1.1027 | 6400 | 1.0920 | -1.8055 | -1.9863 | 0.6245 | 0.1808 | -1.9863 | -1.8055 | -3.4782 | -3.4820 |
| 1.1707 | 1.1199 | 6500 | 1.0918 | -1.8116 | -1.9931 | 0.625 | 0.1816 | -1.9931 | -1.8116 | -3.4695 | -3.4732 |
| 1.1428 | 1.1371 | 6600 | 1.0918 | -1.8145 | -1.9965 | 0.6248 | 0.1820 | -1.9965 | -1.8145 | -3.4609 | -3.4647 |
| 1.0715 | 1.1544 | 6700 | 1.0913 | -1.8156 | -1.9988 | 0.6259 | 0.1832 | -1.9988 | -1.8156 | -3.4882 | -3.4918 |
| 1.0501 | 1.1716 | 6800 | 1.0911 | -1.8232 | -2.0069 | 0.6231 | 0.1838 | -2.0069 | -1.8232 | -3.4742 | -3.4779 |
| 1.0595 | 1.1888 | 6900 | 1.0911 | -1.8266 | -2.0107 | 0.6252 | 0.1840 | -2.0107 | -1.8266 | -3.4604 | -3.4641 |
| 1.0657 | 1.2061 | 7000 | 1.0907 | -1.8324 | -2.0173 | 0.6243 | 0.1850 | -2.0173 | -1.8324 | -3.4681 | -3.4718 |
| 1.0894 | 1.2233 | 7100 | 1.0908 | -1.8311 | -2.0162 | 0.6241 | 0.1850 | -2.0162 | -1.8311 | -3.4721 | -3.4757 |
| 1.0263 | 1.2405 | 7200 | 1.0905 | -1.8363 | -2.0221 | 0.6248 | 0.1858 | -2.0221 | -1.8363 | -3.4523 | -3.4560 |
| 1.0575 | 1.2578 | 7300 | 1.0903 | -1.8425 | -2.0289 | 0.6243 | 0.1864 | -2.0289 | -1.8425 | -3.4530 | -3.4567 |
| 1.0439 | 1.2750 | 7400 | 1.0898 | -1.8475 | -2.0349 | 0.6236 | 0.1874 | -2.0349 | -1.8475 | -3.4620 | -3.4656 |
| 1.0479 | 1.2922 | 7500 | 1.0898 | -1.8506 | -2.0382 | 0.6248 | 0.1875 | -2.0382 | -1.8506 | -3.4522 | -3.4559 |
| 1.0345 | 1.3094 | 7600 | 1.0898 | -1.8523 | -2.0402 | 0.6238 | 0.1878 | -2.0402 | -1.8523 | -3.4562 | -3.4598 |
| 1.0292 | 1.3267 | 7700 | 1.0895 | -1.8566 | -2.0451 | 0.6243 | 0.1885 | -2.0451 | -1.8566 | -3.4490 | -3.4527 |
| 1.0667 | 1.3439 | 7800 | 1.0896 | -1.8601 | -2.0489 | 0.6243 | 0.1888 | -2.0489 | -1.8601 | -3.4377 | -3.4414 |
| 1.0894 | 1.3611 | 7900 | 1.0894 | -1.8629 | -2.0521 | 0.6234 | 0.1893 | -2.0521 | -1.8629 | -3.4502 | -3.4538 |
| 1.1202 | 1.3784 | 8000 | 1.0893 | -1.8667 | -2.0563 | 0.6248 | 0.1896 | -2.0563 | -1.8667 | -3.4338 | -3.4376 |
| 1.0709 | 1.3956 | 8100 | 1.0889 | -1.8692 | -2.0595 | 0.6243 | 0.1904 | -2.0595 | -1.8692 | -3.4282 | -3.4319 |
| 0.9842 | 1.4128 | 8200 | 1.0887 | -1.8732 | -2.0641 | 0.6224 | 0.1910 | -2.0641 | -1.8732 | -3.4388 | -3.4425 |
| 1.0825 | 1.4300 | 8300 | 1.0888 | -1.8771 | -2.0681 | 0.6243 | 0.1910 | -2.0681 | -1.8771 | -3.4452 | -3.4488 |
| 1.0353 | 1.4473 | 8400 | 1.0885 | -1.8814 | -2.0729 | 0.6248 | 0.1915 | -2.0729 | -1.8814 | -3.4402 | -3.4438 |
| 1.0484 | 1.4645 | 8500 | 1.0885 | -1.8809 | -2.0725 | 0.6234 | 0.1917 | -2.0725 | -1.8809 | -3.4378 | -3.4415 |
| 1.0415 | 1.4817 | 8600 | 1.0886 | -1.8835 | -2.0753 | 0.6238 | 0.1918 | -2.0753 | -1.8835 | -3.4435 | -3.4471 |
| 1.0403 | 1.4990 | 8700 | 1.0886 | -1.8863 | -2.0783 | 0.6224 | 0.1920 | -2.0783 | -1.8863 | -3.4401 | -3.4437 |
| 1.0025 | 1.5162 | 8800 | 1.0883 | -1.8873 | -2.0799 | 0.6224 | 0.1926 | -2.0799 | -1.8873 | -3.4421 | -3.4457 |
| 1.0338 | 1.5334 | 8900 | 1.0881 | -1.8921 | -2.0852 | 0.6238 | 0.1930 | -2.0852 | -1.8921 | -3.4227 | -3.4264 |
| 1.0588 | 1.5507 | 9000 | 1.0882 | -1.8938 | -2.0869 | 0.6222 | 0.1931 | -2.0869 | -1.8938 | -3.4348 | -3.4384 |
| 1.0998 | 1.5679 | 9100 | 1.0881 | -1.8947 | -2.0878 | 0.6234 | 0.1932 | -2.0878 | -1.8947 | -3.4355 | -3.4391 |
| 1.0465 | 1.5851 | 9200 | 1.0881 | -1.8949 | -2.0881 | 0.6234 | 0.1932 | -2.0881 | -1.8949 | -3.4279 | -3.4315 |
| 1.0754 | 1.6023 | 9300 | 1.0878 | -1.8955 | -2.0893 | 0.6234 | 0.1938 | -2.0893 | -1.8955 | -3.4261 | -3.4298 |
| 1.0633 | 1.6196 | 9400 | 1.0878 | -1.8963 | -2.0903 | 0.6227 | 0.1940 | -2.0903 | -1.8963 | -3.4275 | -3.4312 |
| 1.0392 | 1.6368 | 9500 | 1.0881 | -1.8982 | -2.0917 | 0.6231 | 0.1935 | -2.0917 | -1.8982 | -3.4356 | -3.4393 |
| 1.0565 | 1.6540 | 9600 | 1.0878 | -1.8977 | -2.0917 | 0.6231 | 0.1940 | -2.0917 | -1.8977 | -3.4386 | -3.4422 |
| 1.0101 | 1.6713 | 9700 | 1.0880 | -1.8987 | -2.0924 | 0.6222 | 0.1937 | -2.0924 | -1.8987 | -3.4357 | -3.4393 |
| 0.9686 | 1.6885 | 9800 | 1.0879 | -1.8992 | -2.0933 | 0.6231 | 0.1941 | -2.0933 | -1.8992 | -3.4280 | -3.4316 |
| 0.9781 | 1.7057 | 9900 | 1.0875 | -1.8996 | -2.0942 | 0.6229 | 0.1946 | -2.0942 | -1.8996 | -3.4316 | -3.4353 |
| 0.9985 | 1.7229 | 10000 | 1.0878 | -1.9004 | -2.0947 | 0.6224 | 0.1942 | -2.0947 | -1.9004 | -3.4334 | -3.4370 |
| 1.0605 | 1.7402 | 10100 | 1.0879 | -1.9007 | -2.0946 | 0.6227 | 0.1940 | -2.0946 | -1.9007 | -3.4210 | -3.4246 |
| 1.0453 | 1.7574 | 10200 | 1.0878 | -1.9024 | -2.0968 | 0.6224 | 0.1944 | -2.0968 | -1.9024 | -3.4185 | -3.4222 |
| 1.0919 | 1.7746 | 10300 | 1.0877 | -1.9027 | -2.0973 | 0.6220 | 0.1947 | -2.0973 | -1.9027 | -3.4347 | -3.4383 |
| 0.9683 | 1.7919 | 10400 | 1.0877 | -1.9023 | -2.0968 | 0.6231 | 0.1945 | -2.0968 | -1.9023 | -3.4268 | -3.4304 |
| 1.0501 | 1.8091 | 10500 | 1.0879 | -1.9027 | -2.0971 | 0.6227 | 0.1943 | -2.0971 | -1.9027 | -3.4268 | -3.4305 |
| 1.0827 | 1.8263 | 10600 | 1.0878 | -1.9027 | -2.0971 | 0.6222 | 0.1944 | -2.0971 | -1.9027 | -3.4260 | -3.4297 |
| 1.0259 | 1.8436 | 10700 | 1.0878 | -1.9030 | -2.0976 | 0.6220 | 0.1946 | -2.0976 | -1.9030 | -3.4333 | -3.4369 |
| 0.9896 | 1.8608 | 10800 | 1.0878 | -1.9031 | -2.0975 | 0.6229 | 0.1944 | -2.0975 | -1.9031 | -3.4306 | -3.4342 |
| 1.0559 | 1.8780 | 10900 | 1.0876 | -1.9024 | -2.0970 | 0.6234 | 0.1947 | -2.0970 | -1.9024 | -3.4247 | -3.4283 |
| 1.0904 | 1.8952 | 11000 | 1.0878 | -1.9029 | -2.0975 | 0.6236 | 0.1946 | -2.0975 | -1.9029 | -3.4325 | -3.4361 |
| 1.0518 | 1.9125 | 11100 | 1.0877 | -1.9027 | -2.0973 | 0.6234 | 0.1946 | -2.0973 | -1.9027 | -3.4235 | -3.4272 |
| 1.0111 | 1.9297 | 11200 | 1.0878 | -1.9032 | -2.0976 | 0.6231 | 0.1943 | -2.0976 | -1.9032 | -3.4197 | -3.4233 |
| 1.1208 | 1.9469 | 11300 | 1.0877 | -1.9032 | -2.0979 | 0.6236 | 0.1947 | -2.0979 | -1.9032 | -3.4274 | -3.4310 |
| 1.0322 | 1.9642 | 11400 | 1.0878 | -1.9033 | -2.0977 | 0.6231 | 0.1944 | -2.0977 | -1.9033 | -3.4257 | -3.4293 |
| 1.0917 | 1.9814 | 11500 | 1.0878 | -1.9033 | -2.0977 | 0.6234 | 0.1944 | -2.0977 | -1.9033 | -3.4251 | -3.4287 |
| 1.0116 | 1.9986 | 11600 | 1.0879 | -1.9033 | -2.0977 | 0.6229 | 0.1944 | -2.0977 | -1.9033 | -3.4251 | -3.4288 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1