# tinyllama-1.1b-sum-simpo_beta1.0_gamma0.8_LR5e-8_3epochs
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):
- Loss: 1.0879
- Rewards/chosen: -1.9033
- Rewards/rejected: -2.0977
- Rewards/accuracies: 0.6229
- Rewards/margins: 0.1944
- Logps/rejected: -2.0977
- Logps/chosen: -1.9033
- Logits/rejected: -3.4251
- Logits/chosen: -3.4288
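
As a quick-start, here is a minimal usage sketch with `transformers`. The repo id is inferred from this card's title and is an assumption; adjust it to wherever the checkpoint is actually published.

```python
# Minimal usage sketch. Assumption: the checkpoint is published on the Hugging
# Face Hub under a repo id matching this card's title; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-simpo_beta1.0_gamma0.8_LR5e-8_3epochs"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# The "sum" in the name suggests summarization-style preference tuning.
prompt = "Summarize the following text:\n\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```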
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the SimPO objective they parameterize follows the list):
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
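
For context, the sketch below shows the SimPO objective (Meng et al., 2024) that these hyperparameters feed into, using the beta = 1.0 and gamma = 0.8 implied by the model name. It is an illustration of the loss, not the training script used for this checkpoint. Note that with beta = 1.0 the implicit SimPO reward (beta times the length-normalized log-probability) equals the average log-probability itself, which is why the Rewards/* and Logps/* columns coincide in the results below.

```python
# Sketch of the SimPO objective (Meng et al., 2024) with the beta/gamma values
# implied by this model's name (beta=1.0, gamma=0.8). Illustrative only; not
# the training code used for this checkpoint.
import torch
import torch.nn.functional as F

def simpo_loss(avg_logp_chosen: torch.Tensor,
               avg_logp_rejected: torch.Tensor,
               beta: float = 1.0,
               gamma: float = 0.8) -> torch.Tensor:
    """SimPO loss over length-normalized sequence log-probabilities.

    avg_logp_* are per-token average log-probs of the chosen/rejected
    responses under the policy, shape (batch,).
    """
    margin = beta * (avg_logp_chosen - avg_logp_rejected) - gamma
    return -F.logsigmoid(margin).mean()

# With beta = 1.0 the implicit reward beta * avg_logp is the average
# log-probability itself, so Rewards/chosen == Logps/chosen in the tables.
```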
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
1.1055 | 0.0172 | 100 | 1.1173 | -1.4808 | -1.6083 | 0.5955 | 0.1275 | -1.6083 | -1.4808 | -3.6326 | -3.6367 |
1.1204 | 0.0345 | 200 | 1.1173 | -1.4808 | -1.6083 | 0.5948 | 0.1275 | -1.6083 | -1.4808 | -3.6390 | -3.6431 |
1.0875 | 0.0517 | 300 | 1.1174 | -1.4810 | -1.6084 | 0.5955 | 0.1273 | -1.6084 | -1.4810 | -3.6290 | -3.6332 |
1.1119 | 0.0689 | 400 | 1.1173 | -1.4808 | -1.6082 | 0.5943 | 0.1274 | -1.6082 | -1.4808 | -3.6564 | -3.6603 |
1.1439 | 0.0861 | 500 | 1.1174 | -1.4810 | -1.6083 | 0.5955 | 0.1273 | -1.6083 | -1.4810 | -3.6435 | -3.6476 |
1.0556 | 0.1034 | 600 | 1.1175 | -1.4811 | -1.6082 | 0.5941 | 0.1271 | -1.6082 | -1.4811 | -3.6387 | -3.6428 |
1.1686 | 0.1206 | 700 | 1.1172 | -1.4806 | -1.6083 | 0.5946 | 0.1277 | -1.6083 | -1.4806 | -3.6496 | -3.6536 |
1.1046 | 0.1378 | 800 | 1.1172 | -1.4807 | -1.6084 | 0.5941 | 0.1277 | -1.6084 | -1.4807 | -3.6461 | -3.6501 |
1.1817 | 0.1551 | 900 | 1.1169 | -1.4805 | -1.6086 | 0.5950 | 0.1281 | -1.6086 | -1.4805 | -3.6312 | -3.6353 |
1.1844 | 0.1723 | 1000 | 1.1170 | -1.4806 | -1.6086 | 0.5967 | 0.1280 | -1.6086 | -1.4806 | -3.6534 | -3.6574 |
1.1161 | 0.1895 | 1100 | 1.1168 | -1.4805 | -1.6086 | 0.5962 | 0.1282 | -1.6086 | -1.4805 | -3.6328 | -3.6369 |
1.1305 | 0.2068 | 1200 | 1.1165 | -1.4802 | -1.6089 | 0.5957 | 0.1287 | -1.6089 | -1.4802 | -3.6268 | -3.6309 |
1.0977 | 0.2240 | 1300 | 1.1163 | -1.4801 | -1.6092 | 0.5967 | 0.1291 | -1.6092 | -1.4801 | -3.6315 | -3.6356 |
1.1432 | 0.2412 | 1400 | 1.1161 | -1.4805 | -1.6099 | 0.5960 | 0.1295 | -1.6099 | -1.4805 | -3.6389 | -3.6429 |
1.1427 | 0.2584 | 1500 | 1.1160 | -1.4809 | -1.6106 | 0.5955 | 0.1297 | -1.6106 | -1.4809 | -3.6223 | -3.6264 |
1.1065 | 0.2757 | 1600 | 1.1155 | -1.4808 | -1.6113 | 0.5939 | 0.1305 | -1.6113 | -1.4808 | -3.6324 | -3.6364 |
1.1183 | 0.2929 | 1700 | 1.1153 | -1.4820 | -1.6129 | 0.5962 | 0.1309 | -1.6129 | -1.4820 | -3.6176 | -3.6217 |
1.0866 | 0.3101 | 1800 | 1.1149 | -1.4823 | -1.6138 | 0.5971 | 0.1315 | -1.6138 | -1.4823 | -3.6203 | -3.6243 |
1.1745 | 0.3274 | 1900 | 1.1147 | -1.4835 | -1.6155 | 0.6001 | 0.1320 | -1.6155 | -1.4835 | -3.6214 | -3.6255 |
1.1004 | 0.3446 | 2000 | 1.1142 | -1.4847 | -1.6175 | 0.6004 | 0.1328 | -1.6175 | -1.4847 | -3.6214 | -3.6254 |
1.1671 | 0.3618 | 2100 | 1.1139 | -1.4861 | -1.6194 | 0.6034 | 0.1333 | -1.6194 | -1.4861 | -3.6193 | -3.6233 |
1.0747 | 0.3790 | 2200 | 1.1135 | -1.4871 | -1.6211 | 0.6020 | 0.1340 | -1.6211 | -1.4871 | -3.5959 | -3.6000 |
1.1048 | 0.3963 | 2300 | 1.1131 | -1.4895 | -1.6242 | 0.6050 | 0.1347 | -1.6242 | -1.4895 | -3.6181 | -3.6220 |
1.0478 | 0.4135 | 2400 | 1.1126 | -1.4916 | -1.6271 | 0.6059 | 0.1355 | -1.6271 | -1.4916 | -3.6027 | -3.6067 |
1.1417 | 0.4307 | 2500 | 1.1120 | -1.4940 | -1.6306 | 0.6064 | 0.1366 | -1.6306 | -1.4940 | -3.6005 | -3.6044 |
1.1261 | 0.4480 | 2600 | 1.1116 | -1.4970 | -1.6342 | 0.6078 | 0.1373 | -1.6342 | -1.4970 | -3.5878 | -3.5918 |
1.0752 | 0.4652 | 2700 | 1.1109 | -1.5008 | -1.6394 | 0.6127 | 0.1386 | -1.6394 | -1.5008 | -3.5903 | -3.5943 |
1.1623 | 0.4824 | 2800 | 1.1105 | -1.5047 | -1.6440 | 0.6113 | 0.1393 | -1.6440 | -1.5047 | -3.6001 | -3.6040 |
1.1744 | 0.4997 | 2900 | 1.1100 | -1.5102 | -1.6505 | 0.6129 | 0.1403 | -1.6505 | -1.5102 | -3.5956 | -3.5995 |
1.1373 | 0.5169 | 3000 | 1.1094 | -1.5156 | -1.6570 | 0.6143 | 0.1414 | -1.6570 | -1.5156 | -3.5820 | -3.5859 |
1.0913 | 0.5341 | 3100 | 1.1089 | -1.5184 | -1.6608 | 0.6155 | 0.1423 | -1.6608 | -1.5184 | -3.5832 | -3.5872 |
1.1645 | 0.5513 | 3200 | 1.1084 | -1.5258 | -1.6691 | 0.6090 | 0.1433 | -1.6691 | -1.5258 | -3.5695 | -3.5734 |
1.1238 | 0.5686 | 3300 | 1.1078 | -1.5336 | -1.6783 | 0.6080 | 0.1447 | -1.6783 | -1.5336 | -3.5825 | -3.5864 |
1.0437 | 0.5858 | 3400 | 1.1070 | -1.5437 | -1.6900 | 0.6080 | 0.1463 | -1.6900 | -1.5437 | -3.5807 | -3.5846 |
1.099 | 0.6030 | 3500 | 1.1067 | -1.5524 | -1.6996 | 0.6106 | 0.1472 | -1.6996 | -1.5524 | -3.5762 | -3.5801 |
1.1365 | 0.6203 | 3600 | 1.1062 | -1.5626 | -1.7112 | 0.6099 | 0.1486 | -1.7112 | -1.5626 | -3.5711 | -3.5750 |
1.0205 | 0.6375 | 3700 | 1.1058 | -1.5728 | -1.7227 | 0.6094 | 0.1499 | -1.7227 | -1.5728 | -3.5510 | -3.5549 |
1.1328 | 0.6547 | 3800 | 1.1049 | -1.5860 | -1.7379 | 0.6127 | 0.1518 | -1.7379 | -1.5860 | -3.5589 | -3.5628 |
1.0318 | 0.6720 | 3900 | 1.1039 | -1.5995 | -1.7533 | 0.6127 | 0.1538 | -1.7533 | -1.5995 | -3.5582 | -3.5620 |
1.1154 | 0.6892 | 4000 | 1.1030 | -1.6156 | -1.7712 | 0.6166 | 0.1556 | -1.7712 | -1.6156 | -3.5573 | -3.5611 |
1.0646 | 0.7064 | 4100 | 1.1023 | -1.6234 | -1.7804 | 0.6178 | 0.1570 | -1.7804 | -1.6234 | -3.5444 | -3.5483 |
1.1369 | 0.7236 | 4200 | 1.1017 | -1.6360 | -1.7944 | 0.6171 | 0.1584 | -1.7944 | -1.6360 | -3.5433 | -3.5471 |
1.0954 | 0.7409 | 4300 | 1.1013 | -1.6440 | -1.8033 | 0.6183 | 0.1592 | -1.8033 | -1.6440 | -3.5205 | -3.5244 |
1.1088 | 0.7581 | 4400 | 1.1008 | -1.6539 | -1.8143 | 0.6176 | 0.1604 | -1.8143 | -1.6539 | -3.5270 | -3.5309 |
1.1572 | 0.7753 | 4500 | 1.0999 | -1.6681 | -1.8301 | 0.6206 | 0.1620 | -1.8301 | -1.6681 | -3.5356 | -3.5394 |
1.0346 | 0.7926 | 4600 | 1.0990 | -1.6779 | -1.8419 | 0.6241 | 0.1639 | -1.8419 | -1.6779 | -3.5304 | -3.5342 |
1.0589 | 0.8098 | 4700 | 1.0985 | -1.6892 | -1.8544 | 0.6248 | 0.1652 | -1.8544 | -1.6892 | -3.5181 | -3.5220 |
1.1169 | 0.8270 | 4800 | 1.0978 | -1.7043 | -1.8709 | 0.625 | 0.1665 | -1.8709 | -1.7043 | -3.5202 | -3.5240 |
1.0477 | 0.8442 | 4900 | 1.0972 | -1.7175 | -1.8854 | 0.6259 | 0.1679 | -1.8854 | -1.7175 | -3.5196 | -3.5234 |
1.1388 | 0.8615 | 5000 | 1.0969 | -1.7191 | -1.8875 | 0.6241 | 0.1684 | -1.8875 | -1.7191 | -3.5124 | -3.5162 |
1.0556 | 0.8787 | 5100 | 1.0962 | -1.7341 | -1.9040 | 0.6236 | 0.1699 | -1.9040 | -1.7341 | -3.5062 | -3.5100 |
1.0387 | 0.8959 | 5200 | 1.0953 | -1.7483 | -1.9201 | 0.6241 | 0.1718 | -1.9201 | -1.7483 | -3.5064 | -3.5102 |
1.066 | 0.9132 | 5300 | 1.0952 | -1.7533 | -1.9256 | 0.6241 | 0.1723 | -1.9256 | -1.7533 | -3.5057 | -3.5094 |
1.0191 | 0.9304 | 5400 | 1.0946 | -1.7615 | -1.9351 | 0.6259 | 0.1735 | -1.9351 | -1.7615 | -3.4954 | -3.4992 |
1.0353 | 0.9476 | 5500 | 1.0947 | -1.7636 | -1.9374 | 0.625 | 0.1737 | -1.9374 | -1.7636 | -3.5003 | -3.5041 |
1.0994 | 0.9649 | 5600 | 1.0942 | -1.7649 | -1.9397 | 0.6255 | 0.1748 | -1.9397 | -1.7649 | -3.4823 | -3.4862 |
1.1142 | 0.9821 | 5700 | 1.0939 | -1.7705 | -1.9460 | 0.6252 | 0.1755 | -1.9460 | -1.7705 | -3.5005 | -3.5042 |
1.0105 | 0.9993 | 5800 | 1.0934 | -1.7804 | -1.9571 | 0.6245 | 0.1766 | -1.9571 | -1.7804 | -3.4910 | -3.4947 |
1.0585 | 1.0165 | 5900 | 1.0932 | -1.7831 | -1.9606 | 0.6231 | 0.1774 | -1.9606 | -1.7831 | -3.4851 | -3.4888 |
1.05 | 1.0338 | 6000 | 1.0930 | -1.7849 | -1.9627 | 0.6231 | 0.1778 | -1.9627 | -1.7849 | -3.4856 | -3.4893 |
1.1418 | 1.0510 | 6100 | 1.0926 | -1.7910 | -1.9699 | 0.625 | 0.1788 | -1.9699 | -1.7910 | -3.4842 | -3.4879 |
1.052 | 1.0682 | 6200 | 1.0923 | -1.7986 | -1.9784 | 0.6229 | 0.1797 | -1.9784 | -1.7986 | -3.4783 | -3.4820 |
1.0504 | 1.0855 | 6300 | 1.0920 | -1.8029 | -1.9833 | 0.6243 | 0.1804 | -1.9833 | -1.8029 | -3.4718 | -3.4755 |
1.0798 | 1.1027 | 6400 | 1.0920 | -1.8055 | -1.9863 | 0.6245 | 0.1808 | -1.9863 | -1.8055 | -3.4782 | -3.4820 |
1.1707 | 1.1199 | 6500 | 1.0918 | -1.8116 | -1.9931 | 0.625 | 0.1816 | -1.9931 | -1.8116 | -3.4695 | -3.4732 |
1.1428 | 1.1371 | 6600 | 1.0918 | -1.8145 | -1.9965 | 0.6248 | 0.1820 | -1.9965 | -1.8145 | -3.4609 | -3.4647 |
1.0715 | 1.1544 | 6700 | 1.0913 | -1.8156 | -1.9988 | 0.6259 | 0.1832 | -1.9988 | -1.8156 | -3.4882 | -3.4918 |
1.0501 | 1.1716 | 6800 | 1.0911 | -1.8232 | -2.0069 | 0.6231 | 0.1838 | -2.0069 | -1.8232 | -3.4742 | -3.4779 |
1.0595 | 1.1888 | 6900 | 1.0911 | -1.8266 | -2.0107 | 0.6252 | 0.1840 | -2.0107 | -1.8266 | -3.4604 | -3.4641 |
1.0657 | 1.2061 | 7000 | 1.0907 | -1.8324 | -2.0173 | 0.6243 | 0.1850 | -2.0173 | -1.8324 | -3.4681 | -3.4718 |
1.0894 | 1.2233 | 7100 | 1.0908 | -1.8311 | -2.0162 | 0.6241 | 0.1850 | -2.0162 | -1.8311 | -3.4721 | -3.4757 |
1.0263 | 1.2405 | 7200 | 1.0905 | -1.8363 | -2.0221 | 0.6248 | 0.1858 | -2.0221 | -1.8363 | -3.4523 | -3.4560 |
1.0575 | 1.2578 | 7300 | 1.0903 | -1.8425 | -2.0289 | 0.6243 | 0.1864 | -2.0289 | -1.8425 | -3.4530 | -3.4567 |
1.0439 | 1.2750 | 7400 | 1.0898 | -1.8475 | -2.0349 | 0.6236 | 0.1874 | -2.0349 | -1.8475 | -3.4620 | -3.4656 |
1.0479 | 1.2922 | 7500 | 1.0898 | -1.8506 | -2.0382 | 0.6248 | 0.1875 | -2.0382 | -1.8506 | -3.4522 | -3.4559 |
1.0345 | 1.3094 | 7600 | 1.0898 | -1.8523 | -2.0402 | 0.6238 | 0.1878 | -2.0402 | -1.8523 | -3.4562 | -3.4598 |
1.0292 | 1.3267 | 7700 | 1.0895 | -1.8566 | -2.0451 | 0.6243 | 0.1885 | -2.0451 | -1.8566 | -3.4490 | -3.4527 |
1.0667 | 1.3439 | 7800 | 1.0896 | -1.8601 | -2.0489 | 0.6243 | 0.1888 | -2.0489 | -1.8601 | -3.4377 | -3.4414 |
1.0894 | 1.3611 | 7900 | 1.0894 | -1.8629 | -2.0521 | 0.6234 | 0.1893 | -2.0521 | -1.8629 | -3.4502 | -3.4538 |
1.1202 | 1.3784 | 8000 | 1.0893 | -1.8667 | -2.0563 | 0.6248 | 0.1896 | -2.0563 | -1.8667 | -3.4338 | -3.4376 |
1.0709 | 1.3956 | 8100 | 1.0889 | -1.8692 | -2.0595 | 0.6243 | 0.1904 | -2.0595 | -1.8692 | -3.4282 | -3.4319 |
0.9842 | 1.4128 | 8200 | 1.0887 | -1.8732 | -2.0641 | 0.6224 | 0.1910 | -2.0641 | -1.8732 | -3.4388 | -3.4425 |
1.0825 | 1.4300 | 8300 | 1.0888 | -1.8771 | -2.0681 | 0.6243 | 0.1910 | -2.0681 | -1.8771 | -3.4452 | -3.4488 |
1.0353 | 1.4473 | 8400 | 1.0885 | -1.8814 | -2.0729 | 0.6248 | 0.1915 | -2.0729 | -1.8814 | -3.4402 | -3.4438 |
1.0484 | 1.4645 | 8500 | 1.0885 | -1.8809 | -2.0725 | 0.6234 | 0.1917 | -2.0725 | -1.8809 | -3.4378 | -3.4415 |
1.0415 | 1.4817 | 8600 | 1.0886 | -1.8835 | -2.0753 | 0.6238 | 0.1918 | -2.0753 | -1.8835 | -3.4435 | -3.4471 |
1.0403 | 1.4990 | 8700 | 1.0886 | -1.8863 | -2.0783 | 0.6224 | 0.1920 | -2.0783 | -1.8863 | -3.4401 | -3.4437 |
1.0025 | 1.5162 | 8800 | 1.0883 | -1.8873 | -2.0799 | 0.6224 | 0.1926 | -2.0799 | -1.8873 | -3.4421 | -3.4457 |
1.0338 | 1.5334 | 8900 | 1.0881 | -1.8921 | -2.0852 | 0.6238 | 0.1930 | -2.0852 | -1.8921 | -3.4227 | -3.4264 |
1.0588 | 1.5507 | 9000 | 1.0882 | -1.8938 | -2.0869 | 0.6222 | 0.1931 | -2.0869 | -1.8938 | -3.4348 | -3.4384 |
1.0998 | 1.5679 | 9100 | 1.0881 | -1.8947 | -2.0878 | 0.6234 | 0.1932 | -2.0878 | -1.8947 | -3.4355 | -3.4391 |
1.0465 | 1.5851 | 9200 | 1.0881 | -1.8949 | -2.0881 | 0.6234 | 0.1932 | -2.0881 | -1.8949 | -3.4279 | -3.4315 |
1.0754 | 1.6023 | 9300 | 1.0878 | -1.8955 | -2.0893 | 0.6234 | 0.1938 | -2.0893 | -1.8955 | -3.4261 | -3.4298 |
1.0633 | 1.6196 | 9400 | 1.0878 | -1.8963 | -2.0903 | 0.6227 | 0.1940 | -2.0903 | -1.8963 | -3.4275 | -3.4312 |
1.0392 | 1.6368 | 9500 | 1.0881 | -1.8982 | -2.0917 | 0.6231 | 0.1935 | -2.0917 | -1.8982 | -3.4356 | -3.4393 |
1.0565 | 1.6540 | 9600 | 1.0878 | -1.8977 | -2.0917 | 0.6231 | 0.1940 | -2.0917 | -1.8977 | -3.4386 | -3.4422 |
1.0101 | 1.6713 | 9700 | 1.0880 | -1.8987 | -2.0924 | 0.6222 | 0.1937 | -2.0924 | -1.8987 | -3.4357 | -3.4393 |
0.9686 | 1.6885 | 9800 | 1.0879 | -1.8992 | -2.0933 | 0.6231 | 0.1941 | -2.0933 | -1.8992 | -3.4280 | -3.4316 |
0.9781 | 1.7057 | 9900 | 1.0875 | -1.8996 | -2.0942 | 0.6229 | 0.1946 | -2.0942 | -1.8996 | -3.4316 | -3.4353 |
0.9985 | 1.7229 | 10000 | 1.0878 | -1.9004 | -2.0947 | 0.6224 | 0.1942 | -2.0947 | -1.9004 | -3.4334 | -3.4370 |
1.0605 | 1.7402 | 10100 | 1.0879 | -1.9007 | -2.0946 | 0.6227 | 0.1940 | -2.0946 | -1.9007 | -3.4210 | -3.4246 |
1.0453 | 1.7574 | 10200 | 1.0878 | -1.9024 | -2.0968 | 0.6224 | 0.1944 | -2.0968 | -1.9024 | -3.4185 | -3.4222 |
1.0919 | 1.7746 | 10300 | 1.0877 | -1.9027 | -2.0973 | 0.6220 | 0.1947 | -2.0973 | -1.9027 | -3.4347 | -3.4383 |
0.9683 | 1.7919 | 10400 | 1.0877 | -1.9023 | -2.0968 | 0.6231 | 0.1945 | -2.0968 | -1.9023 | -3.4268 | -3.4304 |
1.0501 | 1.8091 | 10500 | 1.0879 | -1.9027 | -2.0971 | 0.6227 | 0.1943 | -2.0971 | -1.9027 | -3.4268 | -3.4305 |
1.0827 | 1.8263 | 10600 | 1.0878 | -1.9027 | -2.0971 | 0.6222 | 0.1944 | -2.0971 | -1.9027 | -3.4260 | -3.4297 |
1.0259 | 1.8436 | 10700 | 1.0878 | -1.9030 | -2.0976 | 0.6220 | 0.1946 | -2.0976 | -1.9030 | -3.4333 | -3.4369 |
0.9896 | 1.8608 | 10800 | 1.0878 | -1.9031 | -2.0975 | 0.6229 | 0.1944 | -2.0975 | -1.9031 | -3.4306 | -3.4342 |
1.0559 | 1.8780 | 10900 | 1.0876 | -1.9024 | -2.0970 | 0.6234 | 0.1947 | -2.0970 | -1.9024 | -3.4247 | -3.4283 |
1.0904 | 1.8952 | 11000 | 1.0878 | -1.9029 | -2.0975 | 0.6236 | 0.1946 | -2.0975 | -1.9029 | -3.4325 | -3.4361 |
1.0518 | 1.9125 | 11100 | 1.0877 | -1.9027 | -2.0973 | 0.6234 | 0.1946 | -2.0973 | -1.9027 | -3.4235 | -3.4272 |
1.0111 | 1.9297 | 11200 | 1.0878 | -1.9032 | -2.0976 | 0.6231 | 0.1943 | -2.0976 | -1.9032 | -3.4197 | -3.4233 |
1.1208 | 1.9469 | 11300 | 1.0877 | -1.9032 | -2.0979 | 0.6236 | 0.1947 | -2.0979 | -1.9032 | -3.4274 | -3.4310 |
1.0322 | 1.9642 | 11400 | 1.0878 | -1.9033 | -2.0977 | 0.6231 | 0.1944 | -2.0977 | -1.9033 | -3.4257 | -3.4293 |
1.0917 | 1.9814 | 11500 | 1.0878 | -1.9033 | -2.0977 | 0.6234 | 0.1944 | -2.0977 | -1.9033 | -3.4251 | -3.4287 |
1.0116 | 1.9986 | 11600 | 1.0879 | -1.9033 | -2.0977 | 0.6229 | 0.1944 | -2.0977 | -1.9033 | -3.4251 | -3.4288 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
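
To verify a local environment against the versions above, a quick check might look like the following (a prefix match tolerates suffixed PyTorch builds such as `2.1.2+cu121`):

```python
# Quick environment check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.41.2",
    "torch": "2.1.2",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    got = installed[name]
    status = "OK" if got.startswith(want) else f"expected {want}"
    print(f"{name}: {got} ({status})")
```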