smollm-1.7b-instruct-simpo-v2
This model is a fine-tuned version of HuggingFaceTB/SmolLM-1.7B-Instruct on the BAAI/Infinity-Preference dataset. It achieves the following results on the evaluation set:
- Loss: 3.0877
- Rewards/chosen: -22.8949
- Rewards/rejected: -24.4444
- Rewards/accuracies: 0.6300
- Rewards/margins: 1.5495
- Logps/rejected: -2.4444
- Logps/chosen: -2.2895
- Logits/rejected: -2.4913
- Logits/chosen: -2.3131
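The final metrics listed above can be cross-checked against each other. The sketch below recomputes the reward margin as chosen minus rejected and derives the ratio between each reward and its log-probability. That ratio comes out near 10, but this is only inferred from the numbers; the card does not state a reward scale. The names `final`, `computed_margin`, and `implied_scale_*` are illustrative.

```python
# Final evaluation metrics copied from this card.
final = {
    "rewards_chosen": -22.8949,
    "rewards_rejected": -24.4444,
    "rewards_margin": 1.5495,
    "logps_chosen": -2.2895,
    "logps_rejected": -2.4444,
}

# The margin is the gap between the chosen and rejected rewards.
computed_margin = final["rewards_chosen"] - final["rewards_rejected"]

# ASSUMPTION: the reward is a scaled log-probability. The scale is not
# documented in the card; it is inferred here from the reported values.
implied_scale_chosen = final["rewards_chosen"] / final["logps_chosen"]
implied_scale_rejected = final["rewards_rejected"] / final["logps_rejected"]

print(f"reported margin: {final['rewards_margin']:.4f}")
print(f"computed margin: {computed_margin:.4f}")
print(f"implied scale (chosen):   {implied_scale_chosen:.3f}")
print(f"implied scale (rejected): {implied_scale_rejected:.3f}")
```

A positive margin means the model scores chosen responses above rejected ones on average, while the accuracy of 0.63 shows that this holds for roughly 63% of evaluation pairs.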
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
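To illustrate what the cosine scheduler with a 0.1 warmup ratio implies for the learning rate, here is a minimal sketch of a linear-warmup, cosine-decay schedule written in plain Python. It is not the training code used for this model. The total step count is an assumption estimated from the results table (step 29600 at epoch 0.9977), and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 1e-6        # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 29_668  # ASSUMPTION: estimated as 29600 / 0.9977, not stated in the card


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 1_000, 2_967, 15_000, 29_668):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup phase would last about 2,967 steps, after which the rate decays smoothly from 1e-06 toward zero by the end of the single epoch.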
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
3.2871 | 0.0135 | 400 | 3.4379 | -16.5537 | -16.5135 | 0.4700 | -0.0402 | -1.6513 | -1.6554 | -0.7019 | -0.7007 |
3.4746 | 0.0270 | 800 | 3.4370 | -16.5561 | -16.5146 | 0.4700 | -0.0415 | -1.6515 | -1.6556 | -0.7002 | -0.6988 |
2.8856 | 0.0404 | 1200 | 3.4399 | -16.5623 | -16.5160 | 0.4700 | -0.0464 | -1.6516 | -1.6562 | -0.6997 | -0.6984 |
3.8819 | 0.0539 | 1600 | 3.4374 | -16.5639 | -16.5248 | 0.4700 | -0.0391 | -1.6525 | -1.6564 | -0.7012 | -0.6998 |
3.622 | 0.0674 | 2000 | 3.4319 | -16.5838 | -16.5551 | 0.4700 | -0.0288 | -1.6555 | -1.6584 | -0.7089 | -0.7069 |
3.6924 | 0.0809 | 2400 | 3.4273 | -16.6109 | -16.5901 | 0.4700 | -0.0208 | -1.6590 | -1.6611 | -0.7032 | -0.7007 |
3.0591 | 0.0944 | 2800 | 3.4161 | -16.6863 | -16.6979 | 0.4600 | 0.0117 | -1.6698 | -1.6686 | -0.7295 | -0.7253 |
3.4937 | 0.1079 | 3200 | 3.4013 | -16.7982 | -16.8590 | 0.4700 | 0.0608 | -1.6859 | -1.6798 | -0.7483 | -0.7412 |
3.1565 | 0.1213 | 3600 | 3.3852 | -16.8542 | -16.9385 | 0.4700 | 0.0843 | -1.6939 | -1.6854 | -0.7618 | -0.7526 |
2.7504 | 0.1348 | 4000 | 3.3711 | -16.9128 | -17.0175 | 0.4800 | 0.1047 | -1.7018 | -1.6913 | -0.7684 | -0.7574 |
3.0312 | 0.1483 | 4400 | 3.3606 | -16.9720 | -17.0910 | 0.4900 | 0.1190 | -1.7091 | -1.6972 | -0.7754 | -0.7629 |
4.145 | 0.1618 | 4800 | 3.3407 | -17.0816 | -17.2375 | 0.5100 | 0.1559 | -1.7238 | -1.7082 | -0.7902 | -0.7746 |
3.9514 | 0.1753 | 5200 | 3.3126 | -17.1952 | -17.3924 | 0.5100 | 0.1972 | -1.7392 | -1.7195 | -0.8201 | -0.8001 |
2.4942 | 0.1887 | 5600 | 3.2864 | -17.2731 | -17.4955 | 0.5100 | 0.2223 | -1.7495 | -1.7273 | -0.8187 | -0.7960 |
2.6757 | 0.2022 | 6000 | 3.2615 | -17.3603 | -17.6063 | 0.5200 | 0.2460 | -1.7606 | -1.7360 | -0.7977 | -0.7735 |
2.8576 | 0.2157 | 6400 | 3.2382 | -17.5060 | -17.8132 | 0.5500 | 0.3072 | -1.7813 | -1.7506 | -0.8562 | -0.8260 |
3.7483 | 0.2292 | 6800 | 3.2140 | -17.5965 | -17.9376 | 0.5700 | 0.3411 | -1.7938 | -1.7596 | -0.8751 | -0.8407 |
3.5349 | 0.2427 | 7200 | 3.2035 | -17.6663 | -18.0193 | 0.5800 | 0.3530 | -1.8019 | -1.7666 | -0.8780 | -0.8417 |
2.0604 | 0.2562 | 7600 | 3.1925 | -17.7393 | -18.1045 | 0.6100 | 0.3652 | -1.8104 | -1.7739 | -0.9017 | -0.8602 |
5.7031 | 0.2696 | 8000 | 3.1672 | -18.0175 | -18.4936 | 0.6100 | 0.4760 | -1.8494 | -1.8018 | -0.9982 | -0.9467 |
2.6005 | 0.2831 | 8400 | 3.1475 | -18.1162 | -18.6283 | 0.6100 | 0.5121 | -1.8628 | -1.8116 | -1.0732 | -1.0161 |
1.9787 | 0.2966 | 8800 | 3.1226 | -18.3260 | -18.9198 | 0.6100 | 0.5938 | -1.8920 | -1.8326 | -1.1691 | -1.1062 |
2.8347 | 0.3101 | 9200 | 3.1156 | -18.4632 | -19.0934 | 0.6100 | 0.6301 | -1.9093 | -1.8463 | -1.2592 | -1.1910 |
2.701 | 0.3236 | 9600 | 3.1022 | -18.5083 | -19.1346 | 0.6100 | 0.6264 | -1.9135 | -1.8508 | -1.2785 | -1.2073 |
3.772 | 0.3371 | 10000 | 3.0772 | -18.5843 | -19.2491 | 0.6100 | 0.6649 | -1.9249 | -1.8584 | -1.3345 | -1.2587 |
2.7414 | 0.3505 | 10400 | 3.0551 | -18.8305 | -19.5946 | 0.6100 | 0.7641 | -1.9595 | -1.8830 | -1.3824 | -1.3004 |
2.0287 | 0.3640 | 10800 | 3.0534 | -18.9934 | -19.7985 | 0.6200 | 0.8051 | -1.9798 | -1.8993 | -1.4355 | -1.3467 |
1.0473 | 0.3775 | 11200 | 3.0528 | -19.1581 | -19.9858 | 0.6100 | 0.8277 | -1.9986 | -1.9158 | -1.5109 | -1.4173 |
2.8106 | 0.3910 | 11600 | 3.0436 | -19.1763 | -19.9989 | 0.6100 | 0.8226 | -1.9999 | -1.9176 | -1.5138 | -1.4206 |
3.0344 | 0.4045 | 12000 | 3.0333 | -19.2526 | -20.1079 | 0.6100 | 0.8553 | -2.0108 | -1.9253 | -1.5628 | -1.4657 |
2.1886 | 0.4179 | 12400 | 3.0187 | -19.4500 | -20.3818 | 0.6300 | 0.9318 | -2.0382 | -1.9450 | -1.6246 | -1.5217 |
4.1181 | 0.4314 | 12800 | 3.0086 | -19.6204 | -20.6104 | 0.6300 | 0.9900 | -2.0610 | -1.9620 | -1.6886 | -1.5818 |
1.6647 | 0.4449 | 13200 | 3.0126 | -19.7773 | -20.7949 | 0.6300 | 1.0176 | -2.0795 | -1.9777 | -1.7307 | -1.6181 |
4.8533 | 0.4584 | 13600 | 3.0012 | -19.9001 | -20.9633 | 0.6300 | 1.0632 | -2.0963 | -1.9900 | -1.7437 | -1.6288 |
2.9945 | 0.4719 | 14000 | 3.0071 | -19.9831 | -21.0361 | 0.6300 | 1.0529 | -2.1036 | -1.9983 | -1.7839 | -1.6667 |
2.9377 | 0.4854 | 14400 | 2.9946 | -20.1165 | -21.2172 | 0.6400 | 1.1007 | -2.1217 | -2.0117 | -1.8386 | -1.7178 |
2.7856 | 0.4988 | 14800 | 2.9908 | -20.2830 | -21.4151 | 0.6300 | 1.1322 | -2.1415 | -2.0283 | -1.8720 | -1.7468 |
4.9446 | 0.5123 | 15200 | 2.9905 | -20.4144 | -21.5669 | 0.6300 | 1.1525 | -2.1567 | -2.0414 | -1.9057 | -1.7760 |
3.2834 | 0.5258 | 15600 | 2.9858 | -20.4428 | -21.5993 | 0.6300 | 1.1565 | -2.1599 | -2.0443 | -1.8928 | -1.7633 |
1.8705 | 0.5393 | 16000 | 2.9888 | -20.5922 | -21.7774 | 0.6300 | 1.1853 | -2.1777 | -2.0592 | -1.9340 | -1.8009 |
4.0587 | 0.5528 | 16400 | 2.9925 | -20.8812 | -22.1359 | 0.6300 | 1.2547 | -2.2136 | -2.0881 | -2.0019 | -1.8627 |
3.0706 | 0.5662 | 16800 | 2.9946 | -21.1005 | -22.4176 | 0.6300 | 1.3171 | -2.2418 | -2.1101 | -2.0533 | -1.9104 |
3.152 | 0.5797 | 17200 | 2.9916 | -21.2937 | -22.6723 | 0.6200 | 1.3786 | -2.2672 | -2.1294 | -2.1094 | -1.9627 |
1.8856 | 0.5932 | 17600 | 2.9847 | -21.2727 | -22.6463 | 0.6200 | 1.3736 | -2.2646 | -2.1273 | -2.1108 | -1.9637 |
1.1291 | 0.6067 | 18000 | 2.9981 | -21.5313 | -22.9507 | 0.6200 | 1.4194 | -2.2951 | -2.1531 | -2.1736 | -2.0212 |
2.9894 | 0.6202 | 18400 | 3.0033 | -21.6191 | -23.0276 | 0.6200 | 1.4085 | -2.3028 | -2.1619 | -2.2089 | -2.0543 |
3.497 | 0.6337 | 18800 | 3.0252 | -21.8198 | -23.2426 | 0.6200 | 1.4228 | -2.3243 | -2.1820 | -2.2285 | -2.0714 |
3.18 | 0.6471 | 19200 | 3.0307 | -21.8887 | -23.3005 | 0.6200 | 1.4117 | -2.3300 | -2.1889 | -2.2462 | -2.0862 |
1.9522 | 0.6606 | 19600 | 3.0391 | -21.9179 | -23.3214 | 0.6300 | 1.4035 | -2.3321 | -2.1918 | -2.2476 | -2.0875 |
2.4878 | 0.6741 | 20000 | 3.0431 | -22.1021 | -23.5543 | 0.6300 | 1.4522 | -2.3554 | -2.2102 | -2.2969 | -2.1333 |
2.3506 | 0.6876 | 20400 | 3.0453 | -22.2379 | -23.7220 | 0.6300 | 1.4841 | -2.3722 | -2.2238 | -2.3258 | -2.1603 |
3.9719 | 0.7011 | 20800 | 3.0591 | -22.2718 | -23.7317 | 0.6300 | 1.4599 | -2.3732 | -2.2272 | -2.3263 | -2.1600 |
1.4942 | 0.7146 | 21200 | 3.0574 | -22.3226 | -23.8044 | 0.6300 | 1.4819 | -2.3804 | -2.2323 | -2.3352 | -2.1680 |
0.8797 | 0.7280 | 21600 | 3.0616 | -22.3419 | -23.8235 | 0.6300 | 1.4816 | -2.3823 | -2.2342 | -2.3394 | -2.1721 |
2.8176 | 0.7415 | 22000 | 3.0751 | -22.4788 | -23.9643 | 0.6300 | 1.4855 | -2.3964 | -2.2479 | -2.3767 | -2.2073 |
3.3744 | 0.7550 | 22400 | 3.0775 | -22.6028 | -24.1137 | 0.6300 | 1.5109 | -2.4114 | -2.2603 | -2.4146 | -2.2423 |
1.9708 | 0.7685 | 22800 | 3.0768 | -22.6249 | -24.1479 | 0.6300 | 1.5231 | -2.4148 | -2.2625 | -2.4216 | -2.2482 |
2.1589 | 0.7820 | 23200 | 3.0697 | -22.6570 | -24.1936 | 0.6300 | 1.5367 | -2.4194 | -2.2657 | -2.4323 | -2.2591 |
3.0872 | 0.7954 | 23600 | 3.0813 | -22.7174 | -24.2489 | 0.6300 | 1.5315 | -2.4249 | -2.2717 | -2.4430 | -2.2683 |
3.9705 | 0.8089 | 24000 | 3.0806 | -22.7644 | -24.3076 | 0.6300 | 1.5432 | -2.4308 | -2.2764 | -2.4598 | -2.2840 |
3.5691 | 0.8224 | 24400 | 3.0807 | -22.7627 | -24.2931 | 0.6300 | 1.5304 | -2.4293 | -2.2763 | -2.4621 | -2.2857 |
1.4467 | 0.8359 | 24800 | 3.0854 | -22.8132 | -24.3525 | 0.6300 | 1.5393 | -2.4353 | -2.2813 | -2.4742 | -2.2963 |
2.7241 | 0.8494 | 25200 | 3.0862 | -22.8300 | -24.3745 | 0.6300 | 1.5445 | -2.4375 | -2.2830 | -2.4770 | -2.2988 |
2.7441 | 0.8629 | 25600 | 3.0866 | -22.8450 | -24.3876 | 0.6300 | 1.5427 | -2.4388 | -2.2845 | -2.4823 | -2.3048 |
1.4801 | 0.8763 | 26000 | 3.0839 | -22.8522 | -24.4010 | 0.6300 | 1.5488 | -2.4401 | -2.2852 | -2.4827 | -2.3057 |
2.5965 | 0.8898 | 26400 | 3.0841 | -22.8629 | -24.4169 | 0.6300 | 1.5540 | -2.4417 | -2.2863 | -2.4877 | -2.3095 |
3.6415 | 0.9033 | 26800 | 3.0893 | -22.8830 | -24.4340 | 0.6300 | 1.5510 | -2.4434 | -2.2883 | -2.4894 | -2.3114 |
2.0584 | 0.9168 | 27200 | 3.0894 | -22.8879 | -24.4268 | 0.6300 | 1.5389 | -2.4427 | -2.2888 | -2.4917 | -2.3134 |
2.5068 | 0.9303 | 27600 | 3.0896 | -22.8936 | -24.4408 | 0.6300 | 1.5472 | -2.4441 | -2.2894 | -2.4922 | -2.3134 |
0.677 | 0.9437 | 28000 | 3.0835 | -22.8876 | -24.4472 | 0.6300 | 1.5596 | -2.4447 | -2.2888 | -2.4919 | -2.3134 |
2.5931 | 0.9572 | 28400 | 3.0875 | -22.8938 | -24.4419 | 0.6300 | 1.5481 | -2.4442 | -2.2894 | -2.4907 | -2.3117 |
4.4413 | 0.9707 | 28800 | 3.0893 | -22.8952 | -24.4383 | 0.6300 | 1.5431 | -2.4438 | -2.2895 | -2.4914 | -2.3131 |
2.7584 | 0.9842 | 29200 | 3.0874 | -22.8946 | -24.4410 | 0.6300 | 1.5464 | -2.4441 | -2.2895 | -2.4894 | -2.3112 |
4.4406 | 0.9977 | 29600 | 3.0877 | -22.8949 | -24.4444 | 0.6300 | 1.5495 | -2.4444 | -2.2895 | -2.4913 | -2.3131 |
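The table is easier to interpret when parsed programmatically. As a minimal sketch, the snippet below parses a hand-picked subset of rows copied verbatim from the table and picks the one with the lowest validation loss. The helper `parse_row` and the choice of rows are illustrative; pasting in the full table should work the same way.

```python
HEADER = (
    "Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | "
    "Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | "
    "Logps/chosen | Logits/rejected | Logits/chosen |"
)

# A subset of rows copied from the table above (steps 400, 15600, 17600, 18000, 29600).
ROWS = """\
3.2871 | 0.0135 | 400 | 3.4379 | -16.5537 | -16.5135 | 0.4700 | -0.0402 | -1.6513 | -1.6554 | -0.7019 | -0.7007 |
3.2834 | 0.5258 | 15600 | 2.9858 | -20.4428 | -21.5993 | 0.6300 | 1.1565 | -2.1599 | -2.0443 | -1.8928 | -1.7633 |
1.8856 | 0.5932 | 17600 | 2.9847 | -21.2727 | -22.6463 | 0.6200 | 1.3736 | -2.2646 | -2.1273 | -2.1108 | -1.9637 |
1.1291 | 0.6067 | 18000 | 2.9981 | -21.5313 | -22.9507 | 0.6200 | 1.4194 | -2.2951 | -2.1531 | -2.1736 | -2.0212 |
4.4406 | 0.9977 | 29600 | 3.0877 | -22.8949 | -24.4444 | 0.6300 | 1.5495 | -2.4444 | -2.2895 | -2.4913 | -2.3131 |
"""


def split_cells(line):
    """Split a pipe-delimited line into trimmed cells, ignoring the trailing pipe."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_row(line, columns):
    """Map each numeric cell of a table row to its column name."""
    return {name: float(value) for name, value in zip(columns, split_cells(line))}


COLUMNS = split_cells(HEADER)
records = [parse_row(line, COLUMNS) for line in ROWS.splitlines() if line.strip()]

# Checkpoint with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["Validation Loss"])
last = records[-1]

print(f"lowest validation loss: {best['Validation Loss']} at step {int(best['Step'])}")
print(f"final validation loss:  {last['Validation Loss']} at step {int(last['Step'])}")
print(f"margin at best vs final: {best['Rewards/margins']} vs {last['Rewards/margins']}")
```

In the full table, the lowest validation loss (2.9847) is logged at step 17600. The loss then drifts back up to 3.0877 by the final step even though the reward margin keeps growing, so an intermediate checkpoint may be worth comparing against the final one.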
Framework versions
- Transformers 4.45.1
- Pytorch 2.2.2
- Datasets 3.0.1
- Tokenizers 0.20.0
Model tree for focuzz8/smollm-1.7b-instruct-simpo-v2
Base model
HuggingFaceTB/SmolLM-1.7B
Quantized
HuggingFaceTB/SmolLM-1.7B-Instruct