tinyllama-1.1b-chat-dpo-qlora

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-chat-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6084
  • Rewards/chosen: -1.0875
  • Rewards/rejected: -1.3916
  • Rewards/accuracies: 0.6580
  • Rewards/margins: 0.3041
  • Logps/rejected: -490.8393
  • Logps/chosen: -504.9714
  • Logits/rejected: -2.6096
  • Logits/chosen: -2.6425
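
To try the model, it can be loaded as a PEFT adapter on top of its base model. The snippet below is a minimal usage sketch, not an official recipe: it assumes the repository ships LoRA/QLoRA adapter weights and that the tokenizer carries the TinyLlama chat template; dtype, device placement, and generation settings are illustrative choices.

```python
# Minimal usage sketch (assumes this repo hosts a LoRA/QLoRA adapter plus a chat-template tokenizer).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-chat-dpo-qlora"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# AutoPeftModelForCausalLM reads the adapter config, loads the base model it names,
# and attaches the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain Direct Preference Optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```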

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
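
These settings map naturally onto a TRL DPOTrainer run. The sketch below shows one way they could be wired up; only the hyperparameters listed above come from this card, while the DPO beta, LoRA settings, mixed-precision flag, dataset split, and preprocessing are assumptions, and exact argument names vary across TRL versions.

```python
# Hedged sketch of a TRL DPOTrainer setup matching the hyperparameters above.
# Assumptions (not stated on this card): DPO beta, LoRA settings, bf16, the dataset
# split, and the conversion of message lists into plain prompt/chosen/rejected strings.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "martimfasantos/tinyllama-1.1b-chat-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# DPOTrainer expects string "prompt"/"chosen"/"rejected" columns; the preprocessing
# of ultrafeedback_binarized's message lists is omitted here.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-chat-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 x 4 = 16 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumed; mixed precision is not listed on the card
)

peft_config = LoraConfig(  # assumed QLoRA adapter settings
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,        # assumed; the DPO beta is not listed on the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```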

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6921 | 0.03 | 100 | 0.6923 | 0.0160 | 0.0142 | 0.5645 | 0.0018 | -350.2683 | -394.6286 | -2.7841 | -2.8363 |
| 0.6894 | 0.05 | 200 | 0.6894 | 0.0433 | 0.0353 | 0.5920 | 0.0080 | -348.1495 | -391.8949 | -2.7811 | -2.8333 |
| 0.6815 | 0.08 | 300 | 0.6844 | 0.0806 | 0.0609 | 0.6025 | 0.0196 | -345.5898 | -388.1692 | -2.7838 | -2.8349 |
| 0.6869 | 0.1 | 400 | 0.6788 | 0.0607 | 0.0269 | 0.6125 | 0.0339 | -348.9979 | -390.1522 | -2.7931 | -2.8423 |
| 0.6744 | 0.13 | 500 | 0.6724 | 0.0243 | -0.0249 | 0.6210 | 0.0492 | -354.1764 | -393.7983 | -2.7889 | -2.8371 |
| 0.6679 | 0.16 | 600 | 0.6625 | -0.0566 | -0.1346 | 0.6265 | 0.0780 | -365.1402 | -401.8826 | -2.7709 | -2.8179 |
| 0.637 | 0.18 | 700 | 0.6555 | -0.2568 | -0.3654 | 0.6290 | 0.1086 | -388.2211 | -421.9038 | -2.7596 | -2.8051 |
| 0.6166 | 0.21 | 800 | 0.6488 | -0.3935 | -0.5223 | 0.6320 | 0.1288 | -403.9116 | -435.5756 | -2.7523 | -2.7961 |
| 0.6335 | 0.24 | 900 | 0.6458 | -0.4516 | -0.6042 | 0.6380 | 0.1527 | -412.1083 | -441.3798 | -2.7325 | -2.7764 |
| 0.6286 | 0.26 | 1000 | 0.6406 | -0.8692 | -1.0442 | 0.625 | 0.1750 | -456.1026 | -483.1429 | -2.7123 | -2.7531 |
| 0.669 | 0.29 | 1100 | 0.6406 | -0.3445 | -0.4984 | 0.6365 | 0.1538 | -401.5222 | -430.6789 | -2.6946 | -2.7354 |
| 0.6723 | 0.31 | 1200 | 0.6358 | -0.4619 | -0.6430 | 0.6425 | 0.1811 | -415.9841 | -442.4163 | -2.6701 | -2.7077 |
| 0.605 | 0.34 | 1300 | 0.6297 | -0.6894 | -0.8903 | 0.6435 | 0.2009 | -440.7144 | -465.1627 | -2.6764 | -2.7122 |
| 0.6361 | 0.37 | 1400 | 0.6267 | -0.7144 | -0.9307 | 0.6505 | 0.2163 | -444.7496 | -467.6648 | -2.6711 | -2.7091 |
| 0.6085 | 0.39 | 1500 | 0.6213 | -1.0532 | -1.3084 | 0.6490 | 0.2552 | -482.5256 | -501.5469 | -2.6435 | -2.6797 |
| 0.6317 | 0.42 | 1600 | 0.6197 | -1.1246 | -1.3825 | 0.6490 | 0.2579 | -489.9323 | -508.6858 | -2.6172 | -2.6506 |
| 0.6702 | 0.44 | 1700 | 0.6182 | -1.0036 | -1.2644 | 0.6530 | 0.2609 | -478.1268 | -496.5815 | -2.6407 | -2.6762 |
| 0.5658 | 0.47 | 1800 | 0.6219 | -1.3479 | -1.6348 | 0.6445 | 0.2869 | -515.1606 | -531.0145 | -2.5866 | -2.6182 |
| 0.6039 | 0.5 | 1900 | 0.6154 | -0.9014 | -1.1716 | 0.6630 | 0.2702 | -468.8458 | -486.3656 | -2.6376 | -2.6742 |
| 0.6173 | 0.52 | 2000 | 0.6121 | -1.1535 | -1.4470 | 0.6575 | 0.2934 | -496.3810 | -511.5793 | -2.6232 | -2.6580 |
| 0.62 | 0.55 | 2100 | 0.6116 | -1.1600 | -1.4523 | 0.6650 | 0.2923 | -496.9117 | -512.2247 | -2.6278 | -2.6629 |
| 0.5957 | 0.58 | 2200 | 0.6132 | -0.9592 | -1.2431 | 0.6655 | 0.2839 | -475.9958 | -492.1489 | -2.6317 | -2.6674 |
| 0.6093 | 0.6 | 2300 | 0.6138 | -1.0935 | -1.3811 | 0.6625 | 0.2876 | -489.7906 | -505.5738 | -2.6283 | -2.6619 |
| 0.6009 | 0.63 | 2400 | 0.6108 | -1.0519 | -1.3479 | 0.6610 | 0.2959 | -486.4695 | -501.4175 | -2.6088 | -2.6432 |
| 0.5988 | 0.65 | 2500 | 0.6108 | -1.0427 | -1.3419 | 0.6590 | 0.2992 | -485.8730 | -500.4982 | -2.6143 | -2.6477 |
| 0.606 | 0.68 | 2600 | 0.6112 | -1.0188 | -1.3192 | 0.6545 | 0.3003 | -483.6013 | -498.1078 | -2.5974 | -2.6304 |
| 0.6118 | 0.71 | 2700 | 0.6106 | -1.0808 | -1.3857 | 0.6595 | 0.3049 | -490.2562 | -504.3045 | -2.5945 | -2.6274 |
| 0.6134 | 0.73 | 2800 | 0.6096 | -1.1549 | -1.4635 | 0.6585 | 0.3086 | -498.0366 | -511.7179 | -2.5978 | -2.6303 |
| 0.6159 | 0.76 | 2900 | 0.6097 | -1.0550 | -1.3509 | 0.6585 | 0.2959 | -486.7739 | -501.7256 | -2.6175 | -2.6500 |
| 0.5815 | 0.79 | 3000 | 0.6091 | -1.1025 | -1.4048 | 0.6570 | 0.3023 | -492.1650 | -506.4727 | -2.6089 | -2.6420 |
| 0.5885 | 0.81 | 3100 | 0.6089 | -1.0977 | -1.4006 | 0.6595 | 0.3029 | -491.7444 | -505.9960 | -2.6001 | -2.6337 |
| 0.6074 | 0.84 | 3200 | 0.6086 | -1.0982 | -1.4029 | 0.6605 | 0.3047 | -491.9724 | -506.0455 | -2.6056 | -2.6388 |
| 0.5981 | 0.86 | 3300 | 0.6087 | -1.0853 | -1.3881 | 0.6610 | 0.3028 | -490.4915 | -504.7571 | -2.6117 | -2.6442 |
| 0.5944 | 0.89 | 3400 | 0.6087 | -1.0897 | -1.3931 | 0.6580 | 0.3034 | -490.9887 | -505.1947 | -2.6026 | -2.6360 |
| 0.5979 | 0.92 | 3500 | 0.6085 | -1.0922 | -1.3962 | 0.6595 | 0.3040 | -491.3070 | -505.4438 | -2.6136 | -2.6460 |
| 0.6154 | 0.94 | 3600 | 0.6086 | -1.0905 | -1.3946 | 0.6595 | 0.3040 | -491.1413 | -505.2781 | -2.6066 | -2.6397 |
| 0.6053 | 0.97 | 3700 | 0.6086 | -1.0907 | -1.3946 | 0.6550 | 0.3039 | -491.1405 | -505.2943 | -2.6094 | -2.6423 |
| 0.602 | 0.99 | 3800 | 0.6085 | -1.0876 | -1.3914 | 0.6580 | 0.3038 | -490.8211 | -504.9807 | -2.6096 | -2.6425 |
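
For reading the reward columns above: in TRL's DPO implementation the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the reference model, Rewards/margins is the mean chosen-minus-rejected gap, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one. A brief recap of the standard formulation (β itself is not listed on this card):

```latex
% DPO implicit reward and pairwise loss (standard formulation);
% y_w is the chosen completion, y_l the rejected one.
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)
```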

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2