---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - alignment-handbook
  - trl
  - orpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-sft-full-orpo
    results: []
---


# zephyr-7b-sft-full-orpo

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3771
  • Rewards/chosen: -0.1391
  • Rewards/rejected: -0.1930
  • Rewards/accuracies: 0.6528
  • Rewards/margins: 0.0539
  • Logps/rejected: -3.8602
  • Logps/chosen: -2.7813
  • Logits/rejected: -2.8670
  • Logits/chosen: -2.8498
  • Nll Loss: 1.3532
  • Log Odds Ratio: -1.0480
  • Log Odds Chosen: 1.2201
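
For reference, below is a minimal loading and generation sketch using the `transformers` library. The repository id `statking/zephyr-7b-sft-full-orpo` is assumed from the model name above and may differ, and whether the tokenizer ships a chat template is not stated in this card, so the sketch falls back to a plain prompt if none is present.

```python
# Minimal inference sketch. Assumption: the checkpoint is published as
# "statking/zephyr-7b-sft-full-orpo"; adjust the repo id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "statking/zephyr-7b-sft-full-orpo"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 7B model in bf16 fits on a single 24-40 GB GPU
    device_map="auto",
)

prompt = "Explain odds ratio preference optimization (ORPO) in one paragraph."
if tokenizer.chat_template is not None:
    # If the tokenizer bundles a chat template (typical for zephyr-style recipes),
    # format the prompt as a single user turn.
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
else:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```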

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch using these values follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
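
As a rough illustration of how the values above map onto a training setup, the sketch below uses `trl`'s `ORPOConfig`/`ORPOTrainer`, consistent with the `trl`/`orpo` tags on this card. It is an assumption-laden reconstruction, not the exact launch script: details such as the ORPO `beta`, sequence lengths, mixed-precision settings, and the chat-template preprocessing of the prompt/chosen/rejected columns are not recorded here.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
# Assumptions: trl's ORPOTrainer was used, and a per-device batch size of 8
# across 4 GPUs with gradient accumulation 2 gives the reported total train
# batch size of 64 (and total eval batch size of 32).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = ORPOConfig(
    output_dir="zephyr-7b-sft-full-orpo",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=3,
    seed=42,
    bf16=True,  # assumption: bf16 mixed precision, not stated in the card
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train_prefs"],  # assumption: the standard prefs splits
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```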

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.5668 | 0.1049 | 100 | 0.5843 | -0.0456 | -0.0529 | 0.6151 | 0.0073 | -1.0580 | -0.9113 | -3.3148 | -3.3082 | 0.5516 | -0.6530 | 0.2184 |
| 0.5676 | 0.2098 | 200 | 0.5726 | -0.0441 | -0.0532 | 0.625 | 0.0092 | -1.0644 | -0.8811 | -3.0026 | -2.9992 | 0.5359 | -0.6474 | 0.2850 |
| 0.5819 | 0.3146 | 300 | 0.5552 | -0.0439 | -0.0531 | 0.6290 | 0.0092 | -1.0620 | -0.8770 | -3.1424 | -3.1391 | 0.5202 | -0.6464 | 0.2830 |
| 0.5738 | 0.4195 | 400 | 0.5411 | -0.0422 | -0.0517 | 0.6290 | 0.0096 | -1.0346 | -0.8434 | -3.1026 | -3.1020 | 0.5047 | -0.6522 | 0.2961 |
| 0.5478 | 0.5244 | 500 | 0.5319 | -0.0421 | -0.0525 | 0.6290 | 0.0105 | -1.0509 | -0.8415 | -3.0260 | -3.0286 | 0.4970 | -0.6382 | 0.3327 |
| 0.5146 | 0.6293 | 600 | 0.5240 | -0.0408 | -0.0508 | 0.6230 | 0.0100 | -1.0165 | -0.8165 | -3.1325 | -3.1275 | 0.4883 | -0.6418 | 0.3121 |
| 0.5298 | 0.7341 | 700 | 0.5188 | -0.0413 | -0.0541 | 0.6429 | 0.0128 | -1.0827 | -0.8267 | -3.0761 | -3.0755 | 0.4842 | -0.6219 | 0.3869 |
| 0.5181 | 0.8390 | 800 | 0.5141 | -0.0410 | -0.0524 | 0.6329 | 0.0114 | -1.0475 | -0.8198 | -3.1382 | -3.1394 | 0.4803 | -0.6322 | 0.3506 |
| 0.5239 | 0.9439 | 900 | 0.5086 | -0.0402 | -0.0506 | 0.6310 | 0.0104 | -1.0129 | -0.8045 | -3.1191 | -3.1171 | 0.4748 | -0.6328 | 0.3268 |
| 0.2888 | 1.0488 | 1000 | 0.5400 | -0.0436 | -0.0556 | 0.6429 | 0.0120 | -1.1128 | -0.8724 | -3.0171 | -3.0190 | 0.5058 | -0.6318 | 0.3794 |
| 0.29 | 1.1536 | 1100 | 0.5385 | -0.0437 | -0.0574 | 0.6468 | 0.0138 | -1.1487 | -0.8736 | -3.0027 | -3.0029 | 0.5042 | -0.6256 | 0.4247 |
| 0.2826 | 1.2585 | 1200 | 0.5428 | -0.0443 | -0.0581 | 0.6429 | 0.0139 | -1.1626 | -0.8854 | -2.9620 | -2.9583 | 0.5084 | -0.6254 | 0.4215 |
| 0.2796 | 1.3634 | 1300 | 0.5393 | -0.0441 | -0.0589 | 0.6468 | 0.0147 | -1.1771 | -0.8825 | -2.9256 | -2.9285 | 0.5060 | -0.6208 | 0.4508 |
| 0.2784 | 1.4683 | 1400 | 0.5365 | -0.0444 | -0.0589 | 0.6528 | 0.0145 | -1.1784 | -0.8885 | -2.9583 | -2.9594 | 0.5037 | -0.6236 | 0.4410 |
| 0.2873 | 1.5732 | 1500 | 0.5330 | -0.0436 | -0.0579 | 0.6448 | 0.0143 | -1.1584 | -0.8718 | -2.9664 | -2.9657 | 0.5004 | -0.6226 | 0.4364 |
| 0.276 | 1.6780 | 1600 | 0.5367 | -0.0442 | -0.0594 | 0.6409 | 0.0152 | -1.1879 | -0.8833 | -2.9358 | -2.9324 | 0.5041 | -0.6160 | 0.4570 |
| 0.2715 | 1.7829 | 1700 | 0.5349 | -0.0436 | -0.0580 | 0.6448 | 0.0145 | -1.1603 | -0.8710 | -3.0209 | -3.0194 | 0.5024 | -0.6272 | 0.4425 |
| 0.2717 | 1.8878 | 1800 | 0.5341 | -0.0450 | -0.0616 | 0.6548 | 0.0166 | -1.2325 | -0.8997 | -2.9579 | -2.9563 | 0.5023 | -0.6184 | 0.4824 |
| 0.2857 | 1.9927 | 1900 | 0.5408 | -0.0454 | -0.0620 | 0.6548 | 0.0166 | -1.2409 | -0.9088 | -3.0279 | -3.0350 | 0.5091 | -0.6193 | 0.4892 |
| 0.1137 | 2.0975 | 2000 | 0.6877 | -0.0620 | -0.0838 | 0.6706 | 0.0218 | -1.6761 | -1.2408 | -2.8815 | -2.8704 | 0.6539 | -0.6273 | 0.5767 |
| 0.1192 | 2.2024 | 2100 | 0.7577 | -0.0706 | -0.0981 | 0.6726 | 0.0275 | -1.9620 | -1.4122 | -2.8433 | -2.8372 | 0.7199 | -0.6210 | 0.6958 |
| 0.1178 | 2.3073 | 2200 | 1.1762 | -0.1205 | -0.1717 | 0.6528 | 0.0512 | -3.4342 | -2.4108 | -2.9107 | -2.8878 | 1.1197 | -0.7778 | 1.1628 |
| 0.1184 | 2.4122 | 2300 | 1.8520 | -0.1935 | -0.2541 | 0.6369 | 0.0606 | -5.0812 | -3.8696 | -2.9226 | -2.9102 | 1.7542 | -1.0562 | 1.3233 |
| 0.1172 | 2.5170 | 2400 | 1.0193 | -0.1001 | -0.1434 | 0.6409 | 0.0432 | -2.8671 | -2.0024 | -2.8710 | -2.8561 | 0.9736 | -0.8145 | 1.0075 |
| 0.1109 | 2.6219 | 2500 | 1.2050 | -0.1209 | -0.1677 | 0.6329 | 0.0468 | -3.3547 | -2.4183 | -2.8571 | -2.8457 | 1.1724 | -0.9768 | 1.0766 |
| 0.1238 | 2.7268 | 2600 | 2.6922 | -0.3036 | -0.3822 | 0.5873 | 0.0786 | -7.6444 | -6.0725 | -2.9967 | -2.9805 | 2.6498 | -1.6934 | 1.6674 |
| 0.1192 | 2.8317 | 2700 | 1.2391 | -0.1189 | -0.1634 | 0.625 | 0.0445 | -3.2671 | -2.3779 | -2.8836 | -2.8662 | 1.1910 | -0.9507 | 1.0201 |
| 0.1191 | 2.9365 | 2800 | 1.0214 | -0.0976 | -0.1394 | 0.6270 | 0.0418 | -2.7882 | -1.9523 | -2.8221 | -2.8059 | 0.9673 | -0.8558 | 0.9869 |
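
For readers interpreting the `Nll Loss`, `Log Odds Ratio`, and `Log Odds Chosen` columns, the generic ORPO objective of Hong et al. (2024) is reproduced below as a reading aid. The weighting λ used for this run is not recorded in this card, so this is the textbook form rather than the exact configured loss; the signs in the table are consistent with `Log Odds Chosen` tracking the inner log odds ratio and `Log Odds Ratio` its log-sigmoid.

```latex
% Generic ORPO objective (Hong et al., 2024); the weight \lambda for this run
% is not stated in the card. "Nll Loss" corresponds to the supervised NLL term.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \mathcal{L}_{\mathrm{NLL}}(y_w \mid x)
      \;+\; \lambda \left(-\log \sigma\!\left(
          \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}
        \right)\right)
    \right],
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}.
```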

### Framework versions

  • Transformers 4.41.0.dev0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1