---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: MPT_1000_STEPS_1e7_rate_03_beta_DPO
    results: []
---

MPT_1000_STEPS_1e7_rate_03_beta_DPO

This model is a DPO fine-tuned version of mosaicml/mpt-7b-instruct on an unknown dataset. It achieves the following results on the evaluation set (a short sketch of how these metrics are derived follows the list):

  • Loss: 0.6919
  • Rewards/chosen: -0.0230
  • Rewards/rejected: -0.0291
  • Rewards/accuracies: 0.5275
  • Rewards/margins: 0.0061
  • Logps/rejected: -21.6156
  • Logps/chosen: -20.8382
  • Logits/rejected: 14.2213
  • Logits/chosen: 14.2239
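
For context on these metrics: in a DPO setup such as TRL's, the "rewards" are beta-scaled log-probability ratios between the trained policy and a frozen reference model, and the margin is their difference. Below is a minimal sketch of those relationships, not the exact trainer code; the symbols are illustrative, and beta=0.3 is an assumption inferred from the "03_beta" in the model name:

```python
import torch
import torch.nn.functional as F

beta = 0.3  # assumption: inferred from "03_beta" in the model name

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Rewards are beta-scaled log-ratios of policy vs. reference probabilities.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected        # Rewards/margins
    accuracy = (margins > 0).float().mean()            # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()               # sigmoid DPO loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy

# Toy tensors, roughly matching the magnitudes reported above:
loss, rc, rr, acc = dpo_metrics(torch.tensor([-20.84]), torch.tensor([-21.62]),
                                torch.tensor([-20.76]), torch.tensor([-21.52]))
```

With these toy numbers the chosen reward comes out near -0.024 and the rejected reward near -0.030, consistent in scale with the evaluation figures above.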

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
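
Below is a minimal sketch of how these settings could map onto a TRL DPOTrainer run. It is an assumption-laden reconstruction, not the author's script: the preference dataset is undocumented ("unknown dataset" above), so a toy stand-in is used, and beta=0.3 is inferred from the model name.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Toy stand-in for the undocumented prompt/chosen/rejected preference data.
pairs = {
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["The capital of France is Berlin."],
}
train_dataset = Dataset.from_dict(pairs)
eval_dataset = Dataset.from_dict(pairs)

args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence below
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # TRL creates a frozen copy of the policy when None
    args=args,
    beta=0.3,            # assumption: inferred from "03_beta" in the model name
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```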

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6958 | 0.05 | 50 | 0.6969 | -0.0103 | -0.0064 | 0.4791 | -0.0040 | -21.5702 | -20.8128 | 14.2683 | 14.2709 |
| 0.6948 | 0.1 | 100 | 0.6966 | -0.0023 | 0.0014 | 0.5077 | -0.0037 | -21.5546 | -20.7968 | 14.2571 | 14.2597 |
| 0.6971 | 0.15 | 150 | 0.7007 | -0.0051 | 0.0067 | 0.4681 | -0.0117 | -21.5441 | -20.8024 | 14.2475 | 14.2501 |
| 0.6891 | 0.2 | 200 | 0.6943 | 0.0187 | 0.0174 | 0.4923 | 0.0013 | -21.5227 | -20.7548 | 14.2452 | 14.2478 |
| 0.6906 | 0.24 | 250 | 0.6922 | 0.0036 | -0.0018 | 0.4747 | 0.0054 | -21.5609 | -20.7850 | 14.2395 | 14.2421 |
| 0.6865 | 0.29 | 300 | 0.6942 | 0.0038 | 0.0023 | 0.4857 | 0.0015 | -21.5528 | -20.7845 | 14.2393 | 14.2419 |
| 0.7058 | 0.34 | 350 | 0.6939 | -0.0025 | -0.0045 | 0.5055 | 0.0020 | -21.5664 | -20.7971 | 14.2533 | 14.2559 |
| 0.6817 | 0.39 | 400 | 0.6918 | -0.0255 | -0.0318 | 0.5143 | 0.0063 | -21.6210 | -20.8431 | 14.2343 | 14.2369 |
| 0.6726 | 0.44 | 450 | 0.6902 | -0.0203 | -0.0301 | 0.5582 | 0.0099 | -21.6177 | -20.8327 | 14.2287 | 14.2313 |
| 0.6927 | 0.49 | 500 | 0.6903 | -0.0159 | -0.0254 | 0.5209 | 0.0096 | -21.6083 | -20.8239 | 14.2329 | 14.2355 |
| 0.6728 | 0.54 | 550 | 0.6905 | -0.0252 | -0.0342 | 0.5297 | 0.0089 | -21.6258 | -20.8426 | 14.2305 | 14.2331 |
| 0.6733 | 0.59 | 600 | 0.6877 | -0.0158 | -0.0305 | 0.5341 | 0.0147 | -21.6184 | -20.8237 | 14.2330 | 14.2356 |
| 0.6937 | 0.64 | 650 | 0.6916 | -0.0222 | -0.0293 | 0.5341 | 0.0071 | -21.6161 | -20.8365 | 14.2242 | 14.2268 |
| 0.6771 | 0.68 | 700 | 0.6921 | -0.0234 | -0.0294 | 0.5231 | 0.0060 | -21.6163 | -20.8391 | 14.2289 | 14.2315 |
| 0.6874 | 0.73 | 750 | 0.6916 | -0.0219 | -0.0286 | 0.5121 | 0.0067 | -21.6147 | -20.8361 | 14.2292 | 14.2317 |
| 0.6772 | 0.78 | 800 | 0.6888 | -0.0187 | -0.0313 | 0.5473 | 0.0127 | -21.6201 | -20.8295 | 14.2308 | 14.2334 |
| 0.7033 | 0.83 | 850 | 0.6886 | -0.0163 | -0.0294 | 0.5297 | 0.0131 | -21.6163 | -20.8248 | 14.2220 | 14.2245 |
| 0.6772 | 0.88 | 900 | 0.6894 | -0.0217 | -0.0330 | 0.5297 | 0.0113 | -21.6235 | -20.8357 | 14.2227 | 14.2253 |
| 0.696 | 0.93 | 950 | 0.6918 | -0.0229 | -0.0293 | 0.5275 | 0.0064 | -21.6160 | -20.8380 | 14.2213 | 14.2239 |
| 0.6881 | 0.98 | 1000 | 0.6919 | -0.0230 | -0.0291 | 0.5275 | 0.0061 | -21.6156 | -20.8382 | 14.2213 | 14.2239 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
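
As a usage sketch, the checkpoint should load like any other causal LM. The repo id below is assumed from the model name above, and MPT requires trust_remote_code=True:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/MPT_1000_STEPS_1e7_rate_03_beta_DPO"  # assumed repo id

# MPT ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Explain DPO in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```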