---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e8rate_05beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e8rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6901
- Rewards/chosen: -0.0305
- Rewards/rejected: -0.0517
- Rewards/accuracies: 0.4200
- Rewards/margins: 0.0213
- Logps/rejected: -75.7307
- Logps/chosen: -82.8587
- Logits/rejected: -0.7970
- Logits/chosen: -0.7401
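
For orientation, the reward columns above come from the DPO objective. With the β = 0.5 implied by the "05beta" in the model name, the standard DPO loss is:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right], \qquad \beta = 0.5
$$

Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios of the policy against the reference model, and Rewards/margins is their difference. A loss near log 2 ≈ 0.693 with a margin close to zero means the policy has barely moved from the SFT reference, consistent with the very low learning rate listed under the training hyperparameters.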

## Model description

More information needed
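
Pending a fuller description: the checkpoint is a Llama-3-derived causal LM and should load with the standard transformers API. A minimal sketch follows; the prompt and generation settings are illustrative placeholders, not a documented usage format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e8rate_05beta_cSFTDPO"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs `accelerate`; drop it to load on CPU instead.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt: the intended task and prompt format are not documented yet.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```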

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
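
As a hedged sketch, here is how these values map onto trl's DPOConfig/DPOTrainer. The preference dataset is unknown, so a toy stand-in is used, and exact argument names vary across trl releases (newer versions take `processing_class` instead of `tokenizer`):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"  # SFT base named in this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in for the undocumented preference dataset.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e8rate_05beta_cSFTDPO",
    beta=0.5,                       # the "05beta" in the model name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,                        # Adam betas/epsilon above are the defaults
)

trainer = DPOTrainer(
    model=model,                    # ref_model omitted: trl copies the policy as reference
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,            # `processing_class=` in newer trl
)
trainer.train()
```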

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6994        | 0.4   | 50   | 0.7013          | -0.0193        | -0.0168          | 0.375              | -0.0025         | -75.6609       | -82.8363     | -0.7968         | -0.7397       |
| 0.7002        | 0.8   | 100  | 0.7038          | -0.0158        | -0.0084          | 0.3450             | -0.0074         | -75.6441       | -82.8293     | -0.7971         | -0.7401       |
| 0.6907        | 1.2   | 150  | 0.7016          | -0.0214        | -0.0182          | 0.3800             | -0.0033         | -75.6636       | -82.8406     | -0.7968         | -0.7396       |
| 0.7125        | 1.6   | 200  | 0.6880          | -0.0323        | -0.0559          | 0.4100             | 0.0236          | -75.7390       | -82.8623     | -0.7969         | -0.7398       |
| 0.6784        | 2.0   | 250  | 0.7048          | -0.0506        | -0.0419          | 0.3800             | -0.0087         | -75.7110       | -82.8989     | -0.7967         | -0.7399       |
| 0.7093        | 2.4   | 300  | 0.6873          | -0.0310        | -0.0578          | 0.4400             | 0.0268          | -75.7429       | -82.8598     | -0.7973         | -0.7402       |
| 0.6769        | 2.8   | 350  | 0.6770          | -0.0179        | -0.0654          | 0.4200             | 0.0475          | -75.7580       | -82.8335     | -0.7972         | -0.7402       |
| 0.6876        | 3.2   | 400  | 0.6995          | -0.0297        | -0.0340          | 0.3500             | 0.0044          | -75.6953       | -82.8571     | -0.7966         | -0.7395       |
| 0.6809        | 3.6   | 450  | 0.6703          | -0.0395        | -0.1022          | 0.4600             | 0.0627          | -75.8316       | -82.8767     | -0.7972         | -0.7402       |
| 0.6812        | 4.0   | 500  | 0.6853          | -0.0127        | -0.0416          | 0.3900             | 0.0289          | -75.7105       | -82.8232     | -0.7972         | -0.7404       |
| 0.7342        | 4.4   | 550  | 0.6907          | -0.0234        | -0.0410          | 0.4150             | 0.0176          | -75.7092       | -82.8446     | -0.7966         | -0.7396       |
| 0.6772        | 4.8   | 600  | 0.6824          | -0.0324        | -0.0676          | 0.4450             | 0.0352          | -75.7624       | -82.8625     | -0.7968         | -0.7399       |
| 0.6918        | 5.2   | 650  | 0.6813          | -0.0468        | -0.0861          | 0.3950             | 0.0393          | -75.7994       | -82.8913     | -0.7973         | -0.7402       |
| 0.6778        | 5.6   | 700  | 0.6899          | -0.0390        | -0.0590          | 0.4250             | 0.0200          | -75.7452       | -82.8757     | -0.7970         | -0.7398       |
| 0.6814        | 6.0   | 750  | 0.6861          | -0.0310        | -0.0623          | 0.4000             | 0.0313          | -75.7518       | -82.8598     | -0.7969         | -0.7399       |
| 0.7158        | 6.4   | 800  | 0.6828          | -0.0206        | -0.0575          | 0.4250             | 0.0370          | -75.7423       | -82.8389     | -0.7970         | -0.7400       |
| 0.6827        | 6.8   | 850  | 0.6909          | -0.0294        | -0.0489          | 0.4200             | 0.0195          | -75.7250       | -82.8565     | -0.7970         | -0.7401       |
| 0.7306        | 7.2   | 900  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.6964        | 7.6   | 950  | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |
| 0.687         | 8.0   | 1000 | 0.6901          | -0.0305        | -0.0517          | 0.4200             | 0.0213          | -75.7307       | -82.8587     | -0.7970         | -0.7401       |

### Framework versions

- Transformers 4.44.2
- PyTorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1