---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e5rate_05beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e5rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1802
- Rewards/chosen: -1.9138
- Rewards/rejected: -16.8689
- Rewards/accuracies: 0.7400
- Rewards/margins: 14.9551
- Logps/rejected: -109.3650
- Logps/chosen: -86.6253
- Logits/rejected: -0.7926
- Logits/chosen: -0.7113
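
For context (background on the metrics, not stated in the original card): the reward columns are the implicit DPO rewards that TRL logs, computed from the policy and reference-model log-probabilities as

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so `Rewards/margins` is the mean chosen-minus-rejected reward over the evaluation set, and `Rewards/accuracies` is the fraction of pairs where the chosen completion receives the higher reward. The `05beta` in the model name suggests $\beta = 0.5$.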

## Model description

More information needed

## Intended uses & limitations

More information needed
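
Pending details from the author, here is a minimal inference sketch. It assumes the repo id matches the model-index name above and that the model exposes the standard `transformers` causal-LM interface; the prompt is a hypothetical placeholder (the `IE` prefix suggests information extraction, but the card does not confirm this).

```python
# Minimal inference sketch (assumed repo id; hypothetical prompt).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e5rate_05beta_cSFTDPO"  # assumed to match this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Extract the entities from the following note: ..."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```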

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
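
These settings map directly onto TRL's `DPOConfig`, as in the sketch below. This is a hedged reconstruction, not the author's script: the preference dataset is unknown (placeholder rows are used), `beta=0.5` is inferred from `05beta` in the model name, and the exact TRL version is not listed in this card.

```python
# Hedged reconstruction of the training setup from the hyperparameters
# above. The dataset is a placeholder; beta=0.5 is inferred from the
# "05beta" in the model name, not stated in the card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data: DPOTrainer expects "prompt",
# "chosen", and "rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = DPOConfig(
    output_dir="IE_L3_1000steps_1e5rate_05beta_cSFTDPO",
    beta=0.5,                       # inferred, see note above
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

# ref_model is omitted: DPOTrainer then uses a frozen copy of the
# policy as the reference model.
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL releases name this processing_class
)
trainer.train()
```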

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1906        | 0.4   | 50   | 0.1802          | -1.6520        | -15.8076         | 0.7400             | 14.1556         | -107.2424      | -86.1018     | -0.7917         | -0.7117       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.8267        | -16.5557         | 0.7400             | 14.7290         | -108.7386      | -86.4511     | -0.7906         | -0.7103       |
| 0.1386        | 1.2   | 150  | 0.1802          | -1.8547        | -16.5627         | 0.7400             | 14.7080         | -108.7527      | -86.5072     | -0.7921         | -0.7119       |
| 0.1733        | 1.6   | 200  | 0.1802          | -1.8689        | -16.5821         | 0.7400             | 14.7132         | -108.7914      | -86.5355     | -0.7914         | -0.7112       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.8605        | -16.6156         | 0.7400             | 14.7552         | -108.8585      | -86.5187     | -0.7914         | -0.7110       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.8594        | -16.6192         | 0.7400             | 14.7598         | -108.8657      | -86.5166     | -0.7911         | -0.7110       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.8731        | -16.6287         | 0.7400             | 14.7556         | -108.8846      | -86.5440     | -0.7901         | -0.7097       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.8656        | -16.7018         | 0.7400             | 14.8363         | -109.0309      | -86.5289     | -0.7915         | -0.7108       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.8643        | -16.6935         | 0.7400             | 14.8292         | -109.0142      | -86.5264     | -0.7910         | -0.7101       |
| 0.2079        | 4.0   | 500  | 0.1802          | -1.8487        | -16.6943         | 0.7400             | 14.8456         | -109.0159      | -86.4952     | -0.7915         | -0.7105       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.8609        | -16.7207         | 0.7400             | 14.8598         | -109.0686      | -86.5195     | -0.7923         | -0.7110       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.8764        | -16.7597         | 0.7400             | 14.8833         | -109.1467      | -86.5507     | -0.7921         | -0.7111       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.8747        | -16.8014         | 0.7400             | 14.9267         | -109.2300      | -86.5471     | -0.7919         | -0.7103       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.8684        | -16.7797         | 0.7400             | 14.9113         | -109.1867      | -86.5346     | -0.7925         | -0.7117       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.8981        | -16.8462         | 0.7400             | 14.9481         | -109.3197      | -86.5939     | -0.7929         | -0.7119       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.8918        | -16.8690         | 0.7400             | 14.9772         | -109.3652      | -86.5813     | -0.7929         | -0.7119       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.8689        | -16.8074         | 0.7400             | 14.9386         | -109.2421      | -86.5355     | -0.7932         | -0.7122       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.8717        | -16.8482         | 0.7400             | 14.9765         | -109.3236      | -86.5412     | -0.7924         | -0.7110       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.9143        | -16.8686         | 0.7400             | 14.9543         | -109.3644      | -86.6264     | -0.7926         | -0.7113       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.9138        | -16.8689         | 0.7400             | 14.9551         | -109.3650      | -86.6253     | -0.7926         | -0.7113       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1