---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_cpo_entropy_0_1
    results: []
---

# qwen_cpo_entropy_0_1

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set (the objective behind the reward columns is sketched after the list):

- Loss: 0.7405
- Sft Loss: 1.6848
- Rewards/chosen: -1.7146
- Rewards/rejected: -2.3727
- Rewards/accuracies: 0.6773
- Rewards/margins: 0.6581
- Logps/rejected: -2.3727
- Logps/chosen: -1.7146
- Logits/rejected: 0.3000
- Logits/chosen: 0.1875
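
Note that Rewards/chosen equals Logps/chosen (and likewise for rejected), which is consistent with a reference-free, length-normalized reward: the implicit reward is the average per-token log-likelihood of each response. For reference, and assuming the standard SimPO formulation (Meng et al., 2024), the training objective is

$$
\mathcal{L}_{\mathrm{SimPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

where $y_w$ and $y_l$ are the chosen and rejected responses; the values of $\beta$ and the target margin $\gamma$ used for this run are not recorded in this card.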

## Model description

More information needed

## Intended uses & limitations

More information needed
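
Pending a fuller description, here is a minimal sketch of loading the model for inference with `transformers`; the repository id is assumed from the model name in this card, and the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, derived from the model name in this card.
model_id = "yakazimir/qwen_cpo_entropy_0_1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the model is a small preference-tuned Qwen1.5 variant.
prompt = "Explain preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```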

## Training and evaluation data

More information needed
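
As a starting point, the preference dataset named in this card can be inspected with `datasets`; split names and column names are whatever the dataset repository defines, so this sketch simply loads everything and prints the structure:

```python
from datasets import load_dataset

# Load the preference dataset referenced by this card and inspect
# its splits and columns (typically prompt/chosen/rejected fields).
ds = load_dataset("yakazimir/ultrafeedback_binarized")
print(ds)
```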

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
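
As a hedged reconstruction, these values map onto a `transformers.TrainingArguments` roughly as follows; `output_dir` is an assumption, and any SimPO-specific knobs (such as the reward scale and target margin) are not recorded in this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen_cpo_entropy_0_1",  # assumed, matching the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,     # 2 x 16 = total train batch size of 32
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```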

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.8248        | 0.2141 | 400  | 0.8255          | 1.3905   | -1.3850        | -1.5360          | 0.5645             | 0.1510          | -1.5360        | -1.3850      | 0.3069          | 0.2210        |
| 0.7884        | 0.4282 | 800  | 0.7811          | 1.4857   | -1.5199        | -1.8625          | 0.6113             | 0.3426          | -1.8625        | -1.5199      | 0.4914          | 0.3895        |
| 0.8073        | 0.6422 | 1200 | 0.7653          | 1.5452   | -1.5531        | -1.9756          | 0.6298             | 0.4226          | -1.9756        | -1.5531      | 0.5229          | 0.4111        |
| 0.7417        | 0.8563 | 1600 | 0.7599          | 1.5652   | -1.5632        | -1.9862          | 0.6484             | 0.4230          | -1.9862        | -1.5632      | 0.5072          | 0.3924        |
| 0.8212        | 1.0704 | 2000 | 0.7518          | 1.5561   | -1.5506        | -2.0302          | 0.6543             | 0.4796          | -2.0302        | -1.5506      | 0.4351          | 0.3208        |
| 0.7326        | 1.2845 | 2400 | 0.7455          | 1.6027   | -1.6077        | -2.1582          | 0.6632             | 0.5505          | -2.1582        | -1.6077      | 0.4993          | 0.3799        |
| 0.7742        | 1.4986 | 2800 | 0.7444          | 1.6196   | -1.6148        | -2.1590          | 0.6632             | 0.5442          | -2.1590        | -1.6148      | 0.4611          | 0.3432        |
| 0.7597        | 1.7127 | 3200 | 0.7438          | 1.6039   | -1.6049        | -2.1441          | 0.6632             | 0.5392          | -2.1441        | -1.6049      | 0.3926          | 0.2796        |
| 0.7128        | 1.9267 | 3600 | 0.7399          | 1.6368   | -1.6446        | -2.2337          | 0.6780             | 0.5891          | -2.2337        | -1.6446      | 0.3607          | 0.2486        |
| 0.6636        | 2.1408 | 4000 | 0.7399          | 1.6738   | -1.6828        | -2.3162          | 0.6780             | 0.6334          | -2.3162        | -1.6828      | 0.3064          | 0.1955        |
| 0.6929        | 2.3549 | 4400 | 0.7421          | 1.7043   | -1.7385        | -2.4029          | 0.6795             | 0.6644          | -2.4029        | -1.7385      | 0.3030          | 0.1902        |
| 0.6939        | 2.5690 | 4800 | 0.7411          | 1.6769   | -1.7078        | -2.3536          | 0.6758             | 0.6458          | -2.3536        | -1.7078      | 0.1986          | 0.0944        |
| 0.6831        | 2.7831 | 5200 | 0.7409          | 1.6830   | -1.7130        | -2.3694          | 0.6766             | 0.6564          | -2.3694        | -1.7130      | 0.3256          | 0.2110        |
| 0.6951        | 2.9972 | 5600 | 0.7405          | 1.6848   | -1.7146        | -2.3727          | 0.6773             | 0.6581          | -2.3727        | -1.7146      | 0.3000          | 0.1875        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1