---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6707
- Rewards/chosen: -0.2860
- Rewards/rejected: -0.3548
- Rewards/accuracies: 0.5983
- Rewards/margins: 0.0687
- Logps/rejected: -367.6676
- Logps/chosen: -351.0971
- Logits/rejected: -2.5801
- Logits/chosen: -2.5726
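
The reward numbers above are TRL's standard DPO diagnostics rather than task scores. As a rough sketch of how they are derived (the `beta` value and all log-probabilities below are illustrative assumptions, not values recorded on this card), the implicit rewards come from the gap between policy and reference log-probabilities:

```python
import torch

# Illustrative per-example log-probabilities of the chosen and rejected
# completions under the DPO policy (the trained adapter) and under the
# frozen reference model. The numbers are placeholders, not card values.
policy_chosen_logps = torch.tensor([-351.0, -340.0])
policy_rejected_logps = torch.tensor([-367.0, -355.0])
ref_chosen_logps = torch.tensor([-348.0, -339.0])
ref_rejected_logps = torch.tensor([-363.5, -354.0])

beta = 0.1  # assumption: TRL's default DPO beta; the value used here is not recorded

# Implicit DPO reward: beta * (policy log-prob - reference log-prob)
rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

# "Rewards/margins": average gap between chosen and rejected rewards
rewards_margins = (rewards_chosen - rewards_rejected).mean()

# "Rewards/accuracies": fraction of pairs where the chosen response
# gets a higher implicit reward than the rejected one
rewards_accuracies = (rewards_chosen > rewards_rejected).float().mean()

print(rewards_margins.item(), rewards_accuracies.item())
```

`Logps/*` are the policy's log-probabilities of the chosen/rejected completions and `Logits/*` the mean of its output logits on them, as logged by TRL's `DPOTrainer`.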

## Model description

More information needed

## Intended uses & limitations

More information needed
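
As a starting point while this section is filled in: the repository holds a LoRA adapter, so inference means loading the adapter on top of the base model with PEFT. A minimal sketch; the adapter ID below is a placeholder for this repository's ID, and the dtype/device settings are assumptions rather than values recorded on the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "path-or-id-of-this-adapter"  # placeholder: replace with this repo's ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the DPO-trained LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain what DPO fine-tuning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```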

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
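
For orientation, these settings map onto a TRL `DPOTrainer` run roughly as sketched below. This is not the exact training script: the LoRA rank/alpha/target modules, the DPO `beta`, the dataset split name, and the 4-bit/bf16 flags are assumptions the card does not record.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: quantize the frozen base model to 4-bit (assumption; requires bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(base_id, load_in_4bit=True)

# LoRA settings are assumptions; the card does not record rank/alpha/targets.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Split name assumed; preprocessing into prompt/chosen/rejected text fields is omitted.
dataset = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset", split="train")

# Mirrors the hyperparameters listed above; effective batch size = 4 * 4 = 16 per step.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,  # assumption: mixed precision is typical for QLoRA but not recorded here
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,        # assumption: TRL's default; the card does not record beta
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```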

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932        | 0.08  | 100  | 0.6930          | -0.0030        | -0.0033          | 0.5220             | 0.0003          | -332.5208      | -322.7949    | -2.4978         | -2.4908       |
| 0.6921        | 0.16  | 200  | 0.6927          | -0.0232        | -0.0243          | 0.5183             | 0.0011          | -334.6197      | -324.8167    | -2.4970         | -2.4900       |
| 0.6913        | 0.24  | 300  | 0.6919          | -0.0414        | -0.0441          | 0.5340             | 0.0027          | -336.6059      | -326.6393    | -2.4967         | -2.4895       |
| 0.6893        | 0.32  | 400  | 0.6891          | -0.0791        | -0.0883          | 0.5547             | 0.0093          | -341.0244      | -330.4017    | -2.5023         | -2.4953       |
| 0.6724        | 0.4   | 500  | 0.6844          | -0.2018        | -0.2253          | 0.5530             | 0.0235          | -354.7256      | -342.6785    | -2.5100         | -2.5029       |
| 0.6849        | 0.48  | 600  | 0.6805          | -0.3366        | -0.3770          | 0.5597             | 0.0404          | -369.8958      | -356.1591    | -2.5412         | -2.5347       |
| 0.6503        | 0.56  | 700  | 0.6774          | -0.4376        | -0.4919          | 0.5630             | 0.0543          | -381.3843      | -366.2523    | -2.5492         | -2.5431       |
| 0.6841        | 0.64  | 800  | 0.6735          | -0.3183        | -0.3788          | 0.5913             | 0.0605          | -370.0676      | -354.3206    | -2.5662         | -2.5592       |
| 0.6773        | 0.72  | 900  | 0.6724          | -0.3986        | -0.4678          | 0.5887             | 0.0692          | -378.9693      | -362.3546    | -2.5774         | -2.5706       |
| 0.657         | 0.8   | 1000 | 0.6711          | -0.2774        | -0.3440          | 0.5997             | 0.0666          | -366.5909      | -350.2372    | -2.5784         | -2.5708       |
| 0.6577        | 0.88  | 1100 | 0.6706          | -0.2934        | -0.3628          | 0.5993             | 0.0693          | -368.4680      | -351.8376    | -2.5805         | -2.5729       |
| 0.6444        | 0.96  | 1200 | 0.6708          | -0.2860        | -0.3547          | 0.5993             | 0.0687          | -367.6592      | -351.0949    | -2.5801         | -2.5725       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0
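
One way to reproduce this environment is to pin the packages above (the PyPI name for Pytorch is `torch`); a requirements-style pin list:

```text
peft==0.7.1
transformers==4.36.2
torch==2.1.2
datasets==2.14.6
tokenizers==0.15.0
```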