zephyr-7b-dpo-lora-pairrm

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6764
  • Rewards/chosen: -0.9885
  • Rewards/rejected: -1.0650
  • Rewards/accuracies: 0.5657
  • Rewards/margins: 0.0765
  • Logps/rejected: -320.4450
  • Logps/chosen: -307.4615
  • Logits/rejected: -2.7535
  • Logits/chosen: -2.7599
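
The reward figures above can be cross-checked against each other. Assuming the metrics follow the usual DPO convention, where the margin is the chosen reward minus the rejected reward, a minimal sketch using only the numbers listed above (the variable names are illustrative):

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.9885
rewards_rejected = -1.0650
reported_margin = 0.0765

# Assumption: the margin is defined as chosen reward minus rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
assert abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}")
```

A positive margin means the model, on average, assigns a higher implicit reward to the chosen response than to the rejected one.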

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
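
Two of the relationships implied by these hyperparameters can be sketched in a few lines. The block below assumes the common convention that the total batch size is the per-device batch size times the accumulation steps times the number of processes, and it uses a generic linear-warmup plus cosine-decay formula rather than the exact scheduler code used for this run. The total step count of 1250 is an assumption, inferred from the last logged row (step 1200 at epoch 0.96); the card does not state it.

```python
import math

# Values copied from the hyperparameter list above.
per_device_batch = 4   # train_batch_size
grad_accum_steps = 4   # gradient_accumulation_steps
total_batch = 16       # total_train_batch_size
peak_lr = 5e-06        # learning_rate
warmup_ratio = 0.1     # lr_scheduler_warmup_ratio

# Assumed relationship: total = per-device batch * accumulation steps * processes.
implied_processes = total_batch // (per_device_batch * grad_accum_steps)


def cosine_lr(step, total_steps):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count; not stated in the card.
TOTAL_STEPS = 1250

print("implied processes:", implied_processes)
for step in (0, 125, 625, 1250):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

Under this formula the listed batch sizes are consistent with a single process, although the card lists distributed_type as multi-GPU and does not state the device count.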

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6916 | 0.08 | 100 | 0.6925 | -0.0162 | -0.0177 | 0.5280 | 0.0015 | -215.7187 | -210.2296 | -2.5058 | -2.5086 |
| 0.6855 | 0.16 | 200 | 0.6880 | -0.0651 | -0.0772 | 0.5613 | 0.0121 | -221.6710 | -215.1240 | -2.5152 | -2.5178 |
| 0.6825 | 0.24 | 300 | 0.6854 | -0.1874 | -0.2081 | 0.5473 | 0.0207 | -234.7546 | -227.3457 | -2.5175 | -2.5192 |
| 0.6676 | 0.32 | 400 | 0.6827 | -0.2909 | -0.3222 | 0.5477 | 0.0313 | -246.1682 | -237.7042 | -2.5347 | -2.5368 |
| 0.6458 | 0.4 | 500 | 0.6805 | -0.3693 | -0.4104 | 0.5567 | 0.0410 | -254.9852 | -245.5435 | -2.6328 | -2.6364 |
| 0.6592 | 0.48 | 600 | 0.6789 | -0.6010 | -0.6528 | 0.5560 | 0.0518 | -279.2278 | -268.7087 | -2.6805 | -2.6845 |
| 0.6107 | 0.56 | 700 | 0.6785 | -0.8159 | -0.8786 | 0.5550 | 0.0627 | -301.8047 | -290.1964 | -2.6914 | -2.6969 |
| 0.6475 | 0.64 | 800 | 0.6770 | -0.8845 | -0.9544 | 0.5610 | 0.0699 | -309.3867 | -297.0627 | -2.7237 | -2.7295 |
| 0.6639 | 0.72 | 900 | 0.6766 | -0.9705 | -1.0450 | 0.5667 | 0.0746 | -318.4507 | -305.6558 | -2.7464 | -2.7525 |
| 0.6305 | 0.8 | 1000 | 0.6764 | -0.9844 | -1.0603 | 0.5680 | 0.0759 | -319.9799 | -307.0536 | -2.7543 | -2.7606 |
| 0.6754 | 0.88 | 1100 | 0.6763 | -0.9882 | -1.0648 | 0.5687 | 0.0766 | -320.4283 | -307.4264 | -2.7538 | -2.7602 |
| 0.6577 | 0.96 | 1200 | 0.6764 | -0.9885 | -1.0649 | 0.5663 | 0.0764 | -320.4412 | -307.4615 | -2.7538 | -2.7602 |
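
To see which logged checkpoint looks best on the evaluation set, the rows above can be scanned for the lowest validation loss and the highest reward accuracy. This is only an illustrative sketch over three columns of the table; the tuple layout and variable names are not part of the original card.

```python
# (step, validation_loss, rewards_accuracy) copied from the training results above.
rows = [
    (100, 0.6925, 0.5280),
    (200, 0.6880, 0.5613),
    (300, 0.6854, 0.5473),
    (400, 0.6827, 0.5477),
    (500, 0.6805, 0.5567),
    (600, 0.6789, 0.5560),
    (700, 0.6785, 0.5550),
    (800, 0.6770, 0.5610),
    (900, 0.6766, 0.5667),
    (1000, 0.6764, 0.5680),
    (1100, 0.6763, 0.5687),
    (1200, 0.6764, 0.5663),
]

# Pick the rows with the lowest validation loss and the highest accuracy.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_accuracy = max(rows, key=lambda row: row[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

Both criteria point to step 1100, though the differences between the last three logged steps are within a few ten-thousandths.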

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0
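
Since the framework list includes PEFT, the weights are presumably distributed as a LoRA adapter to be attached to the base model. A minimal loading sketch under that assumption follows. The adapter repository id `shenxq/zephyr-7b-dpo-lora-pairrm` is taken from the hosting page rather than from the card text, so treat it as an assumption, and the helper name `load_model_with_adapter` is hypothetical.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
# Assumed adapter repository id; verify it before use.
ADAPTER_ID = "shenxq/zephyr-7b-dpo-lora-pairrm"


def load_model_with_adapter(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter weights on top of it."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter stored in the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads the full 7B base model, so it is best run on a machine with enough memory and disk space.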