---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
results: []
---
# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset).
It achieves the following results on the evaluation set:
- Loss: 0.6707
- Rewards/chosen: -0.2860
- Rewards/rejected: -0.3548
- Rewards/accuracies: 0.5983
- Rewards/margins: 0.0687
- Logps/rejected: -367.6676
- Logps/chosen: -351.0971
- Logits/rejected: -2.5801
- Logits/chosen: -2.5726
## Model description
zephyr-7b-dpo-qlora is a QLoRA (4-bit LoRA) adapter for [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with Direct Preference Optimization (DPO) using TRL and PEFT following the alignment-handbook recipe, on pairwise preference data from snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
## Intended uses & limitations
More information needed
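No explicit intended-use statement is given above. As a rough guide, the adapter can be loaded on top of the base model for text generation; the snippet below is a minimal sketch, assuming the adapter is published at `shenxq/zephyr-7b-dpo-qlora` (adjust the repo id to wherever the adapter actually lives).
```python
# Minimal inference sketch: load the QLoRA adapter on top of the base model.
# The adapter repo id "shenxq/zephyr-7b-dpo-qlora" is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "shenxq/zephyr-7b-dpo-qlora"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO LoRA weights

prompt = "Explain in two sentences what DPO fine-tuning does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```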
## Training and evaluation data
Training and evaluation used the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset), a pairwise (chosen/rejected) preference dataset in which, as the name suggests, candidate responses are ranked with PairRM. See the dataset card for details on how the pairs were constructed.
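A quick way to inspect the preference data is shown below; split names and column layout are whatever the hub reports, so nothing is hard-coded beyond the dataset id.
```python
# Inspect the preference dataset: splits, row counts, columns, and one raw example.
from datasets import load_dataset

ds = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")
print(ds)                        # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split.features)      # column names and types
print(first_split[0])            # one raw preference example
```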
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch in code follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
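For readers who want to approximate this setup, the sketch below wires the listed hyperparameters into TRL's `DPOTrainer` with a 4-bit QLoRA configuration. It is a hedged reconstruction, not the original training script: the LoRA rank/targets, DPO `beta`, sequence lengths, dataset split, and the mapping to `prompt`/`chosen`/`rejected` columns are assumptions that are not documented in this card (API shown as in trl 0.7.x / peft 0.7.1).
```python
# Hedged reconstruction of the training setup from the hyperparameters above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"

bnb_config = BitsAndBytesConfig(                 # 4-bit QLoRA quantization
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

peft_config = LoraConfig(                        # assumed LoRA settings
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

raw = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")
train_ds = next(iter(raw.values()))              # split name is an assumption
# NOTE: preprocessing into "prompt"/"chosen"/"rejected" string columns is omitted.

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,               # total train batch size 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,         # with a PEFT model, the frozen base acts as the reference
    args=args,
    beta=0.1,               # assumed; the actual beta is not stated in this card
    train_dataset=train_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,        # assumed
    max_prompt_length=512,  # assumed
)
trainer.train()
```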
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932 | 0.08 | 100 | 0.6930 | -0.0030 | -0.0033 | 0.5220 | 0.0003 | -332.5208 | -322.7949 | -2.4978 | -2.4908 |
| 0.6921 | 0.16 | 200 | 0.6927 | -0.0232 | -0.0243 | 0.5183 | 0.0011 | -334.6197 | -324.8167 | -2.4970 | -2.4900 |
| 0.6913 | 0.24 | 300 | 0.6919 | -0.0414 | -0.0441 | 0.5340 | 0.0027 | -336.6059 | -326.6393 | -2.4967 | -2.4895 |
| 0.6893 | 0.32 | 400 | 0.6891 | -0.0791 | -0.0883 | 0.5547 | 0.0093 | -341.0244 | -330.4017 | -2.5023 | -2.4953 |
| 0.6724 | 0.4 | 500 | 0.6844 | -0.2018 | -0.2253 | 0.5530 | 0.0235 | -354.7256 | -342.6785 | -2.5100 | -2.5029 |
| 0.6849 | 0.48 | 600 | 0.6805 | -0.3366 | -0.3770 | 0.5597 | 0.0404 | -369.8958 | -356.1591 | -2.5412 | -2.5347 |
| 0.6503 | 0.56 | 700 | 0.6774 | -0.4376 | -0.4919 | 0.5630 | 0.0543 | -381.3843 | -366.2523 | -2.5492 | -2.5431 |
| 0.6841 | 0.64 | 800 | 0.6735 | -0.3183 | -0.3788 | 0.5913 | 0.0605 | -370.0676 | -354.3206 | -2.5662 | -2.5592 |
| 0.6773 | 0.72 | 900 | 0.6724 | -0.3986 | -0.4678 | 0.5887 | 0.0692 | -378.9693 | -362.3546 | -2.5774 | -2.5706 |
| 0.657 | 0.8 | 1000 | 0.6711 | -0.2774 | -0.3440 | 0.5997 | 0.0666 | -366.5909 | -350.2372 | -2.5784 | -2.5708 |
| 0.6577 | 0.88 | 1100 | 0.6706 | -0.2934 | -0.3628 | 0.5993 | 0.0693 | -368.4680 | -351.8376 | -2.5805 | -2.5729 |
| 0.6444 | 0.96 | 1200 | 0.6708 | -0.2860 | -0.3547 | 0.5993 | 0.0687 | -367.6592 | -351.0949 | -2.5801 | -2.5725 |
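For context on the reward columns above, TRL's `DPOTrainer` logs implicit DPO rewards, i.e. `beta` times the policy-vs-reference log-probability difference for each completion. The sketch below shows how those columns and the loss relate; `beta` is not stated in this card, so 0.1 is only a placeholder.
```python
# Illustration of the logged DPO metrics: implicit rewards, margins, accuracy, loss.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    accuracy = (margins > 0).float().mean()                                 # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # sigmoid DPO loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```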
### Framework versions
- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0