---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---


# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset).
It achieves the following results on the evaluation set:
- Loss: 0.6707
- Rewards/chosen: -0.2860
- Rewards/rejected: -0.3548
- Rewards/accuracies: 0.5983
- Rewards/margins: 0.0687
- Logps/rejected: -367.6676
- Logps/chosen: -351.0971
- Logits/rejected: -2.5801
- Logits/chosen: -2.5726
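
For reference, the reward columns follow TRL's implicit DPO reward, measured against the frozen reference model:

$$r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$$

`Rewards/chosen` and `Rewards/rejected` are this quantity averaged over the chosen and rejected completions, `Rewards/margins` is their difference (here −0.2860 − (−0.3548) ≈ 0.0687), and `Rewards/accuracies` is the fraction of pairs where the chosen completion scores higher.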

## Model description

zephyr-7b-dpo-qlora is a QLoRA fine-tune of Mistral-7B-v0.1: the base model is quantized to 4-bit and a low-rank (LoRA) adapter is trained on top of it with Direct Preference Optimization (DPO) via TRL. This repository contains the PEFT adapter weights, not a full set of model weights.
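
A minimal loading sketch (the adapter repository id below is a placeholder; substitute wherever these weights are hosted):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model the adapter was trained from.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# "your-org/zephyr-7b-dpo-qlora" is a hypothetical repo id for this adapter.
model = PeftModel.from_pretrained(base, "your-org/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "Explain Direct Preference Optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```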

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset), a preference dataset of prompt/chosen/rejected pairs whose candidate responses were ranked with the PairRM reward model. The evaluation metrics above were computed on held-out examples from the same dataset.
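
A quick sketch for inspecting the data (split and column names are assumptions; check the dataset card):

```python
from datasets import load_dataset

# Download the preference dataset used for DPO training.
ds = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")
print(ds)             # shows the available splits and columns

row = ds["train"][0]  # assumption: a "train" split exists
print(row.keys())     # expect prompt / chosen / rejected style fields
```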

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-setup sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
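
The card does not record the DPO `beta`, the LoRA configuration, or the quantization settings, so the sketch below reconstructs a plausible TRL-0.7-era QLoRA DPO setup from the hyperparameters above; every value not listed in the card is an assumption.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

# QLoRA: 4-bit NF4 quantization of the base model (assumed, implied by "qlora").
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token

ds = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset")

# LoRA hyperparameters are not recorded in the card; these are placeholders.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Values below mirror the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,  # assumption: mixed-precision mode is not recorded in the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,             # with PEFT, the frozen base acts as the reference
    args=training_args,
    beta=0.1,                   # assumption: beta is not recorded in the card
    train_dataset=ds["train"],  # assumption: split name
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With a per-device train batch size of 4, gradient accumulation of 4, and a single process, this matches the total train batch size of 16 reported above.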

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932        | 0.08  | 100  | 0.6930          | -0.0030        | -0.0033          | 0.5220             | 0.0003          | -332.5208      | -322.7949    | -2.4978         | -2.4908       |
| 0.6921        | 0.16  | 200  | 0.6927          | -0.0232        | -0.0243          | 0.5183             | 0.0011          | -334.6197      | -324.8167    | -2.4970         | -2.4900       |
| 0.6913        | 0.24  | 300  | 0.6919          | -0.0414        | -0.0441          | 0.5340             | 0.0027          | -336.6059      | -326.6393    | -2.4967         | -2.4895       |
| 0.6893        | 0.32  | 400  | 0.6891          | -0.0791        | -0.0883          | 0.5547             | 0.0093          | -341.0244      | -330.4017    | -2.5023         | -2.4953       |
| 0.6724        | 0.4   | 500  | 0.6844          | -0.2018        | -0.2253          | 0.5530             | 0.0235          | -354.7256      | -342.6785    | -2.5100         | -2.5029       |
| 0.6849        | 0.48  | 600  | 0.6805          | -0.3366        | -0.3770          | 0.5597             | 0.0404          | -369.8958      | -356.1591    | -2.5412         | -2.5347       |
| 0.6503        | 0.56  | 700  | 0.6774          | -0.4376        | -0.4919          | 0.5630             | 0.0543          | -381.3843      | -366.2523    | -2.5492         | -2.5431       |
| 0.6841        | 0.64  | 800  | 0.6735          | -0.3183        | -0.3788          | 0.5913             | 0.0605          | -370.0676      | -354.3206    | -2.5662         | -2.5592       |
| 0.6773        | 0.72  | 900  | 0.6724          | -0.3986        | -0.4678          | 0.5887             | 0.0692          | -378.9693      | -362.3546    | -2.5774         | -2.5706       |
| 0.657         | 0.8   | 1000 | 0.6711          | -0.2774        | -0.3440          | 0.5997             | 0.0666          | -366.5909      | -350.2372    | -2.5784         | -2.5708       |
| 0.6577        | 0.88  | 1100 | 0.6706          | -0.2934        | -0.3628          | 0.5993             | 0.0693          | -368.4680      | -351.8376    | -2.5805         | -2.5729       |
| 0.6444        | 0.96  | 1200 | 0.6708          | -0.2860        | -0.3547          | 0.5993             | 0.0687          | -367.6592      | -351.0949    | -2.5801         | -2.5725       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0