---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
model-index:
- name: mistral-dpo
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistral-dpo

This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ), trained with DPO on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6944
- Rewards/chosen: 0.2782
- Rewards/rejected: 0.0543
- Rewards/accuracies: 0.5385
- Rewards/margins: 0.2239
- Logps/rejected: -187.8588
- Logps/chosen: -166.3796
- Logits/rejected: -2.4215
- Logits/chosen: -2.4790

## Model description

More information needed

## Intended uses & limitations

More information needed
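
This repository contains a PEFT adapter rather than full model weights, so it would typically be loaded on top of the GPTQ base model. The snippet below is an untested sketch: it assumes `auto-gptq`/`optimum` and `accelerate` are installed, and the adapter repo id is a placeholder, since the published path is not stated in this card.

```python
# Untested sketch: load the GPTQ base model and attach this PEFT adapter.
# "your-username/mistral-dpo" is a placeholder; the real adapter repo id is not given in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "your-username/mistral-dpo"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Loading a GPTQ checkpoint requires auto-gptq/optimum; device_map="auto" requires accelerate.
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Wrap the quantized base model with the DPO-trained adapter weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that the OpenHermes-2 base model was trained with the ChatML prompt format, so formatting prompts accordingly (for example via `tokenizer.apply_chat_template`, if a chat template is available) will likely work better than a raw string.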

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `DPOTrainer` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
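
Expressed as code, these settings would roughly correspond to the following TRL `DPOTrainer` setup. This is a minimal, untested sketch assuming a TRL release contemporary with Transformers 4.37; the dataset name, LoRA config, DPO `beta`, and sequence-length limits are illustrative assumptions, not values recorded in this card.

```python
# Hedged reconstruction of the training setup from the hyperparameters above.
# Dataset, LoRA config, beta, and max lengths are assumptions, not taken from this card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your/preference-dataset", split="train")

training_args = TrainingArguments(
    output_dir="mistral-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,            # matches the 10-step evaluation cadence in the results table below
    seed=42,
    fp16=True,                # "Native AMP" mixed precision
)

peft_config = LoraConfig(     # assumed LoRA settings; the actual adapter config is not shown here
    task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05
)

trainer = DPOTrainer(
    model,
    ref_model=None,           # with a PEFT adapter, TRL can use the adapter-disabled model as reference
    args=training_args,
    beta=0.1,                 # assumption: the DPO beta is not recorded in this card
    train_dataset=dataset,
    eval_dataset=dataset,     # placeholder; a held-out split would normally go here
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```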

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7027        | 0.0   | 10   | 0.6989          | 0.0816         | 0.0881           | 0.5577             | -0.0065         | -187.5204      | -168.3459    | -2.4271         | -2.4774       |
| 0.6833        | 0.0   | 20   | 0.7017          | -0.0375        | -0.0327          | 0.5288             | -0.0048         | -188.7280      | -169.5362    | -2.4376         | -2.4828       |
| 0.867         | 0.0   | 30   | 0.7193          | -0.3147        | -0.3086          | 0.5385             | -0.0061         | -191.4871      | -172.3083    | -2.4532         | -2.4942       |
| 0.8962        | 0.0   | 40   | 0.7068          | -0.2076        | -0.2208          | 0.5577             | 0.0132          | -190.6093      | -171.2371    | -2.4597         | -2.5054       |
| 0.7467        | 0.0   | 50   | 0.7008          | 0.1918         | 0.1648           | 0.5577             | 0.0270          | -186.7531      | -167.2434    | -2.4630         | -2.5116       |
| 0.7335        | 0.0   | 60   | 0.6972          | 0.3949         | 0.3373           | 0.5385             | 0.0576          | -185.0280      | -165.2124    | -2.4666         | -2.5130       |
| 0.587         | 0.01  | 70   | 0.7116          | 0.6763         | 0.6193           | 0.4904             | 0.0570          | -182.2083      | -162.3980    | -2.4675         | -2.5126       |
| 0.675         | 0.01  | 80   | 0.7330          | 0.8676         | 0.8385           | 0.5096             | 0.0291          | -180.0161      | -160.4852    | -2.4726         | -2.5171       |
| 0.6117        | 0.01  | 90   | 0.7454          | 0.9576         | 0.9300           | 0.5192             | 0.0276          | -179.1016      | -159.5854    | -2.4757         | -2.5229       |
| 0.5697        | 0.01  | 100  | 0.7715          | 0.9933         | 0.9991           | 0.5                | -0.0059         | -178.4101      | -159.2286    | -2.4736         | -2.5233       |
| 1.1319        | 0.01  | 110  | 0.7652          | 0.9034         | 0.8862           | 0.4904             | 0.0172          | -179.5398      | -160.1275    | -2.4696         | -2.5215       |
| 0.5912        | 0.01  | 120  | 0.7476          | 0.7562         | 0.7007           | 0.5096             | 0.0555          | -181.3943      | -161.5994    | -2.4661         | -2.5186       |
| 0.702         | 0.01  | 130  | 0.7400          | 0.7400         | 0.6590           | 0.5192             | 0.0810          | -181.8113      | -161.7616    | -2.4642         | -2.5211       |
| 0.5566        | 0.01  | 140  | 0.7332          | 0.6338         | 0.5293           | 0.5288             | 0.1044          | -183.1082      | -162.8238    | -2.4650         | -2.5222       |
| 0.7823        | 0.01  | 150  | 0.7327          | 0.5429         | 0.4408           | 0.5385             | 0.1022          | -183.9939      | -163.7323    | -2.4645         | -2.5191       |
| 0.7549        | 0.01  | 160  | 0.7282          | 0.3954         | 0.2907           | 0.5481             | 0.1047          | -185.4949      | -165.2079    | -2.4612         | -2.5138       |
| 0.6506        | 0.01  | 170  | 0.7262          | 0.3748         | 0.2716           | 0.5192             | 0.1031          | -185.6850      | -165.4137    | -2.4579         | -2.5102       |
| 0.559         | 0.01  | 180  | 0.7320          | 0.4578         | 0.3604           | 0.5096             | 0.0974          | -184.7973      | -164.5831    | -2.4589         | -2.5109       |
| 0.9496        | 0.02  | 190  | 0.7150          | 0.4227         | 0.2889           | 0.5192             | 0.1339          | -185.5128      | -164.9340    | -2.4480         | -2.5007       |
| 0.7996        | 0.02  | 200  | 0.7034          | 0.4051         | 0.2378           | 0.5288             | 0.1673          | -186.0234      | -165.1101    | -2.4391         | -2.4926       |
| 0.5733        | 0.02  | 210  | 0.6977          | 0.3946         | 0.2110           | 0.5288             | 0.1836          | -186.2916      | -165.2155    | -2.4327         | -2.4875       |
| 0.5796        | 0.02  | 220  | 0.6981          | 0.3933         | 0.1983           | 0.5288             | 0.1949          | -186.4181      | -165.2286    | -2.4260         | -2.4824       |
| 0.6435        | 0.02  | 230  | 0.6976          | 0.3726         | 0.1714           | 0.5288             | 0.2012          | -186.6871      | -165.4354    | -2.4237         | -2.4807       |
| 0.5993        | 0.02  | 240  | 0.6958          | 0.3088         | 0.0929           | 0.5385             | 0.2159          | -187.4724      | -166.0730    | -2.4222         | -2.4799       |
| 0.9077        | 0.02  | 250  | 0.6944          | 0.2782         | 0.0543           | 0.5385             | 0.2239          | -187.8588      | -166.3796    | -2.4215         | -2.4790       |


### Framework versions

- PEFT 0.8.2
- Transformers 4.37.0
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.1
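
The TRL version used for DPO training is not recorded above. As a rough way to approximate this environment, the listed packages could be pinned in a `requirements.txt` along these lines (the cu117 PyTorch build generally needs to come from the matching PyTorch index, and `auto-gptq`/`optimum` are assumed requirements for the GPTQ base model, not packages listed in this card):

```text
peft==0.8.2
transformers==4.37.0
torch==2.0.1          # card lists 2.0.1+cu117; install the CUDA build from the PyTorch index
datasets==2.15.0
tokenizers==0.15.1
trl                   # version not recorded in this card
auto-gptq             # assumed requirement for the GPTQ base model
optimum               # assumed requirement for GPTQ loading in transformers
```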