---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- alignment-handbook
- trl
- orpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-sft-full-orpo
  results: []
---

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/b45ab3qe)
# zephyr-7b-sft-full-orpo

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3771
- Rewards/chosen: -0.1391
- Rewards/rejected: -0.1930
- Rewards/accuracies: 0.6528
- Rewards/margins: 0.0539
- Logps/rejected: -3.8602
- Logps/chosen: -2.7813
- Logits/rejected: -2.8670
- Logits/chosen: -2.8498
- Nll Loss: 1.3532
- Log Odds Ratio: -1.0480
- Log Odds Chosen: 1.2201
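
For context, these metrics correspond to the two terms of the ORPO objective (Hong et al., 2024): `Nll Loss` is the supervised negative log-likelihood term and `Log Odds Ratio` is the log-sigmoid odds-ratio term, combined with a weighting λ that this card does not state:

$$
\mathcal{L}_{\text{ORPO}} \;=\; \mathcal{L}_{\text{NLL}} \;-\; \lambda\,\log \sigma\!\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right),
\qquad
\operatorname{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$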

## Model description

This model is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) fine-tuned with ORPO (Odds Ratio Preference Optimization) using [TRL](https://github.com/huggingface/trl) and the [alignment-handbook](https://github.com/huggingface/alignment-handbook) recipes. ORPO folds the preference signal into a single-stage objective, combining a standard NLL term with an odds-ratio penalty on rejected completions, so no separate reward model or frozen reference model is needed.

## Intended uses & limitations

The model is intended for chat-style instruction following. No dedicated safety alignment is documented for this checkpoint, so outputs may be inaccurate or unsafe and should be reviewed before downstream use.
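
A minimal inference sketch with `transformers`; the repository id below is assumed from the card title and the linked W&B account, and may need adjusting:

```python
# Minimal inference sketch; the repo id is an assumption, not confirmed by the card.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="statking/zephyr-7b-sft-full-orpo",  # assumed repository id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# If the tokenizer ships a chat template, pass a list of messages instead of raw text.
output = pipe("Explain ORPO in one paragraph.", max_new_tokens=128)
print(output[0]["generated_text"])
```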

## Training and evaluation data

Training and evaluation both use [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), which binarizes the GPT-4-scored UltraFeedback completions into chosen/rejected preference pairs; the results table below reports metrics on its held-out preference split. A sketch for inspecting the data follows.
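
A quick sketch, assuming the standard preference splits of this dataset:

```python
# Sketch: inspect the preference pairs used for ORPO training.
# Split and column names are assumptions based on the public dataset card.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
print(ds.column_names)   # expect "prompt", "chosen", "rejected" among them
print(ds[0]["prompt"])   # one UltraFeedback prompt
```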

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
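
A hedged reproduction sketch with TRL's `ORPOTrainer`, using the values listed above; the precision, the ORPO weight `beta`, and the chat-template preprocessing are assumptions not stated on this card:

```python
# Minimal single-process sketch of this run with TRL's ORPOTrainer.
# The actual run used 4 GPUs (see num_devices above); launch with
# `accelerate launch` to match the effective batch size of 64.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The alignment-handbook recipe flattens the chat messages into strings
# with a chat template before training; that preprocessing is elided here.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = ORPOConfig(
    output_dir="zephyr-7b-sft-full-orpo",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # x 4 devices x 2 accumulation = 64
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=3,
    seed=42,
    beta=0.1,   # assumption: TRL's default ORPO weight; not stated on the card
    bf16=True,  # assumption: precision is not stated on the card
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=ds["train_prefs"],
    eval_dataset=ds["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```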

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 0.5668        | 0.1049 | 100  | 0.5843          | -0.0456        | -0.0529          | 0.6151             | 0.0073          | -1.0580        | -0.9113      | -3.3148         | -3.3082       | 0.5516   | -0.6530        | 0.2184          |
| 0.5676        | 0.2098 | 200  | 0.5726          | -0.0441        | -0.0532          | 0.625              | 0.0092          | -1.0644        | -0.8811      | -3.0026         | -2.9992       | 0.5359   | -0.6474        | 0.2850          |
| 0.5819        | 0.3146 | 300  | 0.5552          | -0.0439        | -0.0531          | 0.6290             | 0.0092          | -1.0620        | -0.8770      | -3.1424         | -3.1391       | 0.5202   | -0.6464        | 0.2830          |
| 0.5738        | 0.4195 | 400  | 0.5411          | -0.0422        | -0.0517          | 0.6290             | 0.0096          | -1.0346        | -0.8434      | -3.1026         | -3.1020       | 0.5047   | -0.6522        | 0.2961          |
| 0.5478        | 0.5244 | 500  | 0.5319          | -0.0421        | -0.0525          | 0.6290             | 0.0105          | -1.0509        | -0.8415      | -3.0260         | -3.0286       | 0.4970   | -0.6382        | 0.3327          |
| 0.5146        | 0.6293 | 600  | 0.5240          | -0.0408        | -0.0508          | 0.6230             | 0.0100          | -1.0165        | -0.8165      | -3.1325         | -3.1275       | 0.4883   | -0.6418        | 0.3121          |
| 0.5298        | 0.7341 | 700  | 0.5188          | -0.0413        | -0.0541          | 0.6429             | 0.0128          | -1.0827        | -0.8267      | -3.0761         | -3.0755       | 0.4842   | -0.6219        | 0.3869          |
| 0.5181        | 0.8390 | 800  | 0.5141          | -0.0410        | -0.0524          | 0.6329             | 0.0114          | -1.0475        | -0.8198      | -3.1382         | -3.1394       | 0.4803   | -0.6322        | 0.3506          |
| 0.5239        | 0.9439 | 900  | 0.5086          | -0.0402        | -0.0506          | 0.6310             | 0.0104          | -1.0129        | -0.8045      | -3.1191         | -3.1171       | 0.4748   | -0.6328        | 0.3268          |
| 0.2888        | 1.0488 | 1000 | 0.5400          | -0.0436        | -0.0556          | 0.6429             | 0.0120          | -1.1128        | -0.8724      | -3.0171         | -3.0190       | 0.5058   | -0.6318        | 0.3794          |
| 0.29          | 1.1536 | 1100 | 0.5385          | -0.0437        | -0.0574          | 0.6468             | 0.0138          | -1.1487        | -0.8736      | -3.0027         | -3.0029       | 0.5042   | -0.6256        | 0.4247          |
| 0.2826        | 1.2585 | 1200 | 0.5428          | -0.0443        | -0.0581          | 0.6429             | 0.0139          | -1.1626        | -0.8854      | -2.9620         | -2.9583       | 0.5084   | -0.6254        | 0.4215          |
| 0.2796        | 1.3634 | 1300 | 0.5393          | -0.0441        | -0.0589          | 0.6468             | 0.0147          | -1.1771        | -0.8825      | -2.9256         | -2.9285       | 0.5060   | -0.6208        | 0.4508          |
| 0.2784        | 1.4683 | 1400 | 0.5365          | -0.0444        | -0.0589          | 0.6528             | 0.0145          | -1.1784        | -0.8885      | -2.9583         | -2.9594       | 0.5037   | -0.6236        | 0.4410          |
| 0.2873        | 1.5732 | 1500 | 0.5330          | -0.0436        | -0.0579          | 0.6448             | 0.0143          | -1.1584        | -0.8718      | -2.9664         | -2.9657       | 0.5004   | -0.6226        | 0.4364          |
| 0.276         | 1.6780 | 1600 | 0.5367          | -0.0442        | -0.0594          | 0.6409             | 0.0152          | -1.1879        | -0.8833      | -2.9358         | -2.9324       | 0.5041   | -0.6160        | 0.4570          |
| 0.2715        | 1.7829 | 1700 | 0.5349          | -0.0436        | -0.0580          | 0.6448             | 0.0145          | -1.1603        | -0.8710      | -3.0209         | -3.0194       | 0.5024   | -0.6272        | 0.4425          |
| 0.2717        | 1.8878 | 1800 | 0.5341          | -0.0450        | -0.0616          | 0.6548             | 0.0166          | -1.2325        | -0.8997      | -2.9579         | -2.9563       | 0.5023   | -0.6184        | 0.4824          |
| 0.2857        | 1.9927 | 1900 | 0.5408          | -0.0454        | -0.0620          | 0.6548             | 0.0166          | -1.2409        | -0.9088      | -3.0279         | -3.0350       | 0.5091   | -0.6193        | 0.4892          |
| 0.1137        | 2.0975 | 2000 | 0.6877          | -0.0620        | -0.0838          | 0.6706             | 0.0218          | -1.6761        | -1.2408      | -2.8815         | -2.8704       | 0.6539   | -0.6273        | 0.5767          |
| 0.1192        | 2.2024 | 2100 | 0.7577          | -0.0706        | -0.0981          | 0.6726             | 0.0275          | -1.9620        | -1.4122      | -2.8433         | -2.8372       | 0.7199   | -0.6210        | 0.6958          |
| 0.1178        | 2.3073 | 2200 | 1.1762          | -0.1205        | -0.1717          | 0.6528             | 0.0512          | -3.4342        | -2.4108      | -2.9107         | -2.8878       | 1.1197   | -0.7778        | 1.1628          |
| 0.1184        | 2.4122 | 2300 | 1.8520          | -0.1935        | -0.2541          | 0.6369             | 0.0606          | -5.0812        | -3.8696      | -2.9226         | -2.9102       | 1.7542   | -1.0562        | 1.3233          |
| 0.1172        | 2.5170 | 2400 | 1.0193          | -0.1001        | -0.1434          | 0.6409             | 0.0432          | -2.8671        | -2.0024      | -2.8710         | -2.8561       | 0.9736   | -0.8145        | 1.0075          |
| 0.1109        | 2.6219 | 2500 | 1.2050          | -0.1209        | -0.1677          | 0.6329             | 0.0468          | -3.3547        | -2.4183      | -2.8571         | -2.8457       | 1.1724   | -0.9768        | 1.0766          |
| 0.1238        | 2.7268 | 2600 | 2.6922          | -0.3036        | -0.3822          | 0.5873             | 0.0786          | -7.6444        | -6.0725      | -2.9967         | -2.9805       | 2.6498   | -1.6934        | 1.6674          |
| 0.1192        | 2.8317 | 2700 | 1.2391          | -0.1189        | -0.1634          | 0.625              | 0.0445          | -3.2671        | -2.3779      | -2.8836         | -2.8662       | 1.1910   | -0.9507        | 1.0201          |
| 0.1191        | 2.9365 | 2800 | 1.0214          | -0.0976        | -0.1394          | 0.6270             | 0.0418          | -2.7882        | -1.9523      | -2.8221         | -2.8059       | 0.9673   | -0.8558        | 0.9869          |
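
Validation loss improves through the first epoch (reaching 0.5086 at step 900), drifts up slightly during the second, and becomes unstable in the third (spiking to 2.6922 at step 2600), consistent with the relatively high final loss of 1.3771 reported above; an earlier checkpoint may therefore be preferable for downstream use.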


### Framework versions

- Transformers 4.41.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1