---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-dpo_annealing
  results: []
---

# gpt-imdb-dpo_annealing

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3482
- Rewards/chosen: -13.2925
- Rewards/rejected: -37.2767
- Rewards/accuracies: 0.9354
- Rewards/margins: 23.9842
- Logps/rejected: -302.0002
- Logps/chosen: -248.9281
- Logits/rejected: -38.9773
- Logits/chosen: -40.1868
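
For context, the metric names above follow TRL's DPO conventions (an assumption based on the run name), where the reported reward for a completion is the scaled log-probability ratio against the frozen reference model:

$$
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

`Rewards/margins` is then the mean of `r(x, y_chosen) - r(x, y_rejected)` over the evaluation set, and `Rewards/accuracies` is the fraction of pairs for which the chosen completion's reward exceeds the rejected one's.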

## Model description

Judging by the run name and the evaluation metrics reported above (`Rewards/chosen`, `Rewards/rejected`, `Logps/*`), this checkpoint was trained with Direct Preference Optimization (DPO) on top of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb), a GPT-2 model adapted to IMDB movie reviews. The `_annealing` suffix suggests a schedule that anneals the DPO `beta` coefficient over training, but the exact setup is not documented.

## Intended uses & limitations

More information needed
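
Pending a fuller description, a minimal generation sketch is given below; the repo id is an assumption inferred from the model name and should be replaced with the actual Hub path or a local checkpoint directory.

```python
# Minimal usage sketch. The repo id is an assumption inferred from the model
# name; substitute the real Hub path (e.g. "<user>/gpt-imdb-dpo_annealing")
# or a local checkpoint directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt-imdb-dpo_annealing"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short movie-review continuation.
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```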

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197
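
The hyperparameters above map onto a TRL `DPOTrainer` configuration roughly as sketched below, using the trainer signature from TRL versions contemporary with Transformers 4.35. This is a reconstruction under stated assumptions, not the author's script: the preference dataset is undocumented (a toy stand-in is shown), and the DPO `beta` value and the annealing schedule implied by the run name are unknown (TRL's default `beta=0.1` appears as a placeholder).

```python
# Sketch of a TRL DPO setup matching the hyperparameters above; assumptions
# are marked inline.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

# Assumption: the real preference dataset is undocumented; this is a toy
# stand-in with the prompt/chosen/rejected columns DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" wonderful from start to finish."],
    "rejected": [" a complete waste of time."],
})
eval_dataset = train_dataset

training_args = TrainingArguments(
    output_dir="gpt-imdb-dpo_annealing",
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    max_steps=7197,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,   # note: 0.99 per the card, not Adam's default 0.999
    adam_epsilon=1e-8,
    evaluation_strategy="steps",
    eval_steps=500,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,  # assumption: actual value / annealing schedule not documented
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```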

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2713        | 0.21  | 500  | 0.3576          | -0.9589        | -2.8806          | 0.8417             | 1.9217          | -300.2507      | -247.4370    | -34.9635        | -36.2514      |
| 0.2605        | 0.42  | 1000 | 0.2876          | -1.8668        | -5.2245          | 0.8708             | 3.3577          | -299.0920      | -247.9165    | -39.8673        | -41.1403      |
| 0.134         | 0.63  | 1500 | 0.2827          | -3.3220        | -8.2599          | 0.8833             | 4.9379          | -301.8662      | -250.6212    | -38.4289        | -39.6488      |
| 0.2246        | 0.83  | 2000 | 0.2412          | -3.0672        | -9.5366          | 0.9000             | 6.4694          | -297.1335      | -246.0230    | -36.9979        | -38.2478      |
| 0.0612        | 1.04  | 2500 | 0.2382          | -4.4276        | -12.4767         | 0.9062             | 8.0491          | -298.9408      | -247.7763    | -38.3549        | -39.5684      |
| 0.2336        | 1.25  | 3000 | 0.2628          | -5.5352        | -15.3372         | 0.9042             | 9.8020          | -299.9716      | -248.3611    | -39.0799        | -40.3999      |
| 0.1755        | 1.46  | 3500 | 0.2670          | -6.0750        | -18.0326         | 0.9229             | 11.9576         | -300.3778      | -247.6266    | -38.3635        | -39.7127      |
| 0.34          | 1.67  | 4000 | 0.2499          | -7.2657        | -20.1377         | 0.9208             | 12.8719         | -299.6307      | -248.2345    | -38.0993        | -39.2549      |
| 0.1822        | 1.88  | 4500 | 0.3000          | -7.9584        | -22.7421         | 0.9271             | 14.7838         | -299.8409      | -247.9176    | -38.7806        | -39.9153      |
| 0.153         | 2.08  | 5000 | 0.2972          | -9.4217        | -26.8046         | 0.9333             | 17.3829         | -302.0991      | -248.7675    | -38.2977        | -39.5006      |
| 0.0004        | 2.29  | 5500 | 0.2962          | -9.6704        | -28.5833         | 0.9354             | 18.9129         | -300.9727      | -247.8805    | -38.6801        | -39.9033      |
| 0.0584        | 2.5   | 6000 | 0.3113          | -11.3462       | -31.8850         | 0.9375             | 20.5388         | -301.8552      | -248.8479    | -38.5484        | -39.7563      |
| 0.0304        | 2.71  | 6500 | 0.3441          | -12.4687       | -34.7986         | 0.9354             | 22.3299         | -302.1741      | -249.0562    | -38.8388        | -40.0519      |
| 0.223         | 2.92  | 7000 | 0.3482          | -13.2925       | -37.2767         | 0.9354             | 23.9842         | -302.0002      | -248.9281    | -38.9773        | -40.1868      |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0