---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sppo_hard_iter0_2024-09-15-01.39
  results: []
---


[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://ml.byteintl.net/experiment/tracking/detail?Id=project_20240915_20321b8f&selectedTrial=run_20240915_fdcd3e5b)
# ds_chat_sppo_hard_iter0_2024-09-15-01.39

This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, the self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized and the self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets.
It achieves the following results on the evaluation set:
- Loss: 4624.1011
- Rewards/chosen: 0.0051
- Rewards/rejected: -0.0370
- Rewards/accuracies: 0.5789
- Rewards/margins: 0.0421
- Logps/rejected: -263.3607
- Logps/chosen: -252.4096
- Logits/rejected: 1.4404
- Logits/chosen: 1.3959
- Debug/policy Chosen Logits: 1.3959
- Debug/policy Rejected Logits: 1.4404
- Debug/policy Chosen Logps: -252.4096
- Debug/policy Rejected Logps: -263.3607
- Debug/reference Chosen Logps: -252.9185
- Debug/reference Rejected Logps: -259.6586
- Debug/sppo Chosen Reward In Loss: 0.5089
- Debug/sppo Rej Reward In Loss: -3.7021
- Debug/sppo Chosen Loss: 2526.5620
- Debug/sppo Reject Loss: 2309.3242
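
Below is a minimal, hedged usage sketch with `transformers`. The repository id is a placeholder (substitute this model's actual Hub id), and the dtype and generation settings are illustrative assumptions rather than settings stated on this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute this model's actual Hub repository id.
model_id = "REPLACE_WITH_THIS_REPO_ID"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is typical for a 7B chat model
    device_map="auto",
)

# The deepseek-llm-7b-chat base ships a chat template, so
# apply_chat_template should produce a correctly formatted prompt.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```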

## Model description

This is a preference-optimized checkpoint of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat), trained with trl and the alignment-handbook using a DPO-style (SPPO hard, iteration 0) objective on self-generated binarized preference data. See the training details below.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training and evaluation used three self-generated, binarized preference datasets (iteration 0):

- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
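
A hedged loading sketch with `datasets` (split names are not stated on this card, so the whole `DatasetDict` is loaded for inspection):

```python
from datasets import load_dataset

# Hedged sketch: inspect one of the binarized preference datasets.
# Split names are not documented here, so load the full DatasetDict.
ds = load_dataset("self-generate/ds_chat_original_cn_mining_oj_iter0-binarized")
print(ds)  # shows available splits and columns (e.g. chosen/rejected pairs)
```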

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
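
For reference, a hedged sketch of how these values map onto a `transformers` `TrainingArguments` object; the `output_dir` is an illustrative assumption, and the actual run was launched through the alignment-handbook/trl tooling:

```python
from transformers import TrainingArguments

# Hedged sketch: the listed hyperparameters expressed as a transformers
# TrainingArguments object. output_dir is an illustrative assumption.
training_args = TrainingArguments(
    output_dir="ds_chat_sppo_hard_iter0",  # assumption: placeholder path
    learning_rate=1e-07,
    per_device_train_batch_size=8,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    seed=42,
    num_train_epochs=8.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    # In transformers, a positive warmup_steps takes precedence over warmup_ratio.
    warmup_steps=100,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
# With 8 GPUs: 8 per-device x 8 devices = 64 total train batch size and
# 4 per-device x 8 devices = 32 total eval batch size, matching the card.
```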

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps | Debug/sppo Chosen Reward In Loss | Debug/sppo Rej Reward In Loss | Debug/sppo Chosen Loss | Debug/sppo Reject Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------------:|:----------------------------:|:-------------------------:|:---------------------------:|:----------------------------:|:------------------------------:|:--------------------------------:|:-----------------------------:|:----------------------:|:----------------------:|
| 4975.3273     | 0.3623 | 100  | 4981.6489       | -0.0033        | -0.0038          | 0.4605             | 0.0004          | -260.0373      | -253.2532    | 1.7010          | 1.6372        | 1.6372                     | 1.7010                       | -253.2532                 | -260.0373                   | -252.9185                    | -259.6586                      | -0.3347                          | -0.3786                       | 2534.3679              | 2463.3860              |
| 4930.2141     | 0.7246 | 200  | 4924.0649       | -0.0013        | -0.0060          | 0.5789             | 0.0047          | -260.2596      | -253.0476    | 1.6680          | 1.6070        | 1.6070                     | 1.6680                       | -253.0476                 | -260.2596                   | -252.9185                    | -259.6586                      | -0.1291                          | -0.6009                       | 2514.6309              | 2444.3210              |
| 4841.2859     | 1.0870 | 300  | 4866.0864       | -0.0095        | -0.0185          | 0.5395             | 0.0089          | -261.5047      | -253.8716    | 1.6500          | 1.5926        | 1.5926                     | 1.6500                       | -253.8716                 | -261.5047                   | -252.9185                    | -259.6586                      | -0.9531                          | -1.8460                       | 2603.5461              | 2331.7520              |
| 4822.266      | 1.4493 | 400  | 4827.9761       | -0.0173        | -0.0295          | 0.5395             | 0.0122          | -262.6080      | -254.6497    | 1.6162          | 1.5603        | 1.5603                     | 1.6162                       | -254.6497                 | -262.6080                   | -252.9185                    | -259.6586                      | -1.7313                          | -2.9494                       | 2692.5408              | 2243.4092              |
| 4715.0469     | 1.8116 | 500  | 4771.2051       | -0.0007        | -0.0176          | 0.4868             | 0.0169          | -261.4219      | -252.9887    | 1.5898          | 1.5341        | 1.5341                     | 1.5898                       | -252.9887                 | -261.4219                   | -252.9185                    | -259.6586                      | -0.0703                          | -1.7633                       | 2529.2981              | 2376.3818              |
| 4665.2648     | 2.1739 | 600  | 4749.7798       | 0.0008         | -0.0212          | 0.5395             | 0.0220          | -261.7789      | -252.8382    | 1.5688          | 1.5147        | 1.5147                     | 1.5688                       | -252.8382                 | -261.7789                   | -252.9185                    | -259.6586                      | 0.0803                           | -2.1202                       | 2515.5928              | 2344.7095              |
| 4625.0359     | 2.5362 | 700  | 5035.4683       | 0.0876         | 0.0697           | 0.6447             | 0.0179          | -252.6841      | -244.1548    | 1.5685          | 1.5098        | 1.5098                     | 1.5685                       | -244.1548                 | -252.6841                   | -252.9185                    | -259.6586                      | 8.7637                           | 6.9746                        | 1714.2816              | 3259.7661              |
| 4637.3375     | 2.8986 | 800  | 4705.7749       | -0.0031        | -0.0319          | 0.5921             | 0.0287          | -262.8461      | -253.2311    | 1.5294          | 1.4773        | 1.4773                     | 1.5294                       | -253.2311                 | -262.8461                   | -252.9185                    | -259.6586                      | -0.3127                          | -3.1874                       | 2569.7046              | 2272.2061              |
| 4550.082      | 3.2609 | 900  | 4687.2900       | -0.0001        | -0.0318          | 0.5921             | 0.0317          | -262.8345      | -252.9287    | 1.5160          | 1.4652        | 1.4652                     | 1.5160                       | -252.9287                 | -262.8345                   | -252.9185                    | -259.6586                      | -0.0102                          | -3.1759                       | 2544.3586              | 2288.0042              |
| 4612.343      | 3.6232 | 1000 | 4670.3667       | 0.0005         | -0.0323          | 0.5658             | 0.0328          | -262.8906      | -252.8681    | 1.5061          | 1.4569        | 1.4569                     | 1.5061                       | -252.8681                 | -262.8906                   | -252.9185                    | -259.6586                      | 0.0504                           | -3.2320                       | 2546.7378              | 2296.4641              |
| 4579.3098     | 3.9855 | 1100 | 4676.5903       | -0.0058        | -0.0391          | 0.5263             | 0.0333          | -263.5656      | -253.4963    | 1.5062          | 1.4565        | 1.4565                     | 1.5062                       | -253.4963                 | -263.5656                   | -252.9185                    | -259.6586                      | -0.5778                          | -3.9070                       | 2616.4526              | 2253.1421              |
| 4461.193      | 4.3478 | 1200 | 4657.2646       | 0.0038         | -0.0339          | 0.6053             | 0.0377          | -263.0466      | -252.5387    | 1.4919          | 1.4449        | 1.4449                     | 1.4919                       | -252.5387                 | -263.0466                   | -252.9185                    | -259.6586                      | 0.3798                           | -3.3879                       | 2517.6655              | 2292.2590              |
| 4688.9563     | 4.7101 | 1300 | 4654.3955       | -0.0002        | -0.0373          | 0.5658             | 0.0371          | -263.3885      | -252.9360    | 1.4725          | 1.4244        | 1.4244                     | 1.4725                       | -252.9360                 | -263.3885                   | -252.9185                    | -259.6586                      | -0.0175                          | -3.7298                       | 2567.2290              | 2285.4812              |
| 4572.3969     | 5.0725 | 1400 | 4650.5352       | -0.0014        | -0.0398          | 0.5789             | 0.0384          | -263.6363      | -253.0607    | 1.4663          | 1.4206        | 1.4206                     | 1.4663                       | -253.0607                 | -263.6363                   | -252.9185                    | -259.6586                      | -0.1422                          | -3.9776                       | 2580.2542              | 2263.7637              |
| 4497.8313     | 5.4348 | 1500 | 4637.4077       | 0.0039         | -0.0371          | 0.5658             | 0.0410          | -263.3676      | -252.5313    | 1.4566          | 1.4118        | 1.4118                     | 1.4566                       | -252.5313                 | -263.3676                   | -252.9185                    | -259.6586                      | 0.3872                           | -3.7090                       | 2528.2339              | 2293.6980              |
| 4573.9879     | 5.7971 | 1600 | 4628.5752       | 0.0069         | -0.0333          | 0.5921             | 0.0402          | -262.9847      | -252.2267    | 1.4558          | 1.4099        | 1.4099                     | 1.4558                       | -252.2267                 | -262.9847                   | -252.9185                    | -259.6586                      | 0.6917                           | -3.3261                       | 2501.1956              | 2325.0657              |
| 4493.7113     | 6.1594 | 1700 | 4615.8252       | 0.0106         | -0.0325          | 0.5921             | 0.0431          | -262.9095      | -251.8597    | 1.4488          | 1.4028        | 1.4028                     | 1.4488                       | -251.8597                 | -262.9095                   | -252.9185                    | -259.6586                      | 1.0587                           | -3.2509                       | 2467.5171              | 2344.7961              |
| 4579.916      | 6.5217 | 1800 | 4618.2861       | 0.0059         | -0.0377          | 0.5789             | 0.0436          | -263.4273      | -252.3270    | 1.4455          | 1.4013        | 1.4013                     | 1.4455                       | -252.3270                 | -263.4273                   | -252.9185                    | -259.6586                      | 0.5915                           | -3.7687                       | 2516.5059              | 2301.5999              |
| 4682.2398     | 6.8841 | 1900 | 4613.9302       | 0.0060         | -0.0385          | 0.6184             | 0.0445          | -263.5052      | -252.3165    | 1.4429          | 1.3991        | 1.3991                     | 1.4429                       | -252.3165                 | -263.5052                   | -252.9185                    | -259.6586                      | 0.6019                           | -3.8466                       | 2513.9785              | 2293.4380              |
| 4497.943      | 7.2464 | 2000 | 4617.7402       | 0.0049         | -0.0368          | 0.6053             | 0.0417          | -263.3337      | -252.4285    | 1.4409          | 1.3966        | 1.3966                     | 1.4409                       | -252.4285                 | -263.3337                   | -252.9185                    | -259.6586                      | 0.4900                           | -3.6751                       | 2527.1399              | 2309.4104              |
| 4470.4805     | 7.6087 | 2100 | 4616.2676       | 0.0083         | -0.0372          | 0.6053             | 0.0455          | -263.3792      | -252.0898    | 1.4419          | 1.3983        | 1.3983                     | 1.4419                       | -252.0898                 | -263.3792                   | -252.9185                    | -259.6586                      | 0.8286                           | -3.7205                       | 2493.6099              | 2304.2241              |
| 4514.8016     | 7.9710 | 2200 | 4624.1011       | 0.0051         | -0.0370          | 0.5789             | 0.0421          | -263.3607      | -252.4096    | 1.4404          | 1.3959        | 1.3959                     | 1.4404                       | -252.4096                 | -263.3607                   | -252.9185                    | -259.6586                      | 0.5089                           | -3.7021                       | 2526.5620              | 2309.3242              |
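
A hedged reading of the bookkeeping above: `Debug/sppo Chosen Reward In Loss` equals the raw policy-vs-reference log-probability gap for the chosen completion, and the `Rewards/*` columns appear to be that gap scaled by an inferred β ≈ 0.01 (e.g. in the final row, 0.5089 × 0.01 ≈ 0.0051 and −3.7021 × 0.01 ≈ −0.0370):

$$
r_w = \log \pi_\theta(y_w \mid x) - \log \pi_{\text{ref}}(y_w \mid x), \qquad
r_l = \log \pi_\theta(y_l \mid x) - \log \pi_{\text{ref}}(y_l \mid x)
$$

$$
\text{Rewards/chosen} \approx \beta\, r_w, \qquad
\text{Rewards/rejected} \approx \beta\, r_l, \qquad
\beta \approx 0.01 \ (\text{inferred})
$$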


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1