hugodk-sch committed on
Commit 53f7f4a
1 Parent(s): e81e7c6

Model save
README.md ADDED
@@ -0,0 +1,75 @@
---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: norallm/normistral-7b-warm
model-index:
- name: ap-normistral-7b-align-scan
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# ap-normistral-7b-align-scan

This model is a fine-tuned version of [norallm/normistral-7b-warm](https://huggingface.co/norallm/normistral-7b-warm) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7249
- Rewards/chosen: -0.1096
- Rewards/rejected: -0.2129
- Rewards/accuracies: 0.5282
- Rewards/margins: 0.1033
- Logps/rejected: -36.3214
- Logps/chosen: -32.6259
- Logits/rejected: 98.6240
- Logits/chosen: 98.6482

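As a quick consistency check on the metrics above: in DPO-style reporting, the reward margin is simply the chosen reward minus the rejected reward. A minimal sketch (the numbers are copied from the evaluation results):

```python
# Consistency check: rewards/margins should equal rewards/chosen - rewards/rejected.
rewards_chosen = -0.1096
rewards_rejected = -0.2129

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.1033, matching the reported Rewards/margins
```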
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

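For context, the derived totals in the list above can be reproduced from the base settings. A minimal sketch, assuming a single training process (the world size is not stated on the card); the sample count comes from all_results.json:

```python
import math

# Effective batch size = per-device batch size * gradient accumulation * processes.
train_batch_size = 4
gradient_accumulation_steps = 2
num_processes = 1  # assumption: single process

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_processes
print(total_train_batch_size)  # 8, as listed

# With 3079 training samples and 1 epoch, this yields the trainer's step count.
steps = math.ceil(3079 / total_train_batch_size)
print(steps)  # 385, matching max_steps in trainer_state.json
```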
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6845        | 0.26  | 100  | 0.7399          | 0.0002         | -0.0442          | 0.5245             | 0.0444          | -36.0401       | -32.4428     | 98.7048         | 98.7141       |
| 0.6115        | 0.52  | 200  | 0.7253          | -0.1275        | -0.2129          | 0.5303             | 0.0854          | -36.3214       | -32.6557     | 98.6043         | 98.6285       |
| 0.5545        | 0.78  | 300  | 0.7249          | -0.1096        | -0.2129          | 0.5282             | 0.1033          | -36.3214       | -32.6259     | 98.6240         | 98.6482       |


### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c4bcffb7adb9cb680f0e234f3922b9190a8ea247e05277addbd6dfef58038cd5
+oid sha256:8e1c4fb40cab2ed298164fbfba66ecca63a7920ee8c467f1203b62898385f2bc
 size 671150064
all_results.json ADDED
@@ -0,0 +1,8 @@
{
    "epoch": 1.0,
    "train_loss": 0.5886486524111265,
    "train_runtime": 2556.9439,
    "train_samples": 3079,
    "train_samples_per_second": 1.204,
    "train_steps_per_second": 0.151
}
train_results.json ADDED
@@ -0,0 +1,8 @@
{
    "epoch": 1.0,
    "train_loss": 0.5886486524111265,
    "train_runtime": 2556.9439,
    "train_samples": 3079,
    "train_samples_per_second": 1.204,
    "train_steps_per_second": 0.151
}
trainer_state.json ADDED
@@ -0,0 +1,663 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "eval_steps": 100,
  "global_step": 385,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.0,
      "grad_norm": 39.25,
      "learning_rate": 1.282051282051282e-07,
      "logits/chosen": 88.18099975585938,
      "logits/rejected": 88.25153350830078,
      "logps/chosen": -29.073104858398438,
      "logps/rejected": -26.25731658935547,
      "loss": 0.6931,
      "rewards/accuracies": 0.0,
      "rewards/chosen": 0.0,
      "rewards/margins": 0.0,
      "rewards/rejected": 0.0,
      "step": 1
    },
    {
      "epoch": 0.03,
      "grad_norm": 37.75,
      "learning_rate": 1.282051282051282e-06,
      "logits/chosen": 81.07136535644531,
      "logits/rejected": 80.77804565429688,
      "logps/chosen": -34.25458526611328,
      "logps/rejected": -33.03440475463867,
      "loss": 0.699,
      "rewards/accuracies": 0.4444444477558136,
      "rewards/chosen": -0.007714875973761082,
      "rewards/margins": 0.03788409009575844,
      "rewards/rejected": -0.045598965138196945,
      "step": 10
    },
    {
      "epoch": 0.05,
      "grad_norm": 26.25,
      "learning_rate": 2.564102564102564e-06,
      "logits/chosen": 80.65422058105469,
      "logits/rejected": 80.54401397705078,
      "logps/chosen": -33.63849639892578,
      "logps/rejected": -30.794116973876953,
      "loss": 0.708,
      "rewards/accuracies": 0.512499988079071,
      "rewards/chosen": 0.030845394358038902,
      "rewards/margins": 0.04082341492176056,
      "rewards/rejected": -0.009978031739592552,
      "step": 20
    },
    {
      "epoch": 0.08,
      "grad_norm": 38.25,
      "learning_rate": 3.846153846153847e-06,
      "logits/chosen": 82.5073013305664,
      "logits/rejected": 82.5381088256836,
      "logps/chosen": -33.88646697998047,
      "logps/rejected": -31.181421279907227,
      "loss": 0.7746,
      "rewards/accuracies": 0.44999998807907104,
      "rewards/chosen": 0.07581041753292084,
      "rewards/margins": -0.06963472068309784,
      "rewards/rejected": 0.14544512331485748,
      "step": 30
    },
    {
      "epoch": 0.1,
      "grad_norm": 31.625,
      "learning_rate": 4.999896948438434e-06,
      "logits/chosen": 81.06532287597656,
      "logits/rejected": 81.06108093261719,
      "logps/chosen": -32.81906509399414,
      "logps/rejected": -33.26140594482422,
      "loss": 0.6847,
      "rewards/accuracies": 0.574999988079071,
      "rewards/chosen": 0.21299926936626434,
      "rewards/margins": 0.14872975647449493,
      "rewards/rejected": 0.0642695277929306,
      "step": 40
    },
    {
      "epoch": 0.13,
      "grad_norm": 23.0,
      "learning_rate": 4.987541037542187e-06,
      "logits/chosen": 78.69737243652344,
      "logits/rejected": 78.7103500366211,
      "logps/chosen": -30.65850257873535,
      "logps/rejected": -30.81766128540039,
      "loss": 0.6962,
      "rewards/accuracies": 0.6000000238418579,
      "rewards/chosen": 0.3280490040779114,
      "rewards/margins": 0.17467446625232697,
      "rewards/rejected": 0.1533745527267456,
      "step": 50
    },
    {
      "epoch": 0.16,
      "grad_norm": 31.625,
      "learning_rate": 4.954691471941119e-06,
      "logits/chosen": 83.20633697509766,
      "logits/rejected": 83.25883483886719,
      "logps/chosen": -30.961681365966797,
      "logps/rejected": -29.538171768188477,
      "loss": 0.703,
      "rewards/accuracies": 0.550000011920929,
      "rewards/chosen": 0.12808682024478912,
      "rewards/margins": 0.09667714685201645,
      "rewards/rejected": 0.03140967711806297,
      "step": 60
    },
    {
      "epoch": 0.18,
      "grad_norm": 53.25,
      "learning_rate": 4.901618883413549e-06,
      "logits/chosen": 83.81951141357422,
      "logits/rejected": 83.84638977050781,
      "logps/chosen": -30.67291259765625,
      "logps/rejected": -33.11872482299805,
      "loss": 0.755,
      "rewards/accuracies": 0.4749999940395355,
      "rewards/chosen": -0.026334354653954506,
      "rewards/margins": 0.02227923832833767,
      "rewards/rejected": -0.04861358925700188,
      "step": 70
    },
    {
      "epoch": 0.21,
      "grad_norm": 31.75,
      "learning_rate": 4.828760511501322e-06,
      "logits/chosen": 81.4664306640625,
      "logits/rejected": 81.44920349121094,
      "logps/chosen": -31.316049575805664,
      "logps/rejected": -31.0085391998291,
      "loss": 0.6446,
      "rewards/accuracies": 0.550000011920929,
      "rewards/chosen": 0.11333731561899185,
      "rewards/margins": 0.2638704478740692,
      "rewards/rejected": -0.15053315460681915,
      "step": 80
    },
    {
      "epoch": 0.23,
      "grad_norm": 37.0,
      "learning_rate": 4.7367166013034295e-06,
      "logits/chosen": 78.19766998291016,
      "logits/rejected": 78.16535186767578,
      "logps/chosen": -32.48051071166992,
      "logps/rejected": -31.223648071289062,
      "loss": 0.6567,
      "rewards/accuracies": 0.6000000238418579,
      "rewards/chosen": 0.09460089355707169,
      "rewards/margins": 0.2579067349433899,
      "rewards/rejected": -0.1633058488368988,
      "step": 90
    },
    {
      "epoch": 0.26,
      "grad_norm": 31.5,
      "learning_rate": 4.626245458345211e-06,
      "logits/chosen": 83.43191528320312,
      "logits/rejected": 83.45047760009766,
      "logps/chosen": -34.02558135986328,
      "logps/rejected": -31.787883758544922,
      "loss": 0.6845,
      "rewards/accuracies": 0.574999988079071,
      "rewards/chosen": 0.16764816641807556,
      "rewards/margins": 0.1900513470172882,
      "rewards/rejected": -0.022403212264180183,
      "step": 100
    },
    {
      "epoch": 0.26,
      "eval_logits/chosen": 98.71414947509766,
      "eval_logits/rejected": 98.70475769042969,
      "eval_logps/chosen": -32.44282531738281,
      "eval_logps/rejected": -36.040138244628906,
      "eval_loss": 0.7398820519447327,
      "eval_rewards/accuracies": 0.5245016813278198,
      "eval_rewards/chosen": 0.00021068855130579323,
      "eval_rewards/margins": 0.04437926039099693,
      "eval_rewards/rejected": -0.04416857287287712,
      "eval_runtime": 104.2075,
      "eval_samples_per_second": 3.292,
      "eval_steps_per_second": 0.413,
      "step": 100
    },
    {
      "epoch": 0.29,
      "grad_norm": 40.25,
      "learning_rate": 4.498257201263691e-06,
      "logits/chosen": 83.59847259521484,
      "logits/rejected": 83.49092102050781,
      "logps/chosen": -32.43052673339844,
      "logps/rejected": -32.78325271606445,
      "loss": 0.6135,
      "rewards/accuracies": 0.6499999761581421,
      "rewards/chosen": 0.3553674817085266,
      "rewards/margins": 0.43178611993789673,
      "rewards/rejected": -0.07641863822937012,
      "step": 110
    },
    {
      "epoch": 0.31,
      "grad_norm": 46.5,
      "learning_rate": 4.353806263777678e-06,
      "logits/chosen": 83.7637710571289,
      "logits/rejected": 83.87000274658203,
      "logps/chosen": -28.259990692138672,
      "logps/rejected": -35.35393524169922,
      "loss": 0.6375,
      "rewards/accuracies": 0.612500011920929,
      "rewards/chosen": 0.40175461769104004,
      "rewards/margins": 0.33862805366516113,
      "rewards/rejected": 0.06312654912471771,
      "step": 120
    },
    {
      "epoch": 0.34,
      "grad_norm": 24.875,
      "learning_rate": 4.1940827077152755e-06,
      "logits/chosen": 80.89453125,
      "logits/rejected": 80.9158706665039,
      "logps/chosen": -30.432043075561523,
      "logps/rejected": -32.080535888671875,
      "loss": 0.6294,
      "rewards/accuracies": 0.675000011920929,
      "rewards/chosen": 0.2851874530315399,
      "rewards/margins": 0.3745357096195221,
      "rewards/rejected": -0.08934825658798218,
      "step": 130
    },
    {
      "epoch": 0.36,
      "grad_norm": 25.5,
      "learning_rate": 4.0204024186666215e-06,
      "logits/chosen": 82.0683822631836,
      "logits/rejected": 82.07270812988281,
      "logps/chosen": -27.02596092224121,
      "logps/rejected": -33.121150970458984,
      "loss": 0.5365,
      "rewards/accuracies": 0.7250000238418579,
      "rewards/chosen": 0.2528177499771118,
      "rewards/margins": 0.6714814305305481,
      "rewards/rejected": -0.4186636805534363,
      "step": 140
    },
    {
      "epoch": 0.39,
      "grad_norm": 25.375,
      "learning_rate": 3.834196265035119e-06,
      "logits/chosen": 80.59815979003906,
      "logits/rejected": 80.57023620605469,
      "logps/chosen": -28.871845245361328,
      "logps/rejected": -33.09119415283203,
      "loss": 0.5456,
      "rewards/accuracies": 0.699999988079071,
      "rewards/chosen": 0.31036004424095154,
      "rewards/margins": 0.6251744627952576,
      "rewards/rejected": -0.3148145079612732,
      "step": 150
    },
    {
      "epoch": 0.42,
      "grad_norm": 44.25,
      "learning_rate": 3.636998309800573e-06,
      "logits/chosen": 82.46113586425781,
      "logits/rejected": 82.46646118164062,
      "logps/chosen": -33.629737854003906,
      "logps/rejected": -30.432525634765625,
      "loss": 0.6101,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.30420786142349243,
      "rewards/margins": 0.5921996235847473,
      "rewards/rejected": -0.2879917025566101,
      "step": 160
    },
    {
      "epoch": 0.44,
      "grad_norm": 33.0,
      "learning_rate": 3.4304331721118078e-06,
      "logits/chosen": 83.26214599609375,
      "logits/rejected": 83.21092224121094,
      "logps/chosen": -30.77018165588379,
      "logps/rejected": -32.57013702392578,
      "loss": 0.573,
      "rewards/accuracies": 0.6875,
      "rewards/chosen": 0.2934645712375641,
      "rewards/margins": 0.6235076189041138,
      "rewards/rejected": -0.33004307746887207,
      "step": 170
    },
    {
      "epoch": 0.47,
      "grad_norm": 27.125,
      "learning_rate": 3.2162026428305436e-06,
      "logits/chosen": 80.83445739746094,
      "logits/rejected": 80.81375885009766,
      "logps/chosen": -30.401935577392578,
      "logps/rejected": -31.623117446899414,
      "loss": 0.5116,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.4771292805671692,
      "rewards/margins": 0.7566738724708557,
      "rewards/rejected": -0.2795446515083313,
      "step": 180
    },
    {
      "epoch": 0.49,
      "grad_norm": 14.0,
      "learning_rate": 2.996071664294641e-06,
      "logits/chosen": 82.55574035644531,
      "logits/rejected": 82.5384521484375,
      "logps/chosen": -30.206974029541016,
      "logps/rejected": -30.71441078186035,
      "loss": 0.6219,
      "rewards/accuracies": 0.6499999761581421,
      "rewards/chosen": 0.3356670141220093,
      "rewards/margins": 0.4834977686405182,
      "rewards/rejected": -0.14783072471618652,
      "step": 190
    },
    {
      "epoch": 0.52,
      "grad_norm": 15.375,
      "learning_rate": 2.7718537898066833e-06,
      "logits/chosen": 78.06065368652344,
      "logits/rejected": 78.00971984863281,
      "logps/chosen": -33.789581298828125,
      "logps/rejected": -32.68096923828125,
      "loss": 0.6115,
      "rewards/accuracies": 0.6625000238418579,
      "rewards/chosen": 0.577893853187561,
      "rewards/margins": 0.690432071685791,
      "rewards/rejected": -0.11253812164068222,
      "step": 200
    },
    {
      "epoch": 0.52,
      "eval_logits/chosen": 98.62848663330078,
      "eval_logits/rejected": 98.60428619384766,
      "eval_logps/chosen": -32.65570068359375,
      "eval_logps/rejected": -36.321441650390625,
      "eval_loss": 0.7252821922302246,
      "eval_rewards/accuracies": 0.530315637588501,
      "eval_rewards/chosen": -0.12751542031764984,
      "eval_rewards/margins": 0.08543363958597183,
      "eval_rewards/rejected": -0.2129490226507187,
      "eval_runtime": 103.8957,
      "eval_samples_per_second": 3.301,
      "eval_steps_per_second": 0.414,
      "step": 200
    },
    {
      "epoch": 0.55,
      "grad_norm": 52.0,
      "learning_rate": 2.5453962426402006e-06,
      "logits/chosen": 80.63914489746094,
      "logits/rejected": 80.54652404785156,
      "logps/chosen": -33.34014129638672,
      "logps/rejected": -35.32052230834961,
      "loss": 0.5935,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.3633476793766022,
      "rewards/margins": 0.5640031099319458,
      "rewards/rejected": -0.20065537095069885,
      "step": 210
    },
    {
      "epoch": 0.57,
      "grad_norm": 19.625,
      "learning_rate": 2.3185646976551794e-06,
      "logits/chosen": 82.76437377929688,
      "logits/rejected": 82.84717559814453,
      "logps/chosen": -31.025707244873047,
      "logps/rejected": -31.30951499938965,
      "loss": 0.5027,
      "rewards/accuracies": 0.762499988079071,
      "rewards/chosen": 0.553946852684021,
      "rewards/margins": 0.9022806286811829,
      "rewards/rejected": -0.3483339250087738,
      "step": 220
    },
    {
      "epoch": 0.6,
      "grad_norm": 32.75,
      "learning_rate": 2.0932279108998323e-06,
      "logits/chosen": 79.89958190917969,
      "logits/rejected": 79.95211791992188,
      "logps/chosen": -32.34553146362305,
      "logps/rejected": -34.391754150390625,
      "loss": 0.6272,
      "rewards/accuracies": 0.625,
      "rewards/chosen": 0.2761251628398895,
      "rewards/margins": 0.5038853287696838,
      "rewards/rejected": -0.2277601659297943,
      "step": 230
    },
    {
      "epoch": 0.62,
      "grad_norm": 35.5,
      "learning_rate": 1.8712423238279358e-06,
      "logits/chosen": 82.25331115722656,
      "logits/rejected": 82.53690338134766,
      "logps/chosen": -30.6766357421875,
      "logps/rejected": -31.96030044555664,
      "loss": 0.4539,
      "rewards/accuracies": 0.8125,
      "rewards/chosen": 0.6068586111068726,
      "rewards/margins": 0.8638145327568054,
      "rewards/rejected": -0.25695592164993286,
      "step": 240
    },
    {
      "epoch": 0.65,
      "grad_norm": 30.5,
      "learning_rate": 1.6544367689701824e-06,
      "logits/chosen": 80.93089294433594,
      "logits/rejected": 80.99276733398438,
      "logps/chosen": -27.04372787475586,
      "logps/rejected": -30.084264755249023,
      "loss": 0.6593,
      "rewards/accuracies": 0.574999988079071,
      "rewards/chosen": 0.3313008248806,
      "rewards/margins": 0.441417396068573,
      "rewards/rejected": -0.11011654138565063,
      "step": 250
    },
    {
      "epoch": 0.68,
      "grad_norm": 29.125,
      "learning_rate": 1.4445974030621963e-06,
      "logits/chosen": 78.20941162109375,
      "logits/rejected": 78.33964538574219,
      "logps/chosen": -30.433767318725586,
      "logps/rejected": -36.57436752319336,
      "loss": 0.5,
      "rewards/accuracies": 0.7250000238418579,
      "rewards/chosen": 0.6763362884521484,
      "rewards/margins": 0.9599828720092773,
      "rewards/rejected": -0.28364673256874084,
      "step": 260
    },
    {
      "epoch": 0.7,
      "grad_norm": 21.5,
      "learning_rate": 1.243452991757889e-06,
      "logits/chosen": 77.5750503540039,
      "logits/rejected": 77.60489654541016,
      "logps/chosen": -30.800561904907227,
      "logps/rejected": -31.87221908569336,
      "loss": 0.4973,
      "rewards/accuracies": 0.762499988079071,
      "rewards/chosen": 0.5870199203491211,
      "rewards/margins": 0.807098388671875,
      "rewards/rejected": -0.2200784683227539,
      "step": 270
    },
    {
      "epoch": 0.73,
      "grad_norm": 33.25,
      "learning_rate": 1.0526606671603523e-06,
      "logits/chosen": 80.28849029541016,
      "logits/rejected": 80.06718444824219,
      "logps/chosen": -31.078380584716797,
      "logps/rejected": -29.8966007232666,
      "loss": 0.5973,
      "rewards/accuracies": 0.7250000238418579,
      "rewards/chosen": 0.4389079213142395,
      "rewards/margins": 0.5766692757606506,
      "rewards/rejected": -0.13776138424873352,
      "step": 280
    },
    {
      "epoch": 0.75,
      "grad_norm": 17.75,
      "learning_rate": 8.737922755071455e-07,
      "logits/chosen": 80.41847229003906,
      "logits/rejected": 80.33303833007812,
      "logps/chosen": -32.99018478393555,
      "logps/rejected": -32.6365966796875,
      "loss": 0.4458,
      "rewards/accuracies": 0.762499988079071,
      "rewards/chosen": 0.6684367656707764,
      "rewards/margins": 1.0402761697769165,
      "rewards/rejected": -0.37183937430381775,
      "step": 290
    },
    {
      "epoch": 0.78,
      "grad_norm": 34.25,
      "learning_rate": 7.08321427484816e-07,
      "logits/chosen": 76.02632141113281,
      "logits/rejected": 76.11949920654297,
      "logps/chosen": -32.25402069091797,
      "logps/rejected": -29.283954620361328,
      "loss": 0.5545,
      "rewards/accuracies": 0.699999988079071,
      "rewards/chosen": 0.6910130381584167,
      "rewards/margins": 0.8001711964607239,
      "rewards/rejected": -0.10915807634592056,
      "step": 300
    },
    {
      "epoch": 0.78,
      "eval_logits/chosen": 98.64820861816406,
      "eval_logits/rejected": 98.62397003173828,
      "eval_logps/chosen": -32.62586212158203,
      "eval_logps/rejected": -36.32143783569336,
      "eval_loss": 0.7248644828796387,
      "eval_rewards/accuracies": 0.5282392501831055,
      "eval_rewards/chosen": -0.10961288958787918,
      "eval_rewards/margins": 0.10333485901355743,
      "eval_rewards/rejected": -0.2129477560520172,
      "eval_runtime": 104.0116,
      "eval_samples_per_second": 3.298,
      "eval_steps_per_second": 0.413,
      "step": 300
    },
    {
      "epoch": 0.81,
      "grad_norm": 27.625,
      "learning_rate": 5.576113578589035e-07,
      "logits/chosen": 83.13574981689453,
      "logits/rejected": 83.16615295410156,
      "logps/chosen": -29.959243774414062,
      "logps/rejected": -32.55767059326172,
      "loss": 0.5265,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.542574405670166,
      "rewards/margins": 0.7573197484016418,
      "rewards/rejected": -0.21474528312683105,
      "step": 310
    },
    {
      "epoch": 0.83,
      "grad_norm": 21.625,
      "learning_rate": 4.229036944380913e-07,
      "logits/chosen": 80.65809631347656,
      "logits/rejected": 80.65727233886719,
      "logps/chosen": -30.505443572998047,
      "logps/rejected": -29.11099624633789,
      "loss": 0.5087,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.6558700799942017,
      "rewards/margins": 0.7707726359367371,
      "rewards/rejected": -0.11490253359079361,
      "step": 320
    },
    {
      "epoch": 0.86,
      "grad_norm": 19.25,
      "learning_rate": 3.053082288996112e-07,
      "logits/chosen": 77.81159210205078,
      "logits/rejected": 77.86034393310547,
      "logps/chosen": -29.130138397216797,
      "logps/rejected": -33.010986328125,
      "loss": 0.4483,
      "rewards/accuracies": 0.75,
      "rewards/chosen": 0.7334806323051453,
      "rewards/margins": 0.9591558575630188,
      "rewards/rejected": -0.22567513585090637,
      "step": 330
    },
    {
      "epoch": 0.88,
      "grad_norm": 41.75,
      "learning_rate": 2.0579377374915805e-07,
      "logits/chosen": 82.1180648803711,
      "logits/rejected": 82.14155578613281,
      "logps/chosen": -32.119606018066406,
      "logps/rejected": -33.77212905883789,
      "loss": 0.5073,
      "rewards/accuracies": 0.737500011920929,
      "rewards/chosen": 0.6555252075195312,
      "rewards/margins": 0.8975871801376343,
      "rewards/rejected": -0.24206197261810303,
      "step": 340
    },
    {
      "epoch": 0.91,
      "grad_norm": 17.25,
      "learning_rate": 1.2518018074041684e-07,
      "logits/chosen": 81.12958526611328,
      "logits/rejected": 81.1399154663086,
      "logps/chosen": -32.4399299621582,
      "logps/rejected": -33.30702590942383,
      "loss": 0.5483,
      "rewards/accuracies": 0.7250000238418579,
      "rewards/chosen": 0.7222862839698792,
      "rewards/margins": 0.8514213562011719,
      "rewards/rejected": -0.12913502752780914,
      "step": 350
    },
    {
      "epoch": 0.94,
      "grad_norm": 24.875,
      "learning_rate": 6.41315865106129e-08,
      "logits/chosen": 82.61198425292969,
      "logits/rejected": 82.64558410644531,
      "logps/chosen": -28.419490814208984,
      "logps/rejected": -31.76764488220215,
      "loss": 0.5254,
      "rewards/accuracies": 0.6875,
      "rewards/chosen": 0.6796320080757141,
      "rewards/margins": 0.7503107786178589,
      "rewards/rejected": -0.07067875564098358,
      "step": 360
    },
    {
      "epoch": 0.96,
      "grad_norm": 30.25,
      "learning_rate": 2.3150941078050325e-08,
      "logits/chosen": 82.08049774169922,
      "logits/rejected": 82.0997543334961,
      "logps/chosen": -31.871307373046875,
      "logps/rejected": -35.636024475097656,
      "loss": 0.5176,
      "rewards/accuracies": 0.7250000238418579,
      "rewards/chosen": 0.6029146909713745,
      "rewards/margins": 0.9189049601554871,
      "rewards/rejected": -0.31599029898643494,
      "step": 370
    },
    {
      "epoch": 0.99,
      "grad_norm": 31.875,
      "learning_rate": 2.575864278703266e-09,
      "logits/chosen": 75.98027038574219,
      "logits/rejected": 75.85136413574219,
      "logps/chosen": -29.75612449645996,
      "logps/rejected": -28.387653350830078,
      "loss": 0.5513,
      "rewards/accuracies": 0.699999988079071,
      "rewards/chosen": 0.49200135469436646,
      "rewards/margins": 0.6282498836517334,
      "rewards/rejected": -0.13624855875968933,
      "step": 380
    },
    {
      "epoch": 1.0,
      "step": 385,
      "total_flos": 0.0,
      "train_loss": 0.5886486524111265,
      "train_runtime": 2556.9439,
      "train_samples_per_second": 1.204,
      "train_steps_per_second": 0.151
    }
  ],
  "logging_steps": 10,
  "max_steps": 385,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 100,
  "total_flos": 0.0,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
}