martimfasantos committed
Commit: 12d6811
Parent(s): 941617c

Model save
- README.md +113 -0
- all_results.json +9 -0
- generation_config.json +7 -0
- model.safetensors +1 -1
- runs/Jun11_00-56-54_poseidon/events.out.tfevents.1718067780.poseidon.4172683.0 +2 -2
- train_results.json +9 -0
- trainer_state.json +0 -0
README.md
ADDED
@@ -0,0 +1,113 @@
---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6303
- Rewards/chosen: -1.4484
- Rewards/rejected: -1.8080
- Rewards/accuracies: 0.6436
- Rewards/margins: 0.3596
- Logps/rejected: -243.9776
- Logps/chosen: -203.5508
- Logits/rejected: -1.7024
- Logits/chosen: -1.7262
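
As a quick sanity check on the evaluation metrics above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below assumes that definition (the card does not spell it out, and the variable names are illustrative, not taken from the training code).

```python
# Values copied from the evaluation summary above.
rewards_chosen = -1.4484
rewards_rejected = -1.8080
reported_margin = 0.3596

# Assumption: margin = chosen reward - rejected reward, averaged over the eval set.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported: {abs(computed_margin - reported_margin) < 1e-4}")
```

A positive margin means the policy assigns a higher implicit reward to the chosen summaries than to the rejected ones on average, even though both rewards are negative.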

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
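
The batch-size entries above can be related to the step and epoch columns of the training log with a little arithmetic. The sketch below assumes the usual relation (total batch = per-device batch × accumulation steps × number of processes) and takes `train_samples` from `all_results.json` in the same commit. The process count it derives is an inference from these numbers, not something the card states.

```python
import math

# Hyperparameters from the list above.
train_batch_size = 8             # per-device batch size
gradient_accumulation_steps = 2
total_train_batch_size = 16

# Assumption: total = per-device batch * accumulation steps * number of processes.
num_processes = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# train_samples is reported in all_results.json.
train_samples = 92858
batches_per_epoch = math.ceil(train_samples / (train_batch_size * num_processes))
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch


print("implied processes:", num_processes)
print("optimizer steps per epoch:", steps_per_epoch)
print("epoch at step 400:", round(epoch_at(400), 4))
```

Under these assumptions, the epoch values for steps 400 and 17200 agree with the first and last rows of the training results table (0.0689 and 2.9635).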

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931        | 0.0689 | 400   | 0.6932          | 0.0002         | 0.0003           | 0.4654             | -0.0001         | -63.1542       | -58.6924     | -3.1574         | -3.1630       |
| 0.692         | 0.1378 | 800   | 0.6928          | 0.0015         | 0.0008           | 0.5525             | 0.0007          | -63.0955       | -58.5586     | -3.1518         | -3.1574       |
| 0.6902        | 0.2068 | 1200  | 0.6914          | 0.0009         | -0.0027          | 0.5876             | 0.0037          | -63.4527       | -58.6187     | -3.1281         | -3.1338       |
| 0.6835        | 0.2757 | 1600  | 0.6888          | -0.0225        | -0.0320          | 0.5864             | 0.0096          | -66.3833       | -60.9598     | -3.0838         | -3.0895       |
| 0.6778        | 0.3446 | 2000  | 0.6845          | -0.0724        | -0.0918          | 0.5976             | 0.0194          | -72.3574       | -65.9486     | -3.0213         | -3.0270       |
| 0.6688        | 0.4135 | 2400  | 0.6792          | -0.1403        | -0.1725          | 0.6032             | 0.0323          | -80.4345       | -72.7375     | -2.9370         | -2.9428       |
| 0.6675        | 0.4824 | 2800  | 0.6732          | -0.2283        | -0.2756          | 0.6057             | 0.0472          | -90.7353       | -81.5436     | -2.8576         | -2.8635       |
| 0.6437        | 0.5513 | 3200  | 0.6646          | -0.3557        | -0.4265          | 0.6120             | 0.0708          | -105.8322      | -94.2796     | -2.7546         | -2.7607       |
| 0.6516        | 0.6203 | 3600  | 0.6602          | -0.4125        | -0.4982          | 0.6178             | 0.0856          | -112.9954      | -99.9643     | -2.6547         | -2.6612       |
| 0.6264        | 0.6892 | 4000  | 0.6514          | -0.5858        | -0.7050          | 0.6315             | 0.1192          | -133.6785      | -117.2944    | -2.5252         | -2.5324       |
| 0.6109        | 0.7581 | 4400  | 0.6474          | -0.6217        | -0.7587          | 0.6313             | 0.1370          | -139.0484      | -120.8850    | -2.4041         | -2.4124       |
| 0.6153        | 0.8270 | 4800  | 0.6432          | -0.7112        | -0.8720          | 0.6266             | 0.1608          | -150.3814      | -129.8305    | -2.3206         | -2.3302       |
| 0.6107        | 0.8959 | 5200  | 0.6407          | -0.7470        | -0.9249          | 0.6350             | 0.1779          | -155.6741      | -133.4166    | -2.2363         | -2.2476       |
| 0.6061        | 0.9649 | 5600  | 0.6392          | -0.7851        | -0.9723          | 0.6315             | 0.1871          | -160.4070      | -137.2255    | -2.1733         | -2.1859       |
| 0.5701        | 1.0338 | 6000  | 0.6356          | -1.0035        | -1.2450          | 0.6292             | 0.2415          | -187.6758      | -159.0581    | -2.0122         | -2.0292       |
| 0.5557        | 1.1027 | 6400  | 0.6358          | -1.0296        | -1.2785          | 0.6322             | 0.2489          | -191.0262      | -161.6682    | -1.9777         | -1.9953       |
| 0.5292        | 1.1716 | 6800  | 0.6333          | -1.0878        | -1.3492          | 0.6313             | 0.2614          | -198.1001      | -167.4900    | -1.8969         | -1.9159       |
| 0.5473        | 1.2405 | 7200  | 0.6354          | -1.0479        | -1.2958          | 0.6262             | 0.2479          | -192.7597      | -163.5001    | -1.9044         | -1.9226       |
| 0.6231        | 1.3094 | 7600  | 0.6346          | -1.2184        | -1.4979          | 0.6289             | 0.2795          | -212.9705      | -180.5535    | -1.8355         | -1.8558       |
| 0.5403        | 1.3784 | 8000  | 0.6339          | -1.1437        | -1.4111          | 0.6264             | 0.2673          | -204.2867      | -173.0842    | -1.8647         | -1.8848       |
| 0.5444        | 1.4473 | 8400  | 0.6339          | -1.0726        | -1.3310          | 0.6287             | 0.2584          | -196.2827      | -165.9765    | -1.8568         | -1.8768       |
| 0.5766        | 1.5162 | 8800  | 0.6329          | -1.0364        | -1.2879          | 0.6336             | 0.2516          | -191.9749      | -162.3483    | -1.8819         | -1.9009       |
| 0.525         | 1.5851 | 9200  | 0.6320          | -1.1870        | -1.4611          | 0.6366             | 0.2740          | -209.2869      | -177.4161    | -1.8122         | -1.8325       |
| 0.5174        | 1.6540 | 9600  | 0.6310          | -1.2662        | -1.5606          | 0.6375             | 0.2944          | -219.2438      | -185.3348    | -1.7597         | -1.7810       |
| 0.5312        | 1.7229 | 10000 | 0.6313          | -1.2979        | -1.6013          | 0.6359             | 0.3033          | -223.3081      | -188.5056    | -1.7629         | -1.7848       |
| 0.4923        | 1.7919 | 10400 | 0.6312          | -1.1596        | -1.4412          | 0.6334             | 0.2815          | -207.2955      | -174.6746    | -1.7754         | -1.7966       |
| 0.5386        | 1.8608 | 10800 | 0.6304          | -1.2706        | -1.5735          | 0.6373             | 0.3029          | -220.5279      | -185.7685    | -1.7500         | -1.7722       |
| 0.5178        | 1.9297 | 11200 | 0.6295          | -1.2859        | -1.6008          | 0.6443             | 0.3149          | -223.2599      | -187.3036    | -1.7272         | -1.7501       |
| 0.5556        | 1.9986 | 11600 | 0.6295          | -1.2652        | -1.5714          | 0.6362             | 0.3062          | -220.3214      | -185.2294    | -1.7356         | -1.7580       |
| 0.4901        | 2.0675 | 12000 | 0.6303          | -1.4749        | -1.8246          | 0.6447             | 0.3497          | -245.6420      | -206.2009    | -1.6688         | -1.6928       |
| 0.4713        | 2.1365 | 12400 | 0.6303          | -1.6230        | -2.0017          | 0.6471             | 0.3786          | -263.3478      | -221.0147    | -1.6397         | -1.6644       |
| 0.5188        | 2.2054 | 12800 | 0.6305          | -1.4593        | -1.8052          | 0.6408             | 0.3458          | -243.6979      | -204.6454    | -1.6776         | -1.7011       |
| 0.5395        | 2.2743 | 13200 | 0.6315          | -1.5373        | -1.9051          | 0.6429             | 0.3678          | -253.6892      | -212.4377    | -1.6591         | -1.6834       |
| 0.5059        | 2.3432 | 13600 | 0.6318          | -1.4799        | -1.8381          | 0.6431             | 0.3582          | -246.9884      | -206.6992    | -1.6812         | -1.7051       |
| 0.4543        | 2.4121 | 14000 | 0.6318          | -1.3717        | -1.7109          | 0.6459             | 0.3392          | -234.2693      | -195.8793    | -1.7134         | -1.7366       |
| 0.5121        | 2.4810 | 14400 | 0.6308          | -1.4206        | -1.7736          | 0.6447             | 0.3530          | -240.5389      | -200.7700    | -1.7016         | -1.7252       |
| 0.4847        | 2.5500 | 14800 | 0.6304          | -1.4817        | -1.8498          | 0.6443             | 0.3681          | -248.1589      | -206.8796    | -1.6912         | -1.7153       |
| 0.4701        | 2.6189 | 15200 | 0.6306          | -1.4145        | -1.7659          | 0.6445             | 0.3514          | -239.7732      | -200.1665    | -1.7090         | -1.7324       |
| 0.5011        | 2.6878 | 15600 | 0.6304          | -1.4080        | -1.7575          | 0.6434             | 0.3495          | -238.9349      | -199.5119    | -1.7135         | -1.7369       |
| 0.4936        | 2.7567 | 16000 | 0.6304          | -1.4490        | -1.8088          | 0.6436             | 0.3598          | -244.0595      | -203.6143    | -1.7010         | -1.7248       |
| 0.4952        | 2.8256 | 16400 | 0.6312          | -1.4483        | -1.8060          | 0.6438             | 0.3577          | -243.7794      | -203.5389    | -1.7043         | -1.7279       |
| 0.5024        | 2.8946 | 16800 | 0.6304          | -1.4492        | -1.8094          | 0.6429             | 0.3602          | -244.1201      | -203.6308    | -1.7037         | -1.7274       |
| 0.5054        | 2.9635 | 17200 | 0.6303          | -1.4484        | -1.8080          | 0.6436             | 0.3596          | -243.9776      | -203.5508    | -1.7024         | -1.7262       |
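
The first rows of the table sit at a loss of about 0.6931, which is what a sigmoid-style DPO loss gives when the policy has not yet separated chosen from rejected responses (margin near zero). The sketch below assumes the loss is `-log(sigmoid(margin))`, with the margin being the reported reward difference. The card does not state the loss type, so treat this as an illustration, not the trainer's exact code.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """-log(sigmoid(margin)), written in the equivalent form log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


# A zero margin gives ln(2), the starting point of the log.
initial_loss = dpo_sigmoid_loss(0.0)

# Loss evaluated at the final *mean* margin from the last row (assumed definition).
loss_at_mean_margin = dpo_sigmoid_loss(0.3596)

print(f"loss at zero margin: {initial_loss:.4f}")
print(f"loss at the final mean margin: {loss_at_mean_margin:.4f}")
```

The loss evaluated at the mean margin comes out lower than the reported validation loss of 0.6303. That is expected if the reported value averages per-example losses: the loss is convex in the margin, and with an accuracy near 0.64 a sizeable share of pairs still have negative margins.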


### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
all_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 3.0,
    "total_flos": 0.0,
    "train_loss": 0.5724969553024556,
    "train_runtime": 86264.9537,
    "train_samples": 92858,
    "train_samples_per_second": 3.229,
    "train_steps_per_second": 0.202
}
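
The throughput fields in this file can be cross-checked against each other. The sketch below assumes `train_samples_per_second` is total samples processed (samples × epochs) divided by `train_runtime` in seconds. That is an assumption about how the figure was derived, and the variable names are illustrative.

```python
# Values copied from all_results.json above.
train_runtime = 86264.9537   # seconds
train_samples = 92858
epochs = 3.0

# Assumption: throughput = samples seen over all epochs / wall-clock runtime.
samples_per_second = train_samples * epochs / train_runtime
runtime_hours = train_runtime / 3600

print(f"samples/s: {samples_per_second:.3f}")
print(f"runtime: {runtime_hours:.2f} h")
```

Under that assumption the computed throughput rounds to the reported 3.229 samples per second, and the run lasted just under 24 hours.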
generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.41.2"
}
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:0cf069d97b932ed27f02f1b30381e022374783ad50c007eb6cece200aa0d186f
 size 4400216536
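
The file changed here is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the content hash, and the size in bytes. The sketch below parses such a pointer with a hypothetical helper (`parse_lfs_pointer` is not part of any library). It also estimates a parameter count from the size under the assumption of 4 bytes per parameter, which the commit does not confirm.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer contents from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0cf069d97b932ed27f02f1b30381e022374783ad50c007eb6cece200aa0d186f
size 4400216536
"""

info = parse_lfs_pointer(pointer)

# Assumption: weights stored as 4-byte floats; the small file header is ignored.
approx_params = info["size"] / 4

print(info["algo"], info["size"])
print(f"approx. parameters if float32: {approx_params / 1e9:.2f}B")
```

The size is unchanged between the old and new pointers, so only the content hash differs in this commit.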
runs/Jun11_00-56-54_poseidon/events.out.tfevents.1718067780.poseidon.4172683.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f61eb4857897594467d8b0c3d3529ee5f9bbe8e3ea175d27abcfe2c62b418de1
+size 1236955
train_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 3.0,
    "total_flos": 0.0,
    "train_loss": 0.5724969553024556,
    "train_runtime": 86264.9537,
    "train_samples": 92858,
    "train_samples_per_second": 3.229,
    "train_steps_per_second": 0.202
}
trainer_state.json
ADDED
The diff for this file is too large to render.