GaetanMichelet committed on
Commit
02a742a
1 Parent(s): b1b44f7

Model save

README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
+ library_name: peft
+ license: llama3.1
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ model-index:
+ - name: Llama-31-8B_task-1_120-samples_config-4_full
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Llama-31-8B_task-1_120-samples_config-4_full
+
+ This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.9120
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 150
+
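How these values combine can be sanity-checked with a little arithmetic; a minimal sketch, assuming a single process (since `total_train_batch_size` equals `train_batch_size` × `gradient_accumulation_steps`) and the 88 training samples reported in `train_results.json`:

```python
# Effective batch size = per-device batch size x gradient accumulation steps
# (x number of processes, assumed to be 1 here since 1 x 16 = 16 as reported).
train_batch_size = 1
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 16

# With 88 training samples, one epoch is 88 / 16 = 5.5 optimizer steps,
# which is why the training log evaluates at step 5 = epoch 0.9091,
# step 11 = epoch 2.0, and so on.
steps_per_epoch = 88 / total_train_batch_size
print(round(5 / steps_per_epoch, 4))   # 0.9091
print(round(11 / steps_per_epoch, 4))  # 2.0
```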
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-------:|:----:|:---------------:|
+ | 2.4687 | 0.9091 | 5 | 2.4589 |
+ | 2.5083 | 2.0 | 11 | 2.4440 |
+ | 2.4676 | 2.9091 | 16 | 2.4218 |
+ | 2.4562 | 4.0 | 22 | 2.3870 |
+ | 2.377 | 4.9091 | 27 | 2.3475 |
+ | 2.3303 | 6.0 | 33 | 2.2793 |
+ | 2.2553 | 6.9091 | 38 | 2.2254 |
+ | 2.174 | 8.0 | 44 | 2.1392 |
+ | 2.131 | 8.9091 | 49 | 2.0661 |
+ | 2.0142 | 10.0 | 55 | 1.9626 |
+ | 1.8873 | 10.9091 | 60 | 1.8746 |
+ | 1.7633 | 12.0 | 66 | 1.7650 |
+ | 1.726 | 12.9091 | 71 | 1.6563 |
+ | 1.5711 | 14.0 | 77 | 1.5123 |
+ | 1.4344 | 14.9091 | 82 | 1.3950 |
+ | 1.3201 | 16.0 | 88 | 1.2661 |
+ | 1.1787 | 16.9091 | 93 | 1.1831 |
+ | 1.1444 | 18.0 | 99 | 1.1188 |
+ | 1.0591 | 18.9091 | 104 | 1.0836 |
+ | 1.0151 | 20.0 | 110 | 1.0540 |
+ | 1.0277 | 20.9091 | 115 | 1.0388 |
+ | 1.0025 | 22.0 | 121 | 1.0250 |
+ | 1.0161 | 22.9091 | 126 | 1.0154 |
+ | 0.9946 | 24.0 | 132 | 1.0047 |
+ | 0.9773 | 24.9091 | 137 | 0.9970 |
+ | 0.9708 | 26.0 | 143 | 0.9890 |
+ | 0.9374 | 26.9091 | 148 | 0.9822 |
+ | 0.9403 | 28.0 | 154 | 0.9751 |
+ | 0.94 | 28.9091 | 159 | 0.9703 |
+ | 0.902 | 30.0 | 165 | 0.9633 |
+ | 0.9215 | 30.9091 | 170 | 0.9604 |
+ | 0.8854 | 32.0 | 176 | 0.9548 |
+ | 0.96 | 32.9091 | 181 | 0.9503 |
+ | 0.9162 | 34.0 | 187 | 0.9453 |
+ | 0.8686 | 34.9091 | 192 | 0.9429 |
+ | 0.906 | 36.0 | 198 | 0.9385 |
+ | 0.8762 | 36.9091 | 203 | 0.9354 |
+ | 0.8929 | 38.0 | 209 | 0.9332 |
+ | 0.8687 | 38.9091 | 214 | 0.9301 |
+ | 0.8933 | 40.0 | 220 | 0.9279 |
+ | 0.858 | 40.9091 | 225 | 0.9241 |
+ | 0.8481 | 42.0 | 231 | 0.9223 |
+ | 0.8228 | 42.9091 | 236 | 0.9217 |
+ | 0.8593 | 44.0 | 242 | 0.9186 |
+ | 0.8238 | 44.9091 | 247 | 0.9156 |
+ | 0.8081 | 46.0 | 253 | 0.9161 |
+ | 0.8327 | 46.9091 | 258 | 0.9129 |
+ | 0.8029 | 48.0 | 264 | 0.9110 |
+ | 0.7909 | 48.9091 | 269 | 0.9094 |
+ | 0.7826 | 50.0 | 275 | 0.9079 |
+ | 0.773 | 50.9091 | 280 | 0.9122 |
+ | 0.7377 | 52.0 | 286 | 0.9078 |
+ | 0.7491 | 52.9091 | 291 | 0.9050 |
+ | 0.7414 | 54.0 | 297 | 0.9093 |
+ | 0.7275 | 54.9091 | 302 | 0.9053 |
+ | 0.7198 | 56.0 | 308 | 0.9046 |
+ | 0.7203 | 56.9091 | 313 | 0.9093 |
+ | 0.6903 | 58.0 | 319 | 0.9042 |
+ | 0.6987 | 58.9091 | 324 | 0.9107 |
+ | 0.7141 | 60.0 | 330 | 0.9079 |
+ | 0.7023 | 60.9091 | 335 | 0.9120 |
+ | 0.6945 | 62.0 | 341 | 0.9087 |
+ | 0.6897 | 62.9091 | 346 | 0.9130 |
+ | 0.6597 | 64.0 | 352 | 0.9134 |
+ | 0.6954 | 64.9091 | 357 | 0.9120 |
+
+
+ ### Framework versions
+
+ - PEFT 0.12.0
+ - Transformers 4.44.0
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
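The learning rates recorded in `trainer_state.json` can be reproduced from the scheduler settings above; a minimal sketch, assuming 88 samples // effective batch 16 = 5 update steps per epoch, hence 150 × 5 = 750 total steps and 0.1 × 750 = 75 warmup steps:

```python
import math

# Reconstruct the cosine-with-warmup schedule implied by the hyperparameters.
# Assumptions: 750 total optimizer steps, 75 of them linear warmup.
base_lr = 1e-05
max_steps = 750
warmup_steps = 75

def lr_at(step: int) -> float:
    if step < warmup_steps:  # linear warmup from 0 to base_lr
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay

print(lr_at(1))   # ~1.3333e-07, matching the learning_rate logged at step 1
print(lr_at(76))  # ~9.99995e-06, matching the learning_rate logged at step 76
```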
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d9d645d26bdcfe4fa7ac7a98e1cd6b6c2ffb6ff73c526ff832d03e05465b6dfc
+ oid sha256:18024a4767336de1643bc9edfa5aad95937302833f3d2ca2ce9e421fc6d34208
  size 167832240
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 64.9090909090909,
+ "total_flos": 8.780093794235187e+16,
+ "train_loss": 1.1563805774146436,
+ "train_runtime": 6696.4639,
+ "train_samples": 88,
+ "train_samples_per_second": 1.971,
+ "train_steps_per_second": 0.112
+ }
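The reported throughput figures are consistent with the configured run length; a quick check, assuming the Trainer divides the planned run (150 epochs of 88 samples, i.e. 750 optimizer steps at effective batch size 16) by the measured runtime even though training stopped early at step 357:

```python
# Sanity-check train_samples_per_second and train_steps_per_second.
# Assumption: throughput is computed over the *configured* 150 epochs /
# 750 optimizer steps, not the early-stopped 64.9 epochs / 357 steps.
train_samples = 88
num_epochs = 150
total_steps = 750
train_runtime = 6696.4639  # seconds

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime
print(round(samples_per_second, 3))  # 1.971, as reported
print(round(steps_per_second, 3))    # 0.112, as reported
```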
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 64.9090909090909,
+ "total_flos": 8.780093794235187e+16,
+ "train_loss": 1.1563805774146436,
+ "train_runtime": 6696.4639,
+ "train_samples": 88,
+ "train_samples_per_second": 1.971,
+ "train_steps_per_second": 0.112
+ }
trainer_state.json ADDED
@@ -0,0 +1,1824 @@
+ {
+ "best_metric": 0.9042022824287415,
+ "best_model_checkpoint": "data/Llama-31-8B_task-1_120-samples_config-4_full/checkpoint-319",
+ "epoch": 64.9090909090909,
+ "eval_steps": 500,
+ "global_step": 357,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.18181818181818182,
+ "grad_norm": 1.9413753747940063,
+ "learning_rate": 1.3333333333333336e-07,
+ "loss": 2.4963,
+ "step": 1
+ },
+ {
+ "epoch": 0.36363636363636365,
+ "grad_norm": 1.9993785619735718,
+ "learning_rate": 2.666666666666667e-07,
+ "loss": 2.515,
+ "step": 2
+ },
+ {
+ "epoch": 0.7272727272727273,
+ "grad_norm": 1.8845915794372559,
+ "learning_rate": 5.333333333333335e-07,
+ "loss": 2.4687,
+ "step": 4
+ },
+ {
+ "epoch": 0.9090909090909091,
+ "eval_loss": 2.458869218826294,
+ "eval_runtime": 9.6295,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 5
+ },
+ {
+ "epoch": 1.0909090909090908,
+ "grad_norm": 2.0253942012786865,
+ "learning_rate": 8.000000000000001e-07,
+ "loss": 2.4759,
+ "step": 6
+ },
+ {
+ "epoch": 1.4545454545454546,
+ "grad_norm": 1.8723983764648438,
+ "learning_rate": 1.066666666666667e-06,
+ "loss": 2.4786,
+ "step": 8
+ },
+ {
+ "epoch": 1.8181818181818183,
+ "grad_norm": 1.924127459526062,
+ "learning_rate": 1.3333333333333334e-06,
+ "loss": 2.5083,
+ "step": 10
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 2.443969488143921,
+ "eval_runtime": 9.6222,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 11
+ },
+ {
+ "epoch": 2.1818181818181817,
+ "grad_norm": 1.7455520629882812,
+ "learning_rate": 1.6000000000000001e-06,
+ "loss": 2.4117,
+ "step": 12
+ },
+ {
+ "epoch": 2.5454545454545454,
+ "grad_norm": 1.710787296295166,
+ "learning_rate": 1.8666666666666669e-06,
+ "loss": 2.4583,
+ "step": 14
+ },
+ {
+ "epoch": 2.909090909090909,
+ "grad_norm": 1.6907106637954712,
+ "learning_rate": 2.133333333333334e-06,
+ "loss": 2.4676,
+ "step": 16
+ },
+ {
+ "epoch": 2.909090909090909,
+ "eval_loss": 2.421785593032837,
+ "eval_runtime": 9.6333,
+ "eval_samples_per_second": 2.491,
+ "eval_steps_per_second": 2.491,
+ "step": 16
+ },
+ {
+ "epoch": 3.2727272727272725,
+ "grad_norm": 1.5824416875839233,
+ "learning_rate": 2.4000000000000003e-06,
+ "loss": 2.4237,
+ "step": 18
+ },
+ {
+ "epoch": 3.6363636363636362,
+ "grad_norm": 1.59761643409729,
+ "learning_rate": 2.666666666666667e-06,
+ "loss": 2.4148,
+ "step": 20
+ },
+ {
+ "epoch": 4.0,
+ "grad_norm": 1.6097276210784912,
+ "learning_rate": 2.9333333333333338e-06,
+ "loss": 2.4562,
+ "step": 22
+ },
+ {
+ "epoch": 4.0,
+ "eval_loss": 2.3870151042938232,
+ "eval_runtime": 9.6315,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 22
+ },
+ {
+ "epoch": 4.363636363636363,
+ "grad_norm": 1.636257529258728,
+ "learning_rate": 3.2000000000000003e-06,
+ "loss": 2.4287,
+ "step": 24
+ },
+ {
+ "epoch": 4.7272727272727275,
+ "grad_norm": 1.5569593906402588,
+ "learning_rate": 3.4666666666666672e-06,
+ "loss": 2.377,
+ "step": 26
+ },
+ {
+ "epoch": 4.909090909090909,
+ "eval_loss": 2.3474807739257812,
+ "eval_runtime": 9.6237,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 27
+ },
+ {
+ "epoch": 5.090909090909091,
+ "grad_norm": 1.808140516281128,
+ "learning_rate": 3.7333333333333337e-06,
+ "loss": 2.3605,
+ "step": 28
+ },
+ {
+ "epoch": 5.454545454545454,
+ "grad_norm": 1.7729766368865967,
+ "learning_rate": 4.000000000000001e-06,
+ "loss": 2.3404,
+ "step": 30
+ },
+ {
+ "epoch": 5.818181818181818,
+ "grad_norm": 1.9055501222610474,
+ "learning_rate": 4.266666666666668e-06,
+ "loss": 2.3303,
+ "step": 32
+ },
+ {
+ "epoch": 6.0,
+ "eval_loss": 2.2793145179748535,
+ "eval_runtime": 9.6178,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 33
+ },
+ {
+ "epoch": 6.181818181818182,
+ "grad_norm": 1.628233790397644,
+ "learning_rate": 4.533333333333334e-06,
+ "loss": 2.2739,
+ "step": 34
+ },
+ {
+ "epoch": 6.545454545454545,
+ "grad_norm": 1.4241219758987427,
+ "learning_rate": 4.800000000000001e-06,
+ "loss": 2.3294,
+ "step": 36
+ },
+ {
+ "epoch": 6.909090909090909,
+ "grad_norm": 1.5400785207748413,
+ "learning_rate": 5.0666666666666676e-06,
+ "loss": 2.2553,
+ "step": 38
+ },
+ {
+ "epoch": 6.909090909090909,
+ "eval_loss": 2.2254207134246826,
+ "eval_runtime": 9.6211,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 38
+ },
+ {
+ "epoch": 7.2727272727272725,
+ "grad_norm": 1.4445594549179077,
+ "learning_rate": 5.333333333333334e-06,
+ "loss": 2.2498,
+ "step": 40
+ },
+ {
+ "epoch": 7.636363636363637,
+ "grad_norm": 1.7270216941833496,
+ "learning_rate": 5.600000000000001e-06,
+ "loss": 2.1853,
+ "step": 42
+ },
+ {
+ "epoch": 8.0,
+ "grad_norm": 2.017972707748413,
+ "learning_rate": 5.8666666666666675e-06,
+ "loss": 2.174,
+ "step": 44
+ },
+ {
+ "epoch": 8.0,
+ "eval_loss": 2.139225721359253,
+ "eval_runtime": 9.6197,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 44
+ },
+ {
+ "epoch": 8.363636363636363,
+ "grad_norm": 1.3301095962524414,
+ "learning_rate": 6.133333333333334e-06,
+ "loss": 2.1429,
+ "step": 46
+ },
+ {
+ "epoch": 8.727272727272727,
+ "grad_norm": 1.083661675453186,
+ "learning_rate": 6.4000000000000006e-06,
+ "loss": 2.131,
+ "step": 48
+ },
+ {
+ "epoch": 8.909090909090908,
+ "eval_loss": 2.0661048889160156,
+ "eval_runtime": 9.6328,
+ "eval_samples_per_second": 2.491,
+ "eval_steps_per_second": 2.491,
+ "step": 49
+ },
+ {
+ "epoch": 9.090909090909092,
+ "grad_norm": 1.0499473810195923,
+ "learning_rate": 6.666666666666667e-06,
+ "loss": 2.0813,
+ "step": 50
+ },
+ {
+ "epoch": 9.454545454545455,
+ "grad_norm": 1.014916181564331,
+ "learning_rate": 6.9333333333333344e-06,
+ "loss": 2.0497,
+ "step": 52
+ },
+ {
+ "epoch": 9.818181818181818,
+ "grad_norm": 1.0278065204620361,
+ "learning_rate": 7.2000000000000005e-06,
+ "loss": 2.0142,
+ "step": 54
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 1.9625515937805176,
+ "eval_runtime": 9.6185,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 55
+ },
+ {
+ "epoch": 10.181818181818182,
+ "grad_norm": 0.9720411896705627,
+ "learning_rate": 7.4666666666666675e-06,
+ "loss": 1.9759,
+ "step": 56
+ },
+ {
+ "epoch": 10.545454545454545,
+ "grad_norm": 0.9346638321876526,
+ "learning_rate": 7.733333333333334e-06,
+ "loss": 1.941,
+ "step": 58
+ },
+ {
+ "epoch": 10.909090909090908,
+ "grad_norm": 0.8559221029281616,
+ "learning_rate": 8.000000000000001e-06,
+ "loss": 1.8873,
+ "step": 60
+ },
+ {
+ "epoch": 10.909090909090908,
+ "eval_loss": 1.8745914697647095,
+ "eval_runtime": 9.6239,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 60
+ },
+ {
+ "epoch": 11.272727272727273,
+ "grad_norm": 0.8817884922027588,
+ "learning_rate": 8.266666666666667e-06,
+ "loss": 1.9132,
+ "step": 62
+ },
+ {
+ "epoch": 11.636363636363637,
+ "grad_norm": 0.8232048749923706,
+ "learning_rate": 8.533333333333335e-06,
+ "loss": 1.8387,
+ "step": 64
+ },
+ {
+ "epoch": 12.0,
+ "grad_norm": 0.8017051815986633,
+ "learning_rate": 8.8e-06,
+ "loss": 1.7633,
+ "step": 66
+ },
+ {
+ "epoch": 12.0,
+ "eval_loss": 1.7650254964828491,
+ "eval_runtime": 9.6208,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 66
+ },
+ {
+ "epoch": 12.363636363636363,
+ "grad_norm": 0.9119341373443604,
+ "learning_rate": 9.066666666666667e-06,
+ "loss": 1.72,
+ "step": 68
+ },
+ {
+ "epoch": 12.727272727272727,
+ "grad_norm": 0.8771039843559265,
+ "learning_rate": 9.333333333333334e-06,
+ "loss": 1.726,
+ "step": 70
+ },
+ {
+ "epoch": 12.909090909090908,
+ "eval_loss": 1.6562695503234863,
+ "eval_runtime": 9.6212,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 71
+ },
+ {
+ "epoch": 13.090909090909092,
+ "grad_norm": 0.9313778877258301,
+ "learning_rate": 9.600000000000001e-06,
+ "loss": 1.6816,
+ "step": 72
+ },
+ {
+ "epoch": 13.454545454545455,
+ "grad_norm": 1.1438463926315308,
+ "learning_rate": 9.866666666666668e-06,
+ "loss": 1.6168,
+ "step": 74
+ },
+ {
+ "epoch": 13.818181818181818,
+ "grad_norm": 1.0701647996902466,
+ "learning_rate": 9.999945845889795e-06,
+ "loss": 1.5711,
+ "step": 76
+ },
+ {
+ "epoch": 14.0,
+ "eval_loss": 1.5122851133346558,
+ "eval_runtime": 9.627,
+ "eval_samples_per_second": 2.493,
+ "eval_steps_per_second": 2.493,
+ "step": 77
+ },
+ {
+ "epoch": 14.181818181818182,
+ "grad_norm": 0.9771044254302979,
+ "learning_rate": 9.999512620046523e-06,
+ "loss": 1.5488,
+ "step": 78
+ },
+ {
+ "epoch": 14.545454545454545,
+ "grad_norm": 0.91764235496521,
+ "learning_rate": 9.99864620589731e-06,
+ "loss": 1.4504,
+ "step": 80
+ },
+ {
+ "epoch": 14.909090909090908,
+ "grad_norm": 0.9226170182228088,
+ "learning_rate": 9.99734667851357e-06,
+ "loss": 1.4344,
+ "step": 82
+ },
+ {
+ "epoch": 14.909090909090908,
+ "eval_loss": 1.3950275182724,
+ "eval_runtime": 9.6306,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 82
+ },
+ {
+ "epoch": 15.272727272727273,
+ "grad_norm": 0.8576654195785522,
+ "learning_rate": 9.995614150494293e-06,
+ "loss": 1.3568,
+ "step": 84
+ },
+ {
+ "epoch": 15.636363636363637,
+ "grad_norm": 0.9888388514518738,
+ "learning_rate": 9.993448771956285e-06,
+ "loss": 1.3195,
+ "step": 86
+ },
+ {
+ "epoch": 16.0,
+ "grad_norm": 0.9264158010482788,
+ "learning_rate": 9.99085073052117e-06,
+ "loss": 1.3201,
+ "step": 88
+ },
+ {
+ "epoch": 16.0,
+ "eval_loss": 1.2661280632019043,
+ "eval_runtime": 9.6187,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 88
+ },
+ {
+ "epoch": 16.363636363636363,
+ "grad_norm": 0.8193872570991516,
+ "learning_rate": 9.987820251299121e-06,
+ "loss": 1.2496,
+ "step": 90
+ },
+ {
+ "epoch": 16.727272727272727,
+ "grad_norm": 0.7646848559379578,
+ "learning_rate": 9.984357596869369e-06,
+ "loss": 1.1787,
+ "step": 92
+ },
+ {
+ "epoch": 16.90909090909091,
+ "eval_loss": 1.18313729763031,
+ "eval_runtime": 9.6587,
+ "eval_samples_per_second": 2.485,
+ "eval_steps_per_second": 2.485,
+ "step": 93
+ },
+ {
+ "epoch": 17.09090909090909,
+ "grad_norm": 0.7123040556907654,
+ "learning_rate": 9.980463067257437e-06,
+ "loss": 1.2232,
+ "step": 94
+ },
+ {
+ "epoch": 17.454545454545453,
+ "grad_norm": 0.6257199645042419,
+ "learning_rate": 9.976136999909156e-06,
+ "loss": 1.1068,
+ "step": 96
+ },
+ {
+ "epoch": 17.818181818181817,
+ "grad_norm": 0.7334635257720947,
+ "learning_rate": 9.971379769661422e-06,
+ "loss": 1.1444,
+ "step": 98
+ },
+ {
+ "epoch": 18.0,
+ "eval_loss": 1.1187793016433716,
+ "eval_runtime": 9.6189,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 99
+ },
+ {
+ "epoch": 18.181818181818183,
+ "grad_norm": 0.589821457862854,
+ "learning_rate": 9.966191788709716e-06,
+ "loss": 1.1334,
+ "step": 100
+ },
+ {
+ "epoch": 18.545454545454547,
+ "grad_norm": 0.5560262799263,
+ "learning_rate": 9.960573506572391e-06,
+ "loss": 1.0929,
+ "step": 102
+ },
+ {
+ "epoch": 18.90909090909091,
+ "grad_norm": 0.5528337359428406,
+ "learning_rate": 9.95452541005172e-06,
+ "loss": 1.0591,
+ "step": 104
+ },
+ {
+ "epoch": 18.90909090909091,
+ "eval_loss": 1.0836217403411865,
+ "eval_runtime": 9.632,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 104
+ },
+ {
+ "epoch": 19.272727272727273,
+ "grad_norm": 0.5791244506835938,
+ "learning_rate": 9.948048023191728e-06,
+ "loss": 1.0522,
+ "step": 106
+ },
+ {
+ "epoch": 19.636363636363637,
+ "grad_norm": 0.5012540817260742,
+ "learning_rate": 9.941141907232766e-06,
+ "loss": 1.0986,
+ "step": 108
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 0.4489583969116211,
+ "learning_rate": 9.933807660562898e-06,
+ "loss": 1.0151,
+ "step": 110
+ },
+ {
+ "epoch": 20.0,
+ "eval_loss": 1.0539679527282715,
+ "eval_runtime": 9.6338,
+ "eval_samples_per_second": 2.491,
+ "eval_steps_per_second": 2.491,
+ "step": 110
+ },
+ {
+ "epoch": 20.363636363636363,
+ "grad_norm": 0.384694367647171,
+ "learning_rate": 9.926045918666045e-06,
+ "loss": 1.0363,
+ "step": 112
+ },
+ {
+ "epoch": 20.727272727272727,
+ "grad_norm": 0.3782545030117035,
+ "learning_rate": 9.91785735406693e-06,
+ "loss": 1.0277,
+ "step": 114
+ },
+ {
+ "epoch": 20.90909090909091,
+ "eval_loss": 1.0388442277908325,
+ "eval_runtime": 9.6309,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 115
+ },
+ {
+ "epoch": 21.09090909090909,
+ "grad_norm": 0.3923183083534241,
+ "learning_rate": 9.909242676272797e-06,
+ "loss": 1.0385,
+ "step": 116
+ },
+ {
+ "epoch": 21.454545454545453,
+ "grad_norm": 0.3774013817310333,
+ "learning_rate": 9.90020263171194e-06,
+ "loss": 1.0073,
+ "step": 118
+ },
+ {
+ "epoch": 21.818181818181817,
+ "grad_norm": 0.3472942113876343,
+ "learning_rate": 9.890738003669029e-06,
+ "loss": 1.0025,
+ "step": 120
+ },
+ {
+ "epoch": 22.0,
+ "eval_loss": 1.0249600410461426,
+ "eval_runtime": 9.6161,
+ "eval_samples_per_second": 2.496,
+ "eval_steps_per_second": 2.496,
+ "step": 121
+ },
+ {
+ "epoch": 22.181818181818183,
+ "grad_norm": 0.3280906081199646,
+ "learning_rate": 9.880849612217238e-06,
+ "loss": 0.9887,
+ "step": 122
+ },
+ {
+ "epoch": 22.545454545454547,
+ "grad_norm": 0.3648756742477417,
+ "learning_rate": 9.870538314147194e-06,
+ "loss": 1.0023,
+ "step": 124
+ },
+ {
+ "epoch": 22.90909090909091,
+ "grad_norm": 0.390601247549057,
+ "learning_rate": 9.859805002892733e-06,
+ "loss": 1.0161,
+ "step": 126
+ },
+ {
+ "epoch": 22.90909090909091,
+ "eval_loss": 1.0154192447662354,
+ "eval_runtime": 9.6244,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 126
+ },
+ {
+ "epoch": 23.272727272727273,
+ "grad_norm": 0.3790784180164337,
+ "learning_rate": 9.84865060845349e-06,
+ "loss": 0.9941,
+ "step": 128
+ },
+ {
+ "epoch": 23.636363636363637,
+ "grad_norm": 0.36824390292167664,
+ "learning_rate": 9.83707609731432e-06,
+ "loss": 0.9697,
+ "step": 130
+ },
+ {
+ "epoch": 24.0,
+ "grad_norm": 0.39850422739982605,
+ "learning_rate": 9.825082472361558e-06,
+ "loss": 0.9946,
+ "step": 132
+ },
+ {
+ "epoch": 24.0,
+ "eval_loss": 1.0047398805618286,
+ "eval_runtime": 9.6214,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 132
+ },
+ {
+ "epoch": 24.363636363636363,
+ "grad_norm": 0.34537094831466675,
+ "learning_rate": 9.812670772796113e-06,
+ "loss": 0.9699,
+ "step": 134
+ },
+ {
+ "epoch": 24.727272727272727,
+ "grad_norm": 0.38018471002578735,
+ "learning_rate": 9.799842074043438e-06,
+ "loss": 0.9773,
+ "step": 136
+ },
+ {
+ "epoch": 24.90909090909091,
+ "eval_loss": 0.9969633221626282,
+ "eval_runtime": 9.6246,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 137
+ },
+ {
+ "epoch": 25.09090909090909,
+ "grad_norm": 0.40856873989105225,
+ "learning_rate": 9.786597487660336e-06,
+ "loss": 0.9836,
+ "step": 138
+ },
+ {
+ "epoch": 25.454545454545453,
+ "grad_norm": 0.37090280652046204,
+ "learning_rate": 9.77293816123866e-06,
+ "loss": 0.9658,
+ "step": 140
+ },
+ {
+ "epoch": 25.818181818181817,
+ "grad_norm": 0.4068634808063507,
+ "learning_rate": 9.75886527830587e-06,
+ "loss": 0.9708,
+ "step": 142
+ },
+ {
+ "epoch": 26.0,
+ "eval_loss": 0.9890053272247314,
+ "eval_runtime": 9.6268,
+ "eval_samples_per_second": 2.493,
+ "eval_steps_per_second": 2.493,
+ "step": 143
+ },
+ {
+ "epoch": 26.181818181818183,
+ "grad_norm": 0.38360726833343506,
+ "learning_rate": 9.744380058222483e-06,
+ "loss": 0.9676,
+ "step": 144
+ },
+ {
+ "epoch": 26.545454545454547,
+ "grad_norm": 0.38106632232666016,
+ "learning_rate": 9.729483756076436e-06,
+ "loss": 0.972,
+ "step": 146
+ },
+ {
+ "epoch": 26.90909090909091,
+ "grad_norm": 0.36939358711242676,
+ "learning_rate": 9.714177662574316e-06,
+ "loss": 0.9374,
+ "step": 148
+ },
+ {
+ "epoch": 26.90909090909091,
+ "eval_loss": 0.9821727275848389,
+ "eval_runtime": 9.622,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 148
+ },
+ {
+ "epoch": 27.272727272727273,
+ "grad_norm": 0.37566348910331726,
+ "learning_rate": 9.698463103929542e-06,
+ "loss": 0.9219,
+ "step": 150
+ },
+ {
+ "epoch": 27.636363636363637,
+ "grad_norm": 0.3677101731300354,
+ "learning_rate": 9.682341441747446e-06,
+ "loss": 0.9798,
+ "step": 152
+ },
+ {
+ "epoch": 28.0,
+ "grad_norm": 0.3695693016052246,
+ "learning_rate": 9.665814072907293e-06,
+ "loss": 0.9403,
+ "step": 154
+ },
+ {
+ "epoch": 28.0,
+ "eval_loss": 0.9750909209251404,
+ "eval_runtime": 9.6244,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 154
+ },
+ {
+ "epoch": 28.363636363636363,
+ "grad_norm": 0.42501190304756165,
+ "learning_rate": 9.648882429441258e-06,
+ "loss": 0.9428,
+ "step": 156
+ },
+ {
+ "epoch": 28.727272727272727,
+ "grad_norm": 0.3643590807914734,
+ "learning_rate": 9.63154797841033e-06,
+ "loss": 0.94,
+ "step": 158
+ },
+ {
+ "epoch": 28.90909090909091,
+ "eval_loss": 0.9702978134155273,
+ "eval_runtime": 9.6253,
+ "eval_samples_per_second": 2.493,
+ "eval_steps_per_second": 2.493,
+ "step": 159
+ },
+ {
+ "epoch": 29.09090909090909,
+ "grad_norm": 0.3957996666431427,
+ "learning_rate": 9.613812221777212e-06,
+ "loss": 0.9274,
+ "step": 160
+ },
+ {
+ "epoch": 29.454545454545453,
+ "grad_norm": 0.4291062355041504,
+ "learning_rate": 9.595676696276173e-06,
+ "loss": 0.9886,
+ "step": 162
+ },
+ {
+ "epoch": 29.818181818181817,
+ "grad_norm": 0.5365828275680542,
+ "learning_rate": 9.577142973279896e-06,
+ "loss": 0.902,
+ "step": 164
+ },
+ {
+ "epoch": 30.0,
+ "eval_loss": 0.9632946848869324,
+ "eval_runtime": 9.6187,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 165
+ },
+ {
+ "epoch": 30.181818181818183,
+ "grad_norm": 0.38883283734321594,
+ "learning_rate": 9.55821265866333e-06,
+ "loss": 0.8967,
+ "step": 166
+ },
+ {
+ "epoch": 30.545454545454547,
+ "grad_norm": 0.41333243250846863,
+ "learning_rate": 9.538887392664544e-06,
+ "loss": 0.9142,
+ "step": 168
+ },
+ {
+ "epoch": 30.90909090909091,
+ "grad_norm": 0.4123990833759308,
+ "learning_rate": 9.519168849742603e-06,
+ "loss": 0.9215,
+ "step": 170
+ },
+ {
+ "epoch": 30.90909090909091,
+ "eval_loss": 0.9604056477546692,
+ "eval_runtime": 9.627,
+ "eval_samples_per_second": 2.493,
+ "eval_steps_per_second": 2.493,
+ "step": 170
+ },
+ {
+ "epoch": 31.272727272727273,
+ "grad_norm": 0.407969206571579,
+ "learning_rate": 9.499058738432492e-06,
+ "loss": 0.9574,
+ "step": 172
+ },
+ {
+ "epoch": 31.636363636363637,
+ "grad_norm": 0.4867004156112671,
+ "learning_rate": 9.478558801197065e-06,
+ "loss": 0.9208,
+ "step": 174
+ },
+ {
+ "epoch": 32.0,
+ "grad_norm": 0.4684889316558838,
+ "learning_rate": 9.457670814276083e-06,
+ "loss": 0.8854,
+ "step": 176
+ },
+ {
+ "epoch": 32.0,
+ "eval_loss": 0.9548270106315613,
+ "eval_runtime": 9.6173,
+ "eval_samples_per_second": 2.495,
+ "eval_steps_per_second": 2.495,
+ "step": 176
+ },
+ {
+ "epoch": 32.36363636363637,
+ "grad_norm": 0.527631938457489,
+ "learning_rate": 9.436396587532297e-06,
+ "loss": 0.8831,
+ "step": 178
+ },
+ {
+ "epoch": 32.72727272727273,
+ "grad_norm": 0.44928282499313354,
+ "learning_rate": 9.414737964294636e-06,
+ "loss": 0.96,
+ "step": 180
+ },
+ {
+ "epoch": 32.90909090909091,
+ "eval_loss": 0.9503173232078552,
+ "eval_runtime": 9.624,
+ "eval_samples_per_second": 2.494,
+ "eval_steps_per_second": 2.494,
+ "step": 181
+ },
+ {
+ "epoch": 33.09090909090909,
+ "grad_norm": 0.5621985197067261,
+ "learning_rate": 9.392696821198488e-06,
+ "loss": 0.8666,
+ "step": 182
+ },
+ {
+ "epoch": 33.45454545454545,
+ "grad_norm": 0.523452877998352,
+ "learning_rate": 9.370275068023097e-06,
+ "loss": 0.922,
+ "step": 184
+ },
+ {
+ "epoch": 33.81818181818182,
+ "grad_norm": 0.5437294840812683,
+ "learning_rate": 9.347474647526095e-06,
+ "loss": 0.9162,
+ "step": 186
+ },
+ {
+ "epoch": 34.0,
+ "eval_loss": 0.9452812075614929,
+ "eval_runtime": 9.6313,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 187
+ },
+ {
+ "epoch": 34.18181818181818,
+ "grad_norm": 0.46963879466056824,
+ "learning_rate": 9.324297535275156e-06,
+ "loss": 0.8254,
+ "step": 188
+ },
+ {
+ "epoch": 34.54545454545455,
+ "grad_norm": 0.48245498538017273,
+ "learning_rate": 9.30074573947683e-06,
+ "loss": 0.9174,
+ "step": 190
+ },
+ {
+ "epoch": 34.90909090909091,
+ "grad_norm": 0.5139335989952087,
+ "learning_rate": 9.276821300802535e-06,
+ "loss": 0.8686,
+ "step": 192
+ },
+ {
+ "epoch": 34.90909090909091,
+ "eval_loss": 0.9428532719612122,
+ "eval_runtime": 9.6309,
+ "eval_samples_per_second": 2.492,
+ "eval_steps_per_second": 2.492,
+ "step": 192
+ },
+ {
+ "epoch": 35.27272727272727,
+ "grad_norm": 0.45418813824653625,
+ "learning_rate": 9.25252629221175
974
+ "loss": 0.9011,
975
+ "step": 194
976
+ },
977
+ {
978
+ "epoch": 35.63636363636363,
979
+ "grad_norm": 0.5155036449432373,
980
+ "learning_rate": 9.227862818772392e-06,
981
+ "loss": 0.8754,
982
+ "step": 196
983
+ },
984
+ {
985
+ "epoch": 36.0,
986
+ "grad_norm": 0.4917118549346924,
987
+ "learning_rate": 9.202833017478421e-06,
988
+ "loss": 0.906,
989
+ "step": 198
990
+ },
991
+ {
992
+ "epoch": 36.0,
993
+ "eval_loss": 0.9385306239128113,
994
+ "eval_runtime": 9.6304,
995
+ "eval_samples_per_second": 2.492,
996
+ "eval_steps_per_second": 2.492,
997
+ "step": 198
998
+ },
999
+ {
1000
+ "epoch": 36.36363636363637,
1001
+ "grad_norm": 0.5289394855499268,
1002
+ "learning_rate": 9.177439057064684e-06,
1003
+ "loss": 0.8751,
1004
+ "step": 200
1005
+ },
1006
+ {
1007
+ "epoch": 36.72727272727273,
1008
+ "grad_norm": 0.5498368144035339,
1009
+ "learning_rate": 9.151683137818989e-06,
1010
+ "loss": 0.8762,
1011
+ "step": 202
1012
+ },
1013
+ {
1014
+ "epoch": 36.90909090909091,
1015
+ "eval_loss": 0.9353806972503662,
1016
+ "eval_runtime": 9.6269,
1017
+ "eval_samples_per_second": 2.493,
1018
+ "eval_steps_per_second": 2.493,
1019
+ "step": 203
1020
+ },
1021
+ {
1022
+ "epoch": 37.09090909090909,
1023
+ "grad_norm": 0.516069233417511,
1024
+ "learning_rate": 9.125567491391476e-06,
1025
+ "loss": 0.869,
1026
+ "step": 204
1027
+ },
1028
+ {
1029
+ "epoch": 37.45454545454545,
1030
+ "grad_norm": 0.5102888345718384,
1031
+ "learning_rate": 9.099094380601244e-06,
1032
+ "loss": 0.8518,
1033
+ "step": 206
1034
+ },
1035
+ {
1036
+ "epoch": 37.81818181818182,
1037
+ "grad_norm": 0.5379929542541504,
1038
+ "learning_rate": 9.072266099240286e-06,
1039
+ "loss": 0.8929,
1040
+ "step": 208
1041
+ },
1042
+ {
1043
+ "epoch": 38.0,
1044
+ "eval_loss": 0.9331977963447571,
1045
+ "eval_runtime": 9.6277,
1046
+ "eval_samples_per_second": 2.493,
1047
+ "eval_steps_per_second": 2.493,
1048
+ "step": 209
1049
+ },
1050
+ {
1051
+ "epoch": 38.18181818181818,
1052
+ "grad_norm": 0.6433578729629517,
1053
+ "learning_rate": 9.045084971874738e-06,
1054
+ "loss": 0.8756,
1055
+ "step": 210
1056
+ },
1057
+ {
1058
+ "epoch": 38.54545454545455,
1059
+ "grad_norm": 0.6186140179634094,
1060
+ "learning_rate": 9.017553353643479e-06,
1061
+ "loss": 0.8582,
1062
+ "step": 212
1063
+ },
1064
+ {
1065
+ "epoch": 38.90909090909091,
1066
+ "grad_norm": 0.608066976070404,
1067
+ "learning_rate": 8.989673630054044e-06,
1068
+ "loss": 0.8687,
1069
+ "step": 214
1070
+ },
1071
+ {
1072
+ "epoch": 38.90909090909091,
1073
+ "eval_loss": 0.9301042556762695,
1074
+ "eval_runtime": 9.6307,
1075
+ "eval_samples_per_second": 2.492,
1076
+ "eval_steps_per_second": 2.492,
1077
+ "step": 214
1078
+ },
1079
+ {
1080
+ "epoch": 39.27272727272727,
1081
+ "grad_norm": 0.6045626401901245,
1082
+ "learning_rate": 8.961448216775955e-06,
1083
+ "loss": 0.8119,
1084
+ "step": 216
1085
+ },
1086
+ {
1087
+ "epoch": 39.63636363636363,
1088
+ "grad_norm": 0.6160129308700562,
1089
+ "learning_rate": 8.932879559431392e-06,
1090
+ "loss": 0.8301,
1091
+ "step": 218
1092
+ },
1093
+ {
1094
+ "epoch": 40.0,
1095
+ "grad_norm": 0.6550566554069519,
1096
+ "learning_rate": 8.903970133383297e-06,
1097
+ "loss": 0.8933,
1098
+ "step": 220
1099
+ },
1100
+ {
1101
+ "epoch": 40.0,
1102
+ "eval_loss": 0.9279410243034363,
1103
+ "eval_runtime": 9.6288,
1104
+ "eval_samples_per_second": 2.493,
1105
+ "eval_steps_per_second": 2.493,
1106
+ "step": 220
1107
+ },
1108
+ {
1109
+ "epoch": 40.36363636363637,
1110
+ "grad_norm": 0.6415209770202637,
1111
+ "learning_rate": 8.874722443520898e-06,
1112
+ "loss": 0.8325,
1113
+ "step": 222
1114
+ },
1115
+ {
1116
+ "epoch": 40.72727272727273,
1117
+ "grad_norm": 0.6836015582084656,
1118
+ "learning_rate": 8.845139024042664e-06,
1119
+ "loss": 0.858,
1120
+ "step": 224
1121
+ },
1122
+ {
1123
+ "epoch": 40.90909090909091,
1124
+ "eval_loss": 0.9241297841072083,
1125
+ "eval_runtime": 9.6293,
1126
+ "eval_samples_per_second": 2.492,
1127
+ "eval_steps_per_second": 2.492,
1128
+ "step": 225
1129
+ },
1130
+ {
1131
+ "epoch": 41.09090909090909,
1132
+ "grad_norm": 0.6644122004508972,
1133
+ "learning_rate": 8.815222438236726e-06,
1134
+ "loss": 0.8649,
1135
+ "step": 226
1136
+ },
1137
+ {
1138
+ "epoch": 41.45454545454545,
1139
+ "grad_norm": 0.6619220972061157,
1140
+ "learning_rate": 8.784975278258783e-06,
1141
+ "loss": 0.8085,
1142
+ "step": 228
1143
+ },
1144
+ {
1145
+ "epoch": 41.81818181818182,
1146
+ "grad_norm": 0.6005414724349976,
1147
+ "learning_rate": 8.754400164907496e-06,
1148
+ "loss": 0.8481,
1149
+ "step": 230
1150
+ },
1151
+ {
1152
+ "epoch": 42.0,
1153
+ "eval_loss": 0.9222747683525085,
1154
+ "eval_runtime": 9.6426,
1155
+ "eval_samples_per_second": 2.489,
1156
+ "eval_steps_per_second": 2.489,
1157
+ "step": 231
1158
+ },
1159
+ {
1160
+ "epoch": 42.18181818181818,
1161
+ "grad_norm": 0.722902238368988,
1162
+ "learning_rate": 8.723499747397415e-06,
1163
+ "loss": 0.8578,
1164
+ "step": 232
1165
+ },
1166
+ {
1167
+ "epoch": 42.54545454545455,
1168
+ "grad_norm": 0.7436155080795288,
1169
+ "learning_rate": 8.692276703129421e-06,
1170
+ "loss": 0.7996,
1171
+ "step": 234
1172
+ },
1173
+ {
1174
+ "epoch": 42.90909090909091,
1175
+ "grad_norm": 0.6658902168273926,
1176
+ "learning_rate": 8.660733737458751e-06,
1177
+ "loss": 0.8228,
1178
+ "step": 236
1179
+ },
1180
+ {
1181
+ "epoch": 42.90909090909091,
1182
+ "eval_loss": 0.9217340350151062,
1183
+ "eval_runtime": 9.6277,
1184
+ "eval_samples_per_second": 2.493,
1185
+ "eval_steps_per_second": 2.493,
1186
+ "step": 236
1187
+ },
1188
+ {
1189
+ "epoch": 43.27272727272727,
1190
+ "grad_norm": 0.6352283358573914,
1191
+ "learning_rate": 8.628873583460593e-06,
1192
+ "loss": 0.8113,
1193
+ "step": 238
1194
+ },
1195
+ {
1196
+ "epoch": 43.63636363636363,
1197
+ "grad_norm": 1.0223489999771118,
1198
+ "learning_rate": 8.596699001693257e-06,
1199
+ "loss": 0.8149,
1200
+ "step": 240
1201
+ },
1202
+ {
1203
+ "epoch": 44.0,
1204
+ "grad_norm": 0.7334797978401184,
1205
+ "learning_rate": 8.564212779959003e-06,
1206
+ "loss": 0.8593,
1207
+ "step": 242
1208
+ },
1209
+ {
1210
+ "epoch": 44.0,
1211
+ "eval_loss": 0.9185922741889954,
1212
+ "eval_runtime": 9.6303,
1213
+ "eval_samples_per_second": 2.492,
1214
+ "eval_steps_per_second": 2.492,
1215
+ "step": 242
1216
+ },
1217
+ {
1218
+ "epoch": 44.36363636363637,
1219
+ "grad_norm": 0.7544272541999817,
1220
+ "learning_rate": 8.531417733062476e-06,
1221
+ "loss": 0.7958,
1222
+ "step": 244
1223
+ },
1224
+ {
1225
+ "epoch": 44.72727272727273,
1226
+ "grad_norm": 0.8189204335212708,
1227
+ "learning_rate": 8.498316702566828e-06,
1228
+ "loss": 0.8238,
1229
+ "step": 246
1230
+ },
1231
+ {
1232
+ "epoch": 44.90909090909091,
1233
+ "eval_loss": 0.9156233668327332,
1234
+ "eval_runtime": 9.6451,
1235
+ "eval_samples_per_second": 2.488,
1236
+ "eval_steps_per_second": 2.488,
1237
+ "step": 247
1238
+ },
1239
+ {
1240
+ "epoch": 45.09090909090909,
1241
+ "grad_norm": 0.6729193329811096,
1242
+ "learning_rate": 8.464912556547486e-06,
1243
+ "loss": 0.835,
1244
+ "step": 248
1245
+ },
1246
+ {
1247
+ "epoch": 45.45454545454545,
1248
+ "grad_norm": 0.6723213195800781,
1249
+ "learning_rate": 8.43120818934367e-06,
1250
+ "loss": 0.7991,
1251
+ "step": 250
1252
+ },
1253
+ {
1254
+ "epoch": 45.81818181818182,
1255
+ "grad_norm": 0.8917332887649536,
1256
+ "learning_rate": 8.397206521307584e-06,
1257
+ "loss": 0.8081,
1258
+ "step": 252
1259
+ },
1260
+ {
1261
+ "epoch": 46.0,
1262
+ "eval_loss": 0.9161267876625061,
1263
+ "eval_runtime": 9.6325,
1264
+ "eval_samples_per_second": 2.492,
1265
+ "eval_steps_per_second": 2.492,
1266
+ "step": 253
1267
+ },
1268
+ {
1269
+ "epoch": 46.18181818181818,
1270
+ "grad_norm": 0.7718498110771179,
1271
+ "learning_rate": 8.362910498551402e-06,
1272
+ "loss": 0.8071,
1273
+ "step": 254
1274
+ },
1275
+ {
1276
+ "epoch": 46.54545454545455,
1277
+ "grad_norm": 0.7421916127204895,
1278
+ "learning_rate": 8.328323092691985e-06,
1279
+ "loss": 0.7838,
1280
+ "step": 256
1281
+ },
1282
+ {
1283
+ "epoch": 46.90909090909091,
1284
+ "grad_norm": 0.775203287601471,
1285
+ "learning_rate": 8.293447300593402e-06,
1286
+ "loss": 0.8327,
1287
+ "step": 258
1288
+ },
1289
+ {
1290
+ "epoch": 46.90909090909091,
1291
+ "eval_loss": 0.912854015827179,
1292
+ "eval_runtime": 9.6464,
1293
+ "eval_samples_per_second": 2.488,
1294
+ "eval_steps_per_second": 2.488,
1295
+ "step": 258
1296
+ },
1297
+ {
1298
+ "epoch": 47.27272727272727,
1299
+ "grad_norm": 0.6994480490684509,
1300
+ "learning_rate": 8.258286144107277e-06,
1301
+ "loss": 0.7949,
1302
+ "step": 260
1303
+ },
1304
+ {
1305
+ "epoch": 47.63636363636363,
1306
+ "grad_norm": 0.8607519865036011,
1307
+ "learning_rate": 8.222842669810936e-06,
1308
+ "loss": 0.7794,
1309
+ "step": 262
1310
+ },
1311
+ {
1312
+ "epoch": 48.0,
1313
+ "grad_norm": 0.8172978758811951,
1314
+ "learning_rate": 8.18711994874345e-06,
1315
+ "loss": 0.8029,
1316
+ "step": 264
1317
+ },
1318
+ {
1319
+ "epoch": 48.0,
1320
+ "eval_loss": 0.9110000133514404,
1321
+ "eval_runtime": 9.6168,
1322
+ "eval_samples_per_second": 2.496,
1323
+ "eval_steps_per_second": 2.496,
1324
+ "step": 264
1325
+ },
1326
+ {
1327
+ "epoch": 48.36363636363637,
1328
+ "grad_norm": 0.8061463236808777,
1329
+ "learning_rate": 8.151121076139534e-06,
1330
+ "loss": 0.8099,
1331
+ "step": 266
1332
+ },
1333
+ {
1334
+ "epoch": 48.72727272727273,
1335
+ "grad_norm": 0.9735673069953918,
1336
+ "learning_rate": 8.11484917116136e-06,
1337
+ "loss": 0.7909,
1338
+ "step": 268
1339
+ },
1340
+ {
1341
+ "epoch": 48.90909090909091,
1342
+ "eval_loss": 0.9093864560127258,
1343
+ "eval_runtime": 9.6205,
1344
+ "eval_samples_per_second": 2.495,
1345
+ "eval_steps_per_second": 2.495,
1346
+ "step": 269
1347
+ },
1348
+ {
1349
+ "epoch": 49.09090909090909,
1350
+ "grad_norm": 0.8723132014274597,
1351
+ "learning_rate": 8.078307376628292e-06,
1352
+ "loss": 0.7628,
1353
+ "step": 270
1354
+ },
1355
+ {
1356
+ "epoch": 49.45454545454545,
1357
+ "grad_norm": 0.7607284188270569,
1358
+ "learning_rate": 8.041498858744572e-06,
1359
+ "loss": 0.7665,
1360
+ "step": 272
1361
+ },
1362
+ {
1363
+ "epoch": 49.81818181818182,
1364
+ "grad_norm": 0.8277180194854736,
1365
+ "learning_rate": 8.004426806824985e-06,
1366
+ "loss": 0.7826,
1367
+ "step": 274
1368
+ },
1369
+ {
1370
+ "epoch": 50.0,
1371
+ "eval_loss": 0.9079095721244812,
1372
+ "eval_runtime": 9.6142,
1373
+ "eval_samples_per_second": 2.496,
1374
+ "eval_steps_per_second": 2.496,
1375
+ "step": 275
1376
+ },
1377
+ {
1378
+ "epoch": 50.18181818181818,
1379
+ "grad_norm": 0.8411371111869812,
1380
+ "learning_rate": 7.967094433018508e-06,
1381
+ "loss": 0.7943,
1382
+ "step": 276
1383
+ },
1384
+ {
1385
+ "epoch": 50.54545454545455,
1386
+ "grad_norm": 0.834507167339325,
1387
+ "learning_rate": 7.929504972030003e-06,
1388
+ "loss": 0.7586,
1389
+ "step": 278
1390
+ },
1391
+ {
1392
+ "epoch": 50.90909090909091,
1393
+ "grad_norm": 1.0113625526428223,
1394
+ "learning_rate": 7.891661680839932e-06,
1395
+ "loss": 0.773,
1396
+ "step": 280
1397
+ },
1398
+ {
1399
+ "epoch": 50.90909090909091,
1400
+ "eval_loss": 0.9122073650360107,
1401
+ "eval_runtime": 9.6327,
1402
+ "eval_samples_per_second": 2.492,
1403
+ "eval_steps_per_second": 2.492,
1404
+ "step": 280
1405
+ },
1406
+ {
1407
+ "epoch": 51.27272727272727,
1408
+ "grad_norm": 0.8380469083786011,
1409
+ "learning_rate": 7.85356783842216e-06,
1410
+ "loss": 0.7737,
1411
+ "step": 282
1412
+ },
1413
+ {
1414
+ "epoch": 51.63636363636363,
1415
+ "grad_norm": 0.8534033894538879,
1416
+ "learning_rate": 7.815226745459831e-06,
1417
+ "loss": 0.7941,
1418
+ "step": 284
1419
+ },
1420
+ {
1421
+ "epoch": 52.0,
1422
+ "grad_norm": 0.8890909552574158,
1423
+ "learning_rate": 7.776641724059398e-06,
1424
+ "loss": 0.7377,
1425
+ "step": 286
1426
+ },
1427
+ {
1428
+ "epoch": 52.0,
1429
+ "eval_loss": 0.9077624678611755,
1430
+ "eval_runtime": 9.6175,
1431
+ "eval_samples_per_second": 2.495,
1432
+ "eval_steps_per_second": 2.495,
1433
+ "step": 286
1434
+ },
1435
+ {
1436
+ "epoch": 52.36363636363637,
1437
+ "grad_norm": 0.9524686336517334,
1438
+ "learning_rate": 7.737816117462752e-06,
1439
+ "loss": 0.7699,
1440
+ "step": 288
1441
+ },
1442
+ {
1443
+ "epoch": 52.72727272727273,
1444
+ "grad_norm": 0.8625631928443909,
1445
+ "learning_rate": 7.698753289757565e-06,
1446
+ "loss": 0.7491,
1447
+ "step": 290
1448
+ },
1449
+ {
1450
+ "epoch": 52.90909090909091,
1451
+ "eval_loss": 0.9050046801567078,
1452
+ "eval_runtime": 9.6225,
1453
+ "eval_samples_per_second": 2.494,
1454
+ "eval_steps_per_second": 2.494,
1455
+ "step": 291
1456
+ },
1457
+ {
1458
+ "epoch": 53.09090909090909,
1459
+ "grad_norm": 1.0375274419784546,
1460
+ "learning_rate": 7.65945662558579e-06,
1461
+ "loss": 0.7661,
1462
+ "step": 292
1463
+ },
1464
+ {
1465
+ "epoch": 53.45454545454545,
1466
+ "grad_norm": 0.8255937695503235,
1467
+ "learning_rate": 7.619929529850397e-06,
1468
+ "loss": 0.7606,
1469
+ "step": 294
1470
+ },
1471
+ {
1472
+ "epoch": 53.81818181818182,
1473
+ "grad_norm": 1.0094412565231323,
1474
+ "learning_rate": 7.580175427420358e-06,
1475
+ "loss": 0.7414,
1476
+ "step": 296
1477
+ },
1478
+ {
1479
+ "epoch": 54.0,
1480
+ "eval_loss": 0.9093080163002014,
1481
+ "eval_runtime": 9.6164,
1482
+ "eval_samples_per_second": 2.496,
1483
+ "eval_steps_per_second": 2.496,
1484
+ "step": 297
1485
+ },
1486
+ {
1487
+ "epoch": 54.18181818181818,
1488
+ "grad_norm": 0.8360889554023743,
1489
+ "learning_rate": 7.54019776283389e-06,
1490
+ "loss": 0.7467,
1491
+ "step": 298
1492
+ },
1493
+ {
1494
+ "epoch": 54.54545454545455,
1495
+ "grad_norm": 0.9806857109069824,
1496
+ "learning_rate": 7.500000000000001e-06,
1497
+ "loss": 0.7445,
1498
+ "step": 300
1499
+ },
1500
+ {
1501
+ "epoch": 54.90909090909091,
1502
+ "grad_norm": 1.0003647804260254,
1503
+ "learning_rate": 7.459585621898353e-06,
1504
+ "loss": 0.7275,
1505
+ "step": 302
1506
+ },
1507
+ {
1508
+ "epoch": 54.90909090909091,
1509
+ "eval_loss": 0.9052907824516296,
1510
+ "eval_runtime": 9.6259,
1511
+ "eval_samples_per_second": 2.493,
1512
+ "eval_steps_per_second": 2.493,
1513
+ "step": 302
1514
+ },
1515
+ {
1516
+ "epoch": 55.27272727272727,
1517
+ "grad_norm": 1.080519437789917,
1518
+ "learning_rate": 7.418958130277483e-06,
1519
+ "loss": 0.7526,
1520
+ "step": 304
1521
+ },
1522
+ {
1523
+ "epoch": 55.63636363636363,
1524
+ "grad_norm": 1.2365224361419678,
1525
+ "learning_rate": 7.378121045351378e-06,
1526
+ "loss": 0.7289,
1527
+ "step": 306
1528
+ },
1529
+ {
1530
+ "epoch": 56.0,
1531
+ "grad_norm": 0.9447788000106812,
1532
+ "learning_rate": 7.337077905494472e-06,
1533
+ "loss": 0.7198,
1534
+ "step": 308
1535
+ },
1536
+ {
1537
+ "epoch": 56.0,
1538
+ "eval_loss": 0.9046055674552917,
1539
+ "eval_runtime": 9.6309,
1540
+ "eval_samples_per_second": 2.492,
1541
+ "eval_steps_per_second": 2.492,
1542
+ "step": 308
1543
+ },
1544
+ {
1545
+ "epoch": 56.36363636363637,
1546
+ "grad_norm": 0.9524107575416565,
1547
+ "learning_rate": 7.295832266935059e-06,
1548
+ "loss": 0.766,
1549
+ "step": 310
1550
+ },
1551
+ {
1552
+ "epoch": 56.72727272727273,
1553
+ "grad_norm": 0.9705169796943665,
1554
+ "learning_rate": 7.254387703447154e-06,
1555
+ "loss": 0.7203,
1556
+ "step": 312
1557
+ },
1558
+ {
1559
+ "epoch": 56.90909090909091,
1560
+ "eval_loss": 0.9092791676521301,
1561
+ "eval_runtime": 9.6325,
1562
+ "eval_samples_per_second": 2.492,
1563
+ "eval_steps_per_second": 2.492,
1564
+ "step": 313
1565
+ },
1566
+ {
1567
+ "epoch": 57.09090909090909,
1568
+ "grad_norm": 1.0789105892181396,
1569
+ "learning_rate": 7.212747806040845e-06,
1570
+ "loss": 0.6951,
1571
+ "step": 314
1572
+ },
1573
+ {
1574
+ "epoch": 57.45454545454545,
1575
+ "grad_norm": 1.1204413175582886,
1576
+ "learning_rate": 7.170916182651141e-06,
1577
+ "loss": 0.7249,
1578
+ "step": 316
1579
+ },
1580
+ {
1581
+ "epoch": 57.81818181818182,
1582
+ "grad_norm": 1.0801540613174438,
1583
+ "learning_rate": 7.128896457825364e-06,
1584
+ "loss": 0.6903,
1585
+ "step": 318
1586
+ },
1587
+ {
1588
+ "epoch": 58.0,
1589
+ "eval_loss": 0.9042022824287415,
1590
+ "eval_runtime": 9.621,
1591
+ "eval_samples_per_second": 2.495,
1592
+ "eval_steps_per_second": 2.495,
1593
+ "step": 319
1594
+ },
1595
+ {
1596
+ "epoch": 58.18181818181818,
1597
+ "grad_norm": 1.052799105644226,
1598
+ "learning_rate": 7.08669227240909e-06,
1599
+ "loss": 0.7306,
1600
+ "step": 320
1601
+ },
1602
+ {
1603
+ "epoch": 58.54545454545455,
1604
+ "grad_norm": 1.020494818687439,
1605
+ "learning_rate": 7.04430728323069e-06,
1606
+ "loss": 0.7288,
1607
+ "step": 322
1608
+ },
1609
+ {
1610
+ "epoch": 58.90909090909091,
1611
+ "grad_norm": 1.0670812129974365,
1612
+ "learning_rate": 7.0017451627844765e-06,
1613
+ "loss": 0.6987,
1614
+ "step": 324
1615
+ },
1616
+ {
1617
+ "epoch": 58.90909090909091,
1618
+ "eval_loss": 0.9106718897819519,
1619
+ "eval_runtime": 9.6324,
1620
+ "eval_samples_per_second": 2.492,
1621
+ "eval_steps_per_second": 2.492,
1622
+ "step": 324
1623
+ },
1624
+ {
1625
+ "epoch": 59.27272727272727,
1626
+ "grad_norm": 1.0681415796279907,
1627
+ "learning_rate": 6.959009598912493e-06,
1628
+ "loss": 0.6906,
1629
+ "step": 326
1630
+ },
1631
+ {
1632
+ "epoch": 59.63636363636363,
1633
+ "grad_norm": 1.1053000688552856,
1634
+ "learning_rate": 6.916104294484988e-06,
1635
+ "loss": 0.7063,
1636
+ "step": 328
1637
+ },
1638
+ {
1639
+ "epoch": 60.0,
1640
+ "grad_norm": 1.0191996097564697,
1641
+ "learning_rate": 6.873032967079562e-06,
1642
+ "loss": 0.7141,
1643
+ "step": 330
1644
+ },
1645
+ {
1646
+ "epoch": 60.0,
1647
+ "eval_loss": 0.9078884124755859,
1648
+ "eval_runtime": 9.6326,
1649
+ "eval_samples_per_second": 2.492,
1650
+ "eval_steps_per_second": 2.492,
1651
+ "step": 330
1652
+ },
1653
+ {
1654
+ "epoch": 60.36363636363637,
1655
+ "grad_norm": 1.0764459371566772,
1656
+ "learning_rate": 6.829799348659061e-06,
1657
+ "loss": 0.7079,
1658
+ "step": 332
1659
+ },
1660
+ {
1661
+ "epoch": 60.72727272727273,
1662
+ "grad_norm": 1.146618366241455,
1663
+ "learning_rate": 6.7864071852482205e-06,
1664
+ "loss": 0.7023,
1665
+ "step": 334
1666
+ },
1667
+ {
1668
+ "epoch": 60.90909090909091,
1669
+ "eval_loss": 0.9119828343391418,
1670
+ "eval_runtime": 9.6211,
1671
+ "eval_samples_per_second": 2.495,
1672
+ "eval_steps_per_second": 2.495,
1673
+ "step": 335
1674
+ },
1675
+ {
1676
+ "epoch": 61.09090909090909,
1677
+ "grad_norm": 1.274398684501648,
1678
+ "learning_rate": 6.7428602366090764e-06,
1679
+ "loss": 0.6856,
1680
+ "step": 336
1681
+ },
1682
+ {
1683
+ "epoch": 61.45454545454545,
1684
+ "grad_norm": 1.1239506006240845,
1685
+ "learning_rate": 6.699162275915208e-06,
1686
+ "loss": 0.6603,
1687
+ "step": 338
1688
+ },
1689
+ {
1690
+ "epoch": 61.81818181818182,
1691
+ "grad_norm": 1.3075493574142456,
1692
+ "learning_rate": 6.655317089424791e-06,
1693
+ "loss": 0.6945,
1694
+ "step": 340
1695
+ },
1696
+ {
1697
+ "epoch": 62.0,
1698
+ "eval_loss": 0.9086711406707764,
1699
+ "eval_runtime": 9.6164,
1700
+ "eval_samples_per_second": 2.496,
1701
+ "eval_steps_per_second": 2.496,
1702
+ "step": 341
1703
+ },
1704
+ {
1705
+ "epoch": 62.18181818181818,
1706
+ "grad_norm": 1.1385191679000854,
1707
+ "learning_rate": 6.611328476152557e-06,
1708
+ "loss": 0.7058,
1709
+ "step": 342
1710
+ },
1711
+ {
1712
+ "epoch": 62.54545454545455,
1713
+ "grad_norm": 1.2736291885375977,
1714
+ "learning_rate": 6.567200247540599e-06,
1715
+ "loss": 0.6662,
1716
+ "step": 344
1717
+ },
1718
+ {
1719
+ "epoch": 62.90909090909091,
1720
+ "grad_norm": 1.1537060737609863,
1721
+ "learning_rate": 6.522936227128139e-06,
1722
+ "loss": 0.6897,
1723
+ "step": 346
1724
+ },
1725
+ {
1726
+ "epoch": 62.90909090909091,
1727
+ "eval_loss": 0.9129719734191895,
1728
+ "eval_runtime": 9.6291,
1729
+ "eval_samples_per_second": 2.492,
1730
+ "eval_steps_per_second": 2.492,
1731
+ "step": 346
1732
+ },
1733
+ {
1734
+ "epoch": 63.27272727272727,
1735
+ "grad_norm": 1.4773812294006348,
1736
+ "learning_rate": 6.4785402502202345e-06,
1737
+ "loss": 0.6822,
1738
+ "step": 348
1739
+ },
1740
+ {
1741
+ "epoch": 63.63636363636363,
1742
+ "grad_norm": 1.3053030967712402,
1743
+ "learning_rate": 6.434016163555452e-06,
1744
+ "loss": 0.6596,
1745
+ "step": 350
1746
+ },
1747
+ {
1748
+ "epoch": 64.0,
1749
+ "grad_norm": 1.1616432666778564,
1750
+ "learning_rate": 6.389367824972575e-06,
1751
+ "loss": 0.6597,
1752
+ "step": 352
1753
+ },
1754
+ {
1755
+ "epoch": 64.0,
1756
+ "eval_loss": 0.9133894443511963,
1757
+ "eval_runtime": 9.6317,
1758
+ "eval_samples_per_second": 2.492,
1759
+ "eval_steps_per_second": 2.492,
1760
+ "step": 352
1761
+ },
1762
+ {
1763
+ "epoch": 64.36363636363636,
1764
+ "grad_norm": 1.0900788307189941,
1765
+ "learning_rate": 6.344599103076329e-06,
1766
+ "loss": 0.6563,
1767
+ "step": 354
1768
+ },
1769
+ {
1770
+ "epoch": 64.72727272727273,
1771
+ "grad_norm": 1.2286537885665894,
1772
+ "learning_rate": 6.299713876902188e-06,
1773
+ "loss": 0.6954,
1774
+ "step": 356
1775
+ },
1776
+ {
1777
+ "epoch": 64.9090909090909,
1778
+ "eval_loss": 0.9120491147041321,
1779
+ "eval_runtime": 9.6347,
1780
+ "eval_samples_per_second": 2.491,
1781
+ "eval_steps_per_second": 2.491,
1782
+ "step": 357
1783
+ },
1784
+ {
1785
+ "epoch": 64.9090909090909,
1786
+ "step": 357,
1787
+ "total_flos": 8.780093794235187e+16,
1788
+ "train_loss": 1.1563805774146436,
1789
+ "train_runtime": 6696.4639,
1790
+ "train_samples_per_second": 1.971,
1791
+ "train_steps_per_second": 0.112
1792
+ }
1793
+ ],
1794
+ "logging_steps": 2,
1795
+ "max_steps": 750,
1796
+ "num_input_tokens_seen": 0,
1797
+ "num_train_epochs": 150,
1798
+ "save_steps": 25,
1799
+ "stateful_callbacks": {
1800
+ "EarlyStoppingCallback": {
1801
+ "args": {
1802
+ "early_stopping_patience": 7,
1803
+ "early_stopping_threshold": 0.0
1804
+ },
1805
+ "attributes": {
1806
+ "early_stopping_patience_counter": 0
1807
+ }
1808
+ },
1809
+ "TrainerControl": {
1810
+ "args": {
1811
+ "should_epoch_stop": false,
1812
+ "should_evaluate": false,
1813
+ "should_log": false,
1814
+ "should_save": true,
1815
+ "should_training_stop": true
1816
+ },
1817
+ "attributes": {}
1818
+ }
1819
+ },
1820
+ "total_flos": 8.780093794235187e+16,
1821
+ "train_batch_size": 1,
1822
+ "trial_name": null,
1823
+ "trial_params": null
1824
+ }