lombardata committed
Commit 5726e4e
1 parent: 7bcab23

Evaluation on the test set completed on 2024_11_27.

README.md ADDED
@@ -0,0 +1,124 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-large
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs
+
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4551
+ - RMSE: 0.0866
+ - MAE: 0.0630
+ - KL divergence: 0.1147
+ - Explained variance: 0.6593
+ - Learning rate: 0.0000
+
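+ ## How to use
+
+ The card does not include an inference example, so the following is only a minimal sketch. It assumes the checkpoint loads with the standard `transformers` image-classification classes, that the repository id is `lombardata/bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs` (inferred from this commit's author and the model name), and that the reported per-class probabilities (RMSE, MAE, KL divergence above) are read out with a sigmoid. The image path is a placeholder.
+
+ ```python
+ from PIL import Image
+ import torch
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+ # Repository id assumed from the commit author and the model name above.
+ repo_id = "lombardata/bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs"
+
+ processor = AutoImageProcessor.from_pretrained(repo_id)
+ model = AutoModelForImageClassification.from_pretrained(repo_id)
+ model.eval()
+
+ image = Image.open("orthophoto_tile.png").convert("RGB")  # placeholder input image
+ inputs = processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # The evaluation metrics suggest per-class probabilities, so a sigmoid readout is assumed here.
+ probs = torch.sigmoid(logits).squeeze(0)
+ for label, p in zip(model.config.id2label.values(), probs.tolist()):
+     print(f"{label}: {p:.3f}")
+ ```
+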
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a rough `TrainingArguments` sketch follows the list):
+ - learning_rate: 0.001
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150
+ - mixed_precision_training: Native AMP
+
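+ The sketch below is only an approximate translation of these settings into the `transformers` Trainer API; the dataset, the multilabel head, and the metric functions are not described in this card and are omitted. Per-epoch evaluation/saving and early stopping with a patience of 10 are taken from `trainer_state.json`; the output directory name is hypothetical.
+
+ ```python
+ from transformers import TrainingArguments, EarlyStoppingCallback
+
+ # Approximate reproduction of the listed hyperparameters.
+ training_args = TrainingArguments(
+     output_dir="bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs",  # hypothetical path
+     learning_rate=1e-3,
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     seed=42,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+     lr_scheduler_type="linear",
+     num_train_epochs=150,
+     fp16=True,                          # "Native AMP" mixed precision
+     evaluation_strategy="epoch",        # trainer_state.json logs an evaluation after every epoch
+     save_strategy="epoch",
+     load_best_model_at_end=True,
+     metric_for_best_model="eval_loss",  # best_metric in trainer_state.json matches eval_loss
+     greater_is_better=False,
+ )
+
+ # Early stopping settings recorded in trainer_state.json.
+ early_stopping = EarlyStoppingCallback(early_stopping_patience=10, early_stopping_threshold=0.0)
+ ```
+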
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | RMSE | MAE | KL Divergence | Explained Variance | Learning Rate |
+ |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:-------------:|:------------------:|:-------------:|
+ | No log | 1.0 | 221 | 0.4634 | 0.1018 | 0.0760 | 0.0696 | 0.5492 | 0.001 |
+ | No log | 2.0 | 442 | 0.4593 | 0.0952 | 0.0716 | 0.0038 | 0.6113 | 0.001 |
+ | 0.5185 | 3.0 | 663 | 0.4574 | 0.0918 | 0.0670 | 0.0583 | 0.6245 | 0.001 |
+ | 0.5185 | 4.0 | 884 | 0.4595 | 0.0955 | 0.0713 | -0.0650 | 0.6130 | 0.001 |
+ | 0.4806 | 5.0 | 1105 | 0.4593 | 0.0954 | 0.0702 | -0.0835 | 0.6206 | 0.001 |
+ | 0.4806 | 6.0 | 1326 | 0.4608 | 0.0977 | 0.0728 | -0.0705 | 0.6041 | 0.001 |
+ | 0.4786 | 7.0 | 1547 | 0.4581 | 0.0927 | 0.0683 | -0.0044 | 0.6283 | 0.001 |
+ | 0.4786 | 8.0 | 1768 | 0.4573 | 0.0916 | 0.0680 | 0.0799 | 0.6277 | 0.001 |
+ | 0.4786 | 9.0 | 1989 | 0.4594 | 0.0947 | 0.0706 | 0.0233 | 0.6196 | 0.001 |
+ | 0.4776 | 10.0 | 2210 | 0.4577 | 0.0918 | 0.0675 | 0.0885 | 0.6293 | 0.001 |
+ | 0.4776 | 11.0 | 2431 | 0.4564 | 0.0898 | 0.0662 | 0.1296 | 0.6422 | 0.001 |
+ | 0.4772 | 12.0 | 2652 | 0.4572 | 0.0913 | 0.0677 | -0.0061 | 0.6386 | 0.001 |
+ | 0.4772 | 13.0 | 2873 | 0.4623 | 0.1002 | 0.0747 | -0.2060 | 0.6186 | 0.001 |
+ | 0.4769 | 14.0 | 3094 | 0.4578 | 0.0925 | 0.0678 | -0.0371 | 0.6346 | 0.001 |
+ | 0.4769 | 15.0 | 3315 | 0.4575 | 0.0917 | 0.0667 | 0.0458 | 0.6340 | 0.001 |
+ | 0.4766 | 16.0 | 3536 | 0.4579 | 0.0926 | 0.0680 | 0.0151 | 0.6277 | 0.001 |
+ | 0.4766 | 17.0 | 3757 | 0.4592 | 0.0949 | 0.0702 | -0.0679 | 0.6246 | 0.001 |
+ | 0.4766 | 18.0 | 3978 | 0.4557 | 0.0887 | 0.0651 | 0.0421 | 0.6493 | 0.0001 |
+ | 0.4758 | 19.0 | 4199 | 0.4556 | 0.0885 | 0.0647 | 0.0468 | 0.6508 | 0.0001 |
+ | 0.4758 | 20.0 | 4420 | 0.4555 | 0.0884 | 0.0648 | 0.0405 | 0.6518 | 0.0001 |
+ | 0.4741 | 21.0 | 4641 | 0.4555 | 0.0884 | 0.0650 | 0.0475 | 0.6533 | 0.0001 |
+ | 0.4741 | 22.0 | 4862 | 0.4555 | 0.0883 | 0.0646 | 0.0570 | 0.6535 | 0.0001 |
+ | 0.4738 | 23.0 | 5083 | 0.4551 | 0.0874 | 0.0641 | 0.0887 | 0.6570 | 0.0001 |
+ | 0.4738 | 24.0 | 5304 | 0.4552 | 0.0878 | 0.0642 | 0.0555 | 0.6553 | 0.0001 |
+ | 0.4736 | 25.0 | 5525 | 0.4552 | 0.0878 | 0.0645 | 0.0238 | 0.6582 | 0.0001 |
+ | 0.4736 | 26.0 | 5746 | 0.4557 | 0.0885 | 0.0646 | 0.0409 | 0.6572 | 0.0001 |
+ | 0.4736 | 27.0 | 5967 | 0.4551 | 0.0876 | 0.0639 | 0.0548 | 0.6576 | 0.0001 |
+ | 0.4731 | 28.0 | 6188 | 0.4551 | 0.0876 | 0.0642 | 0.0273 | 0.6588 | 0.0001 |
+ | 0.4731 | 29.0 | 6409 | 0.4548 | 0.0869 | 0.0634 | 0.0744 | 0.6618 | 0.0001 |
+ | 0.4727 | 30.0 | 6630 | 0.4549 | 0.0873 | 0.0636 | 0.0492 | 0.6595 | 0.0001 |
+ | 0.4727 | 31.0 | 6851 | 0.4548 | 0.0869 | 0.0632 | 0.0688 | 0.6613 | 0.0001 |
+ | 0.4732 | 32.0 | 7072 | 0.4550 | 0.0874 | 0.0639 | 0.0271 | 0.6602 | 0.0001 |
+ | 0.4732 | 33.0 | 7293 | 0.4554 | 0.0882 | 0.0647 | -0.0174 | 0.6580 | 0.0001 |
+ | 0.4725 | 34.0 | 7514 | 0.4546 | 0.0866 | 0.0628 | 0.1094 | 0.6616 | 0.0001 |
+ | 0.4725 | 35.0 | 7735 | 0.4550 | 0.0874 | 0.0639 | 0.0571 | 0.6583 | 0.0001 |
+ | 0.4725 | 36.0 | 7956 | 0.4548 | 0.0869 | 0.0629 | 0.1453 | 0.6616 | 0.0001 |
+ | 0.4727 | 37.0 | 8177 | 0.4553 | 0.0881 | 0.0645 | -0.0152 | 0.6587 | 0.0001 |
+ | 0.4727 | 38.0 | 8398 | 0.4548 | 0.0870 | 0.0636 | 0.0490 | 0.6613 | 0.0001 |
+ | 0.4727 | 39.0 | 8619 | 0.4548 | 0.0870 | 0.0631 | 0.0726 | 0.6610 | 0.0001 |
+ | 0.4727 | 40.0 | 8840 | 0.4548 | 0.0870 | 0.0632 | 0.0637 | 0.6605 | 0.0001 |
+ | 0.4721 | 41.0 | 9061 | 0.4547 | 0.0869 | 0.0634 | 0.0390 | 0.6628 | 1e-05 |
+ | 0.4721 | 42.0 | 9282 | 0.4544 | 0.0862 | 0.0628 | 0.1115 | 0.6657 | 1e-05 |
+ | 0.4721 | 43.0 | 9503 | 0.4546 | 0.0866 | 0.0632 | 0.0533 | 0.6646 | 1e-05 |
+ | 0.4721 | 44.0 | 9724 | 0.4545 | 0.0864 | 0.0625 | 0.1350 | 0.6648 | 1e-05 |
+ | 0.4721 | 45.0 | 9945 | 0.4550 | 0.0874 | 0.0642 | 0.0044 | 0.6625 | 1e-05 |
+ | 0.4716 | 46.0 | 10166 | 0.4546 | 0.0867 | 0.0632 | 0.0389 | 0.6642 | 1e-05 |
+ | 0.4716 | 47.0 | 10387 | 0.4545 | 0.0866 | 0.0630 | 0.0370 | 0.6651 | 1e-05 |
+ | 0.4722 | 48.0 | 10608 | 0.4546 | 0.0868 | 0.0634 | 0.0194 | 0.6645 | 1e-05 |
+ | 0.4722 | 49.0 | 10829 | 0.4544 | 0.0862 | 0.0627 | 0.0667 | 0.6667 | 0.0000 |
+ | 0.4717 | 50.0 | 11050 | 0.4545 | 0.0865 | 0.0631 | 0.0548 | 0.6651 | 0.0000 |
+ | 0.4717 | 51.0 | 11271 | 0.4545 | 0.0865 | 0.0629 | 0.0428 | 0.6651 | 0.0000 |
+ | 0.4717 | 52.0 | 11492 | 0.4542 | 0.0859 | 0.0623 | 0.1236 | 0.6672 | 0.0000 |
+ | 0.4718 | 53.0 | 11713 | 0.4542 | 0.0859 | 0.0625 | 0.0887 | 0.6672 | 0.0000 |
+ | 0.4718 | 54.0 | 11934 | 0.4543 | 0.0862 | 0.0624 | 0.0917 | 0.6653 | 0.0000 |
+ | 0.4716 | 55.0 | 12155 | 0.4546 | 0.0865 | 0.0631 | 0.0774 | 0.6650 | 0.0000 |
+ | 0.4716 | 56.0 | 12376 | 0.4546 | 0.0866 | 0.0633 | 0.0473 | 0.6649 | 0.0000 |
+ | 0.4717 | 57.0 | 12597 | 0.4549 | 0.0871 | 0.0639 | -0.0046 | 0.6658 | 0.0000 |
+ | 0.4717 | 58.0 | 12818 | 0.4544 | 0.0864 | 0.0627 | 0.0553 | 0.6656 | 0.0000 |
+ | 0.4716 | 59.0 | 13039 | 0.4545 | 0.0865 | 0.0631 | 0.0368 | 0.6654 | 0.0000 |
+ | 0.4716 | 60.0 | 13260 | 0.4544 | 0.0863 | 0.0629 | 0.0471 | 0.6660 | 0.0000 |
+ | 0.4716 | 61.0 | 13481 | 0.4542 | 0.0860 | 0.0624 | 0.0928 | 0.6670 | 0.0000 |
+ | 0.4718 | 62.0 | 13702 | 0.4545 | 0.0866 | 0.0632 | 0.0286 | 0.6661 | 0.0000 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.0
+ - PyTorch 2.5.0+cu124
+ - Datasets 3.0.2
+ - Tokenizers 0.19.1
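+
+ A quick way to compare a local environment against these versions (just a convenience sketch; package names are the standard PyPI ones):
+
+ ```python
+ # Print installed versions to compare against the ones listed above.
+ import datasets
+ import tokenizers
+ import torch
+ import transformers
+
+ print("Transformers:", transformers.__version__)  # card: 4.41.0
+ print("PyTorch:", torch.__version__)              # card: 2.5.0+cu124
+ print("Datasets:", datasets.__version__)          # card: 3.0.2
+ print("Tokenizers:", tokenizers.__version__)      # card: 0.19.1
+ ```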
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "epoch": 62.0,
+ "eval_explained_variance": 0.6593042016029358,
+ "eval_kl_divergence": 0.11466515809297562,
+ "eval_loss": 0.45506975054740906,
+ "eval_mae": 0.06304711848497391,
+ "eval_rmse": 0.08664286881685257,
+ "eval_runtime": 26.2102,
+ "eval_samples_per_second": 179.244,
+ "eval_steps_per_second": 2.823,
+ "learning_rate": 1.0000000000000002e-07,
+ "total_flos": 9.42369297866869e+19,
+ "train_loss": 0.4754439868851833,
+ "train_runtime": 8961.4221,
+ "train_samples_per_second": 235.894,
+ "train_steps_per_second": 3.699
+ }
logs/events.out.tfevents.1732687817.datavisu2 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:fcf2e8c6bf4c5558216d89ad51fd018388fc40a3c27de5a559ec45d35c76e8a2
- size 44274
+ oid sha256:0d78e0e3d21c1058cf86741492b3a2c5a15d05f99fa19a27103dbd49a075da25
+ size 45983
logs/events.out.tfevents.1732696814.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3162ba9474c0885fdae82852ed9ddbce5e81310e1ac9741481981e97954f0960
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:4b948f1fde3fc2d17d81e2b2ad872da78478793584196ea425a180bb4d83e5d8
+ oid sha256:3221228a8d421eb77f2d313bbc5460b2bf904a37a4260e2372f65d2ff35418ce
 size 1222870688
test_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 62.0,
+ "eval_explained_variance": 0.6593042016029358,
+ "eval_kl_divergence": 0.11466515809297562,
+ "eval_loss": 0.45506975054740906,
+ "eval_mae": 0.06304711848497391,
+ "eval_rmse": 0.08664286881685257,
+ "eval_runtime": 26.2102,
+ "eval_samples_per_second": 179.244,
+ "eval_steps_per_second": 2.823,
+ "learning_rate": 1.0000000000000002e-07
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 62.0,
+ "learning_rate": 1.0000000000000002e-07,
+ "total_flos": 9.42369297866869e+19,
+ "train_loss": 0.4754439868851833,
+ "train_runtime": 8961.4221,
+ "train_samples_per_second": 235.894,
+ "train_steps_per_second": 3.699
+ }
trainer_state.json ADDED
@@ -0,0 +1,1047 @@
1
+ {
2
+ "best_metric": 0.45421910285949707,
3
+ "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/bd_ortho_ign/bd_ortho-DinoVdeau-large-2024_11_27-batch-size64_freeze_probs/checkpoint-11492",
4
+ "epoch": 62.0,
5
+ "eval_steps": 500,
6
+ "global_step": 13702,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 1.0,
13
+ "eval_explained_variance": 0.5492395758628845,
14
+ "eval_kl_divergence": 0.06964559853076935,
15
+ "eval_loss": 0.46336060762405396,
16
+ "eval_mae": 0.07600608468055725,
17
+ "eval_rmse": 0.10175278037786484,
18
+ "eval_runtime": 26.595,
19
+ "eval_samples_per_second": 176.65,
20
+ "eval_steps_per_second": 2.782,
21
+ "learning_rate": 0.001,
22
+ "step": 221
23
+ },
24
+ {
25
+ "epoch": 2.0,
26
+ "eval_explained_variance": 0.6113448739051819,
27
+ "eval_kl_divergence": 0.0038063330575823784,
28
+ "eval_loss": 0.45933997631073,
29
+ "eval_mae": 0.07159148901700974,
30
+ "eval_rmse": 0.09520163387060165,
31
+ "eval_runtime": 25.5426,
32
+ "eval_samples_per_second": 183.928,
33
+ "eval_steps_per_second": 2.897,
34
+ "learning_rate": 0.001,
35
+ "step": 442
36
+ },
37
+ {
38
+ "epoch": 2.262443438914027,
39
+ "grad_norm": 0.16188210248947144,
40
+ "learning_rate": 0.001,
41
+ "loss": 0.5185,
42
+ "step": 500
43
+ },
44
+ {
45
+ "epoch": 3.0,
46
+ "eval_explained_variance": 0.6245184540748596,
47
+ "eval_kl_divergence": 0.05826142057776451,
48
+ "eval_loss": 0.457367479801178,
49
+ "eval_mae": 0.0670078918337822,
50
+ "eval_rmse": 0.0917908325791359,
51
+ "eval_runtime": 25.6126,
52
+ "eval_samples_per_second": 183.425,
53
+ "eval_steps_per_second": 2.889,
54
+ "learning_rate": 0.001,
55
+ "step": 663
56
+ },
57
+ {
58
+ "epoch": 4.0,
59
+ "eval_explained_variance": 0.6129782795906067,
60
+ "eval_kl_divergence": -0.06495417654514313,
61
+ "eval_loss": 0.459468811750412,
62
+ "eval_mae": 0.07134346663951874,
63
+ "eval_rmse": 0.09552835673093796,
64
+ "eval_runtime": 25.6003,
65
+ "eval_samples_per_second": 183.514,
66
+ "eval_steps_per_second": 2.891,
67
+ "learning_rate": 0.001,
68
+ "step": 884
69
+ },
70
+ {
71
+ "epoch": 4.524886877828054,
72
+ "grad_norm": 0.09988280385732651,
73
+ "learning_rate": 0.001,
74
+ "loss": 0.4806,
75
+ "step": 1000
76
+ },
77
+ {
78
+ "epoch": 5.0,
79
+ "eval_explained_variance": 0.6206489205360413,
80
+ "eval_kl_divergence": -0.08347146958112717,
81
+ "eval_loss": 0.45927393436431885,
82
+ "eval_mae": 0.07016489654779434,
83
+ "eval_rmse": 0.0953657403588295,
84
+ "eval_runtime": 25.74,
85
+ "eval_samples_per_second": 182.518,
86
+ "eval_steps_per_second": 2.875,
87
+ "learning_rate": 0.001,
88
+ "step": 1105
89
+ },
90
+ {
91
+ "epoch": 6.0,
92
+ "eval_explained_variance": 0.6041414737701416,
93
+ "eval_kl_divergence": -0.07046143710613251,
94
+ "eval_loss": 0.46080395579338074,
95
+ "eval_mae": 0.07277411222457886,
96
+ "eval_rmse": 0.09773259609937668,
97
+ "eval_runtime": 25.4681,
98
+ "eval_samples_per_second": 184.466,
99
+ "eval_steps_per_second": 2.906,
100
+ "learning_rate": 0.001,
101
+ "step": 1326
102
+ },
103
+ {
104
+ "epoch": 6.787330316742081,
105
+ "grad_norm": 0.08271574974060059,
106
+ "learning_rate": 0.001,
107
+ "loss": 0.4786,
108
+ "step": 1500
109
+ },
110
+ {
111
+ "epoch": 7.0,
112
+ "eval_explained_variance": 0.628325879573822,
113
+ "eval_kl_divergence": -0.004442690871655941,
114
+ "eval_loss": 0.4581476151943207,
115
+ "eval_mae": 0.06827609241008759,
116
+ "eval_rmse": 0.09274852275848389,
117
+ "eval_runtime": 26.0251,
118
+ "eval_samples_per_second": 180.518,
119
+ "eval_steps_per_second": 2.843,
120
+ "learning_rate": 0.001,
121
+ "step": 1547
122
+ },
123
+ {
124
+ "epoch": 8.0,
125
+ "eval_explained_variance": 0.6276748776435852,
126
+ "eval_kl_divergence": 0.07988782227039337,
127
+ "eval_loss": 0.4573117196559906,
128
+ "eval_mae": 0.06800529360771179,
129
+ "eval_rmse": 0.09162522107362747,
130
+ "eval_runtime": 25.7197,
131
+ "eval_samples_per_second": 182.662,
132
+ "eval_steps_per_second": 2.877,
133
+ "learning_rate": 0.001,
134
+ "step": 1768
135
+ },
136
+ {
137
+ "epoch": 9.0,
138
+ "eval_explained_variance": 0.6196129322052002,
139
+ "eval_kl_divergence": 0.02327939122915268,
140
+ "eval_loss": 0.45939013361930847,
141
+ "eval_mae": 0.07057134807109833,
142
+ "eval_rmse": 0.09471722692251205,
143
+ "eval_runtime": 25.8299,
144
+ "eval_samples_per_second": 181.883,
145
+ "eval_steps_per_second": 2.865,
146
+ "learning_rate": 0.001,
147
+ "step": 1989
148
+ },
149
+ {
150
+ "epoch": 9.049773755656108,
151
+ "grad_norm": 0.05649600923061371,
152
+ "learning_rate": 0.001,
153
+ "loss": 0.4776,
154
+ "step": 2000
155
+ },
156
+ {
157
+ "epoch": 10.0,
158
+ "eval_explained_variance": 0.6293186545372009,
159
+ "eval_kl_divergence": 0.0885055735707283,
160
+ "eval_loss": 0.45772281289100647,
161
+ "eval_mae": 0.06745484471321106,
162
+ "eval_rmse": 0.09179002046585083,
163
+ "eval_runtime": 25.5273,
164
+ "eval_samples_per_second": 184.039,
165
+ "eval_steps_per_second": 2.899,
166
+ "learning_rate": 0.001,
167
+ "step": 2210
168
+ },
169
+ {
170
+ "epoch": 11.0,
171
+ "eval_explained_variance": 0.6422439813613892,
172
+ "eval_kl_divergence": 0.1296330839395523,
173
+ "eval_loss": 0.45641985535621643,
174
+ "eval_mae": 0.06617596000432968,
175
+ "eval_rmse": 0.08975591510534286,
176
+ "eval_runtime": 25.7282,
177
+ "eval_samples_per_second": 182.601,
178
+ "eval_steps_per_second": 2.876,
179
+ "learning_rate": 0.001,
180
+ "step": 2431
181
+ },
182
+ {
183
+ "epoch": 11.312217194570136,
184
+ "grad_norm": 0.04163961857557297,
185
+ "learning_rate": 0.001,
186
+ "loss": 0.4772,
187
+ "step": 2500
188
+ },
189
+ {
190
+ "epoch": 12.0,
191
+ "eval_explained_variance": 0.6385617256164551,
192
+ "eval_kl_divergence": -0.006057058461010456,
193
+ "eval_loss": 0.45718902349472046,
194
+ "eval_mae": 0.06766870617866516,
195
+ "eval_rmse": 0.09130751341581345,
196
+ "eval_runtime": 25.6849,
197
+ "eval_samples_per_second": 182.909,
198
+ "eval_steps_per_second": 2.881,
199
+ "learning_rate": 0.001,
200
+ "step": 2652
201
+ },
202
+ {
203
+ "epoch": 13.0,
204
+ "eval_explained_variance": 0.6186209321022034,
205
+ "eval_kl_divergence": -0.20600058138370514,
206
+ "eval_loss": 0.4622880220413208,
207
+ "eval_mae": 0.07468675822019577,
208
+ "eval_rmse": 0.10024455189704895,
209
+ "eval_runtime": 25.9645,
210
+ "eval_samples_per_second": 180.939,
211
+ "eval_steps_per_second": 2.85,
212
+ "learning_rate": 0.001,
213
+ "step": 2873
214
+ },
215
+ {
216
+ "epoch": 13.574660633484163,
217
+ "grad_norm": 0.0532899908721447,
218
+ "learning_rate": 0.001,
219
+ "loss": 0.4769,
220
+ "step": 3000
221
+ },
222
+ {
223
+ "epoch": 14.0,
224
+ "eval_explained_variance": 0.6346250176429749,
225
+ "eval_kl_divergence": -0.0371401272714138,
226
+ "eval_loss": 0.45775285363197327,
227
+ "eval_mae": 0.06778896600008011,
228
+ "eval_rmse": 0.092497818171978,
229
+ "eval_runtime": 25.7017,
230
+ "eval_samples_per_second": 182.79,
231
+ "eval_steps_per_second": 2.879,
232
+ "learning_rate": 0.001,
233
+ "step": 3094
234
+ },
235
+ {
236
+ "epoch": 15.0,
237
+ "eval_explained_variance": 0.6340083479881287,
238
+ "eval_kl_divergence": 0.04575105383992195,
239
+ "eval_loss": 0.4575214684009552,
240
+ "eval_mae": 0.0666513592004776,
241
+ "eval_rmse": 0.0916559174656868,
242
+ "eval_runtime": 26.025,
243
+ "eval_samples_per_second": 180.519,
244
+ "eval_steps_per_second": 2.843,
245
+ "learning_rate": 0.001,
246
+ "step": 3315
247
+ },
248
+ {
249
+ "epoch": 15.83710407239819,
250
+ "grad_norm": 0.0473792664706707,
251
+ "learning_rate": 0.001,
252
+ "loss": 0.4766,
253
+ "step": 3500
254
+ },
255
+ {
256
+ "epoch": 16.0,
257
+ "eval_explained_variance": 0.6277230381965637,
258
+ "eval_kl_divergence": 0.01510859839618206,
259
+ "eval_loss": 0.4578736424446106,
260
+ "eval_mae": 0.06800080835819244,
261
+ "eval_rmse": 0.09264300018548965,
262
+ "eval_runtime": 25.6671,
263
+ "eval_samples_per_second": 183.036,
264
+ "eval_steps_per_second": 2.883,
265
+ "learning_rate": 0.001,
266
+ "step": 3536
267
+ },
268
+ {
269
+ "epoch": 17.0,
270
+ "eval_explained_variance": 0.6246375441551208,
271
+ "eval_kl_divergence": -0.06794208288192749,
272
+ "eval_loss": 0.4592094421386719,
273
+ "eval_mae": 0.07020581513643265,
274
+ "eval_rmse": 0.09485668689012527,
275
+ "eval_runtime": 25.9387,
276
+ "eval_samples_per_second": 181.119,
277
+ "eval_steps_per_second": 2.853,
278
+ "learning_rate": 0.001,
279
+ "step": 3757
280
+ },
281
+ {
282
+ "epoch": 18.0,
283
+ "eval_explained_variance": 0.6493042707443237,
284
+ "eval_kl_divergence": 0.04208216443657875,
285
+ "eval_loss": 0.45573291182518005,
286
+ "eval_mae": 0.06506813317537308,
287
+ "eval_rmse": 0.08873652666807175,
288
+ "eval_runtime": 25.6229,
289
+ "eval_samples_per_second": 183.352,
290
+ "eval_steps_per_second": 2.888,
291
+ "learning_rate": 0.0001,
292
+ "step": 3978
293
+ },
294
+ {
295
+ "epoch": 18.099547511312217,
296
+ "grad_norm": 0.048517756164073944,
297
+ "learning_rate": 0.0001,
298
+ "loss": 0.4758,
299
+ "step": 4000
300
+ },
301
+ {
302
+ "epoch": 19.0,
303
+ "eval_explained_variance": 0.6507542729377747,
304
+ "eval_kl_divergence": 0.04677804559469223,
305
+ "eval_loss": 0.4555513262748718,
306
+ "eval_mae": 0.06473750621080399,
307
+ "eval_rmse": 0.08847790211439133,
308
+ "eval_runtime": 25.7638,
309
+ "eval_samples_per_second": 182.349,
310
+ "eval_steps_per_second": 2.872,
311
+ "learning_rate": 0.0001,
312
+ "step": 4199
313
+ },
314
+ {
315
+ "epoch": 20.0,
316
+ "eval_explained_variance": 0.6518434882164001,
317
+ "eval_kl_divergence": 0.0404924675822258,
318
+ "eval_loss": 0.45553284883499146,
319
+ "eval_mae": 0.06476090103387833,
320
+ "eval_rmse": 0.08838176727294922,
321
+ "eval_runtime": 25.6331,
322
+ "eval_samples_per_second": 183.279,
323
+ "eval_steps_per_second": 2.887,
324
+ "learning_rate": 0.0001,
325
+ "step": 4420
326
+ },
327
+ {
328
+ "epoch": 20.361990950226243,
329
+ "grad_norm": 0.04679996892809868,
330
+ "learning_rate": 0.0001,
331
+ "loss": 0.4741,
332
+ "step": 4500
333
+ },
334
+ {
335
+ "epoch": 21.0,
336
+ "eval_explained_variance": 0.6532743573188782,
337
+ "eval_kl_divergence": 0.047539714723825455,
338
+ "eval_loss": 0.4555487334728241,
339
+ "eval_mae": 0.06497333198785782,
340
+ "eval_rmse": 0.08836204558610916,
341
+ "eval_runtime": 25.803,
342
+ "eval_samples_per_second": 182.072,
343
+ "eval_steps_per_second": 2.868,
344
+ "learning_rate": 0.0001,
345
+ "step": 4641
346
+ },
347
+ {
348
+ "epoch": 22.0,
349
+ "eval_explained_variance": 0.6534684300422668,
350
+ "eval_kl_divergence": 0.0570099912583828,
351
+ "eval_loss": 0.45551028847694397,
352
+ "eval_mae": 0.06458985060453415,
353
+ "eval_rmse": 0.08831282705068588,
354
+ "eval_runtime": 25.9625,
355
+ "eval_samples_per_second": 180.953,
356
+ "eval_steps_per_second": 2.85,
357
+ "learning_rate": 0.0001,
358
+ "step": 4862
359
+ },
360
+ {
361
+ "epoch": 22.624434389140273,
362
+ "grad_norm": 0.05471302196383476,
363
+ "learning_rate": 0.0001,
364
+ "loss": 0.4738,
365
+ "step": 5000
366
+ },
367
+ {
368
+ "epoch": 23.0,
369
+ "eval_explained_variance": 0.6569964289665222,
370
+ "eval_kl_divergence": 0.08867427706718445,
371
+ "eval_loss": 0.45505577325820923,
372
+ "eval_mae": 0.0640987753868103,
373
+ "eval_rmse": 0.08740502595901489,
374
+ "eval_runtime": 25.8915,
375
+ "eval_samples_per_second": 181.45,
376
+ "eval_steps_per_second": 2.858,
377
+ "learning_rate": 0.0001,
378
+ "step": 5083
379
+ },
380
+ {
381
+ "epoch": 24.0,
382
+ "eval_explained_variance": 0.6552526354789734,
383
+ "eval_kl_divergence": 0.055539198219776154,
384
+ "eval_loss": 0.4552234709262848,
385
+ "eval_mae": 0.06417837738990784,
386
+ "eval_rmse": 0.08780523389577866,
387
+ "eval_runtime": 27.2231,
388
+ "eval_samples_per_second": 172.574,
389
+ "eval_steps_per_second": 2.718,
390
+ "learning_rate": 0.0001,
391
+ "step": 5304
392
+ },
393
+ {
394
+ "epoch": 24.8868778280543,
395
+ "grad_norm": 0.0545237734913826,
396
+ "learning_rate": 0.0001,
397
+ "loss": 0.4736,
398
+ "step": 5500
399
+ },
400
+ {
401
+ "epoch": 25.0,
402
+ "eval_explained_variance": 0.6582456231117249,
403
+ "eval_kl_divergence": 0.023763582110404968,
404
+ "eval_loss": 0.45521080493927,
405
+ "eval_mae": 0.06447087973356247,
406
+ "eval_rmse": 0.08778873831033707,
407
+ "eval_runtime": 25.7982,
408
+ "eval_samples_per_second": 182.106,
409
+ "eval_steps_per_second": 2.868,
410
+ "learning_rate": 0.0001,
411
+ "step": 5525
412
+ },
413
+ {
414
+ "epoch": 26.0,
415
+ "eval_explained_variance": 0.6571853756904602,
416
+ "eval_kl_divergence": 0.040941931307315826,
417
+ "eval_loss": 0.4557025730609894,
418
+ "eval_mae": 0.06462270766496658,
419
+ "eval_rmse": 0.08846313506364822,
420
+ "eval_runtime": 25.5822,
421
+ "eval_samples_per_second": 183.643,
422
+ "eval_steps_per_second": 2.893,
423
+ "learning_rate": 0.0001,
424
+ "step": 5746
425
+ },
426
+ {
427
+ "epoch": 27.0,
428
+ "eval_explained_variance": 0.6576172709465027,
429
+ "eval_kl_divergence": 0.05476689711213112,
430
+ "eval_loss": 0.4550967216491699,
431
+ "eval_mae": 0.06391049176454544,
432
+ "eval_rmse": 0.08758416771888733,
433
+ "eval_runtime": 26.0908,
434
+ "eval_samples_per_second": 180.064,
435
+ "eval_steps_per_second": 2.836,
436
+ "learning_rate": 0.0001,
437
+ "step": 5967
438
+ },
439
+ {
440
+ "epoch": 27.149321266968325,
441
+ "grad_norm": 0.05160004645586014,
442
+ "learning_rate": 0.0001,
443
+ "loss": 0.4731,
444
+ "step": 6000
445
+ },
446
+ {
447
+ "epoch": 28.0,
448
+ "eval_explained_variance": 0.658767580986023,
449
+ "eval_kl_divergence": 0.027325255796313286,
450
+ "eval_loss": 0.45512688159942627,
451
+ "eval_mae": 0.0641704872250557,
452
+ "eval_rmse": 0.08764084428548813,
453
+ "eval_runtime": 25.6818,
454
+ "eval_samples_per_second": 182.931,
455
+ "eval_steps_per_second": 2.881,
456
+ "learning_rate": 0.0001,
457
+ "step": 6188
458
+ },
459
+ {
460
+ "epoch": 29.0,
461
+ "eval_explained_variance": 0.6617770195007324,
462
+ "eval_kl_divergence": 0.0744185745716095,
463
+ "eval_loss": 0.45477041602134705,
464
+ "eval_mae": 0.0634256973862648,
465
+ "eval_rmse": 0.08693012595176697,
466
+ "eval_runtime": 25.726,
467
+ "eval_samples_per_second": 182.617,
468
+ "eval_steps_per_second": 2.876,
469
+ "learning_rate": 0.0001,
470
+ "step": 6409
471
+ },
472
+ {
473
+ "epoch": 29.41176470588235,
474
+ "grad_norm": 0.07741276919841766,
475
+ "learning_rate": 0.0001,
476
+ "loss": 0.4727,
477
+ "step": 6500
478
+ },
479
+ {
480
+ "epoch": 30.0,
481
+ "eval_explained_variance": 0.6594749093055725,
482
+ "eval_kl_divergence": 0.049223385751247406,
483
+ "eval_loss": 0.4549327790737152,
484
+ "eval_mae": 0.06360659003257751,
485
+ "eval_rmse": 0.0873405933380127,
486
+ "eval_runtime": 25.4772,
487
+ "eval_samples_per_second": 184.4,
488
+ "eval_steps_per_second": 2.905,
489
+ "learning_rate": 0.0001,
490
+ "step": 6630
491
+ },
492
+ {
493
+ "epoch": 31.0,
494
+ "eval_explained_variance": 0.6613443493843079,
495
+ "eval_kl_divergence": 0.06878047436475754,
496
+ "eval_loss": 0.4547973871231079,
497
+ "eval_mae": 0.06322694569826126,
498
+ "eval_rmse": 0.08694975823163986,
499
+ "eval_runtime": 25.8257,
500
+ "eval_samples_per_second": 181.912,
501
+ "eval_steps_per_second": 2.865,
502
+ "learning_rate": 0.0001,
503
+ "step": 6851
504
+ },
505
+ {
506
+ "epoch": 31.67420814479638,
507
+ "grad_norm": 0.055884115397930145,
508
+ "learning_rate": 0.0001,
509
+ "loss": 0.4732,
510
+ "step": 7000
511
+ },
512
+ {
513
+ "epoch": 32.0,
514
+ "eval_explained_variance": 0.6602151393890381,
515
+ "eval_kl_divergence": 0.027085499837994576,
516
+ "eval_loss": 0.454988956451416,
517
+ "eval_mae": 0.063857302069664,
518
+ "eval_rmse": 0.08743549138307571,
519
+ "eval_runtime": 25.6292,
520
+ "eval_samples_per_second": 183.307,
521
+ "eval_steps_per_second": 2.887,
522
+ "learning_rate": 0.0001,
523
+ "step": 7072
524
+ },
525
+ {
526
+ "epoch": 33.0,
527
+ "eval_explained_variance": 0.6580324172973633,
528
+ "eval_kl_divergence": -0.017361771315336227,
529
+ "eval_loss": 0.455375999212265,
530
+ "eval_mae": 0.0646858736872673,
531
+ "eval_rmse": 0.08816961199045181,
532
+ "eval_runtime": 25.8246,
533
+ "eval_samples_per_second": 181.919,
534
+ "eval_steps_per_second": 2.865,
535
+ "learning_rate": 0.0001,
536
+ "step": 7293
537
+ },
538
+ {
539
+ "epoch": 33.93665158371041,
540
+ "grad_norm": 0.08047891408205032,
541
+ "learning_rate": 0.0001,
542
+ "loss": 0.4725,
543
+ "step": 7500
544
+ },
545
+ {
546
+ "epoch": 34.0,
547
+ "eval_explained_variance": 0.6616186499595642,
548
+ "eval_kl_divergence": 0.10939505696296692,
549
+ "eval_loss": 0.45461305975914,
550
+ "eval_mae": 0.0628495141863823,
551
+ "eval_rmse": 0.08664888888597488,
552
+ "eval_runtime": 25.7346,
553
+ "eval_samples_per_second": 182.556,
554
+ "eval_steps_per_second": 2.876,
555
+ "learning_rate": 0.0001,
556
+ "step": 7514
557
+ },
558
+ {
559
+ "epoch": 35.0,
560
+ "eval_explained_variance": 0.6582692265510559,
561
+ "eval_kl_divergence": 0.05707371234893799,
562
+ "eval_loss": 0.45498156547546387,
563
+ "eval_mae": 0.06386271119117737,
564
+ "eval_rmse": 0.08741921186447144,
565
+ "eval_runtime": 25.7857,
566
+ "eval_samples_per_second": 182.194,
567
+ "eval_steps_per_second": 2.87,
568
+ "learning_rate": 0.0001,
569
+ "step": 7735
570
+ },
571
+ {
572
+ "epoch": 36.0,
573
+ "eval_explained_variance": 0.6615896224975586,
574
+ "eval_kl_divergence": 0.14533284306526184,
575
+ "eval_loss": 0.4548388123512268,
576
+ "eval_mae": 0.0629100501537323,
577
+ "eval_rmse": 0.08686337620019913,
578
+ "eval_runtime": 29.7733,
579
+ "eval_samples_per_second": 157.793,
580
+ "eval_steps_per_second": 2.485,
581
+ "learning_rate": 0.0001,
582
+ "step": 7956
583
+ },
584
+ {
585
+ "epoch": 36.199095022624434,
586
+ "grad_norm": 0.07811417430639267,
587
+ "learning_rate": 0.0001,
588
+ "loss": 0.4727,
589
+ "step": 8000
590
+ },
591
+ {
592
+ "epoch": 37.0,
593
+ "eval_explained_variance": 0.6586756110191345,
594
+ "eval_kl_divergence": -0.015241213142871857,
595
+ "eval_loss": 0.45526784658432007,
596
+ "eval_mae": 0.06451455503702164,
597
+ "eval_rmse": 0.08806425333023071,
598
+ "eval_runtime": 25.6924,
599
+ "eval_samples_per_second": 182.855,
600
+ "eval_steps_per_second": 2.88,
601
+ "learning_rate": 0.0001,
602
+ "step": 8177
603
+ },
604
+ {
605
+ "epoch": 38.0,
606
+ "eval_explained_variance": 0.6612560153007507,
607
+ "eval_kl_divergence": 0.049000147730112076,
608
+ "eval_loss": 0.45479556918144226,
609
+ "eval_mae": 0.06361590325832367,
610
+ "eval_rmse": 0.08704841136932373,
611
+ "eval_runtime": 26.1103,
612
+ "eval_samples_per_second": 179.929,
613
+ "eval_steps_per_second": 2.834,
614
+ "learning_rate": 0.0001,
615
+ "step": 8398
616
+ },
617
+ {
618
+ "epoch": 38.46153846153846,
619
+ "grad_norm": 0.062047556042671204,
620
+ "learning_rate": 0.0001,
621
+ "loss": 0.4727,
622
+ "step": 8500
623
+ },
624
+ {
625
+ "epoch": 39.0,
626
+ "eval_explained_variance": 0.6610231995582581,
627
+ "eval_kl_divergence": 0.07255241274833679,
628
+ "eval_loss": 0.454780250787735,
629
+ "eval_mae": 0.06311424821615219,
630
+ "eval_rmse": 0.08698847889900208,
631
+ "eval_runtime": 25.6403,
632
+ "eval_samples_per_second": 183.227,
633
+ "eval_steps_per_second": 2.886,
634
+ "learning_rate": 0.0001,
635
+ "step": 8619
636
+ },
637
+ {
638
+ "epoch": 40.0,
639
+ "eval_explained_variance": 0.6605435013771057,
640
+ "eval_kl_divergence": 0.06372024863958359,
641
+ "eval_loss": 0.45476558804512024,
642
+ "eval_mae": 0.06323693692684174,
643
+ "eval_rmse": 0.08702895045280457,
644
+ "eval_runtime": 26.038,
645
+ "eval_samples_per_second": 180.429,
646
+ "eval_steps_per_second": 2.842,
647
+ "learning_rate": 0.0001,
648
+ "step": 8840
649
+ },
650
+ {
651
+ "epoch": 40.723981900452486,
652
+ "grad_norm": 0.08612842857837677,
653
+ "learning_rate": 1e-05,
654
+ "loss": 0.4721,
655
+ "step": 9000
656
+ },
657
+ {
658
+ "epoch": 41.0,
659
+ "eval_explained_variance": 0.6628013253211975,
660
+ "eval_kl_divergence": 0.039023660123348236,
661
+ "eval_loss": 0.45470812916755676,
662
+ "eval_mae": 0.0634213536977768,
663
+ "eval_rmse": 0.08692529052495956,
664
+ "eval_runtime": 25.9883,
665
+ "eval_samples_per_second": 180.774,
666
+ "eval_steps_per_second": 2.847,
667
+ "learning_rate": 1e-05,
668
+ "step": 9061
669
+ },
670
+ {
671
+ "epoch": 42.0,
672
+ "eval_explained_variance": 0.6656690239906311,
673
+ "eval_kl_divergence": 0.11149828135967255,
674
+ "eval_loss": 0.4543863534927368,
675
+ "eval_mae": 0.06281669437885284,
676
+ "eval_rmse": 0.08619723469018936,
677
+ "eval_runtime": 26.3115,
678
+ "eval_samples_per_second": 178.553,
679
+ "eval_steps_per_second": 2.812,
680
+ "learning_rate": 1e-05,
681
+ "step": 9282
682
+ },
683
+ {
684
+ "epoch": 42.98642533936652,
685
+ "grad_norm": 0.06828662008047104,
686
+ "learning_rate": 1e-05,
687
+ "loss": 0.4721,
688
+ "step": 9500
689
+ },
690
+ {
691
+ "epoch": 43.0,
692
+ "eval_explained_variance": 0.6645870804786682,
693
+ "eval_kl_divergence": 0.05330301821231842,
694
+ "eval_loss": 0.4545557498931885,
695
+ "eval_mae": 0.06320130825042725,
696
+ "eval_rmse": 0.0865868553519249,
697
+ "eval_runtime": 25.8985,
698
+ "eval_samples_per_second": 181.4,
699
+ "eval_steps_per_second": 2.857,
700
+ "learning_rate": 1e-05,
701
+ "step": 9503
702
+ },
703
+ {
704
+ "epoch": 44.0,
705
+ "eval_explained_variance": 0.6648023128509521,
706
+ "eval_kl_divergence": 0.13496889173984528,
707
+ "eval_loss": 0.45448434352874756,
708
+ "eval_mae": 0.06253467500209808,
709
+ "eval_rmse": 0.08635282516479492,
710
+ "eval_runtime": 26.0508,
711
+ "eval_samples_per_second": 180.34,
712
+ "eval_steps_per_second": 2.841,
713
+ "learning_rate": 1e-05,
714
+ "step": 9724
715
+ },
716
+ {
717
+ "epoch": 45.0,
718
+ "eval_explained_variance": 0.6624875068664551,
719
+ "eval_kl_divergence": 0.004431928042322397,
720
+ "eval_loss": 0.4550137519836426,
721
+ "eval_mae": 0.06418145447969437,
722
+ "eval_rmse": 0.0874209776520729,
723
+ "eval_runtime": 25.8495,
724
+ "eval_samples_per_second": 181.744,
725
+ "eval_steps_per_second": 2.863,
726
+ "learning_rate": 1e-05,
727
+ "step": 9945
728
+ },
729
+ {
730
+ "epoch": 45.248868778280546,
731
+ "grad_norm": 0.07514863461256027,
732
+ "learning_rate": 1e-05,
733
+ "loss": 0.4716,
734
+ "step": 10000
735
+ },
736
+ {
737
+ "epoch": 46.0,
738
+ "eval_explained_variance": 0.6642169952392578,
739
+ "eval_kl_divergence": 0.03887256979942322,
740
+ "eval_loss": 0.4545902609825134,
741
+ "eval_mae": 0.06316760927438736,
742
+ "eval_rmse": 0.08669499307870865,
743
+ "eval_runtime": 25.9222,
744
+ "eval_samples_per_second": 181.235,
745
+ "eval_steps_per_second": 2.855,
746
+ "learning_rate": 1e-05,
747
+ "step": 10166
748
+ },
749
+ {
750
+ "epoch": 47.0,
751
+ "eval_explained_variance": 0.6651113629341125,
752
+ "eval_kl_divergence": 0.037030890583992004,
753
+ "eval_loss": 0.4544997215270996,
754
+ "eval_mae": 0.06298934668302536,
755
+ "eval_rmse": 0.0865601971745491,
756
+ "eval_runtime": 25.9565,
757
+ "eval_samples_per_second": 180.995,
758
+ "eval_steps_per_second": 2.851,
759
+ "learning_rate": 1e-05,
760
+ "step": 10387
761
+ },
762
+ {
763
+ "epoch": 47.51131221719457,
764
+ "grad_norm": 0.057216282933950424,
765
+ "learning_rate": 1e-05,
766
+ "loss": 0.4722,
767
+ "step": 10500
768
+ },
769
+ {
770
+ "epoch": 48.0,
771
+ "eval_explained_variance": 0.6645199060440063,
772
+ "eval_kl_divergence": 0.019425788894295692,
773
+ "eval_loss": 0.4546374976634979,
774
+ "eval_mae": 0.06339576095342636,
775
+ "eval_rmse": 0.08680880069732666,
776
+ "eval_runtime": 25.7117,
777
+ "eval_samples_per_second": 182.718,
778
+ "eval_steps_per_second": 2.878,
779
+ "learning_rate": 1e-05,
780
+ "step": 10608
781
+ },
782
+ {
783
+ "epoch": 49.0,
784
+ "eval_explained_variance": 0.6666774153709412,
785
+ "eval_kl_divergence": 0.0667150691151619,
786
+ "eval_loss": 0.45436596870422363,
787
+ "eval_mae": 0.06269881874322891,
788
+ "eval_rmse": 0.08620164543390274,
789
+ "eval_runtime": 27.6905,
790
+ "eval_samples_per_second": 169.661,
791
+ "eval_steps_per_second": 2.672,
792
+ "learning_rate": 1.0000000000000002e-06,
793
+ "step": 10829
794
+ },
795
+ {
796
+ "epoch": 49.7737556561086,
797
+ "grad_norm": 0.07466714084148407,
798
+ "learning_rate": 1.0000000000000002e-06,
799
+ "loss": 0.4717,
800
+ "step": 11000
801
+ },
802
+ {
803
+ "epoch": 50.0,
804
+ "eval_explained_variance": 0.6650940179824829,
805
+ "eval_kl_divergence": 0.05483337119221687,
806
+ "eval_loss": 0.45450592041015625,
807
+ "eval_mae": 0.06310971826314926,
808
+ "eval_rmse": 0.08650273084640503,
809
+ "eval_runtime": 27.7128,
810
+ "eval_samples_per_second": 169.524,
811
+ "eval_steps_per_second": 2.67,
812
+ "learning_rate": 1.0000000000000002e-06,
813
+ "step": 11050
814
+ },
815
+ {
816
+ "epoch": 51.0,
817
+ "eval_explained_variance": 0.6651105284690857,
818
+ "eval_kl_divergence": 0.04277108237147331,
819
+ "eval_loss": 0.4544804096221924,
820
+ "eval_mae": 0.06292647123336792,
821
+ "eval_rmse": 0.08647629618644714,
822
+ "eval_runtime": 26.6553,
823
+ "eval_samples_per_second": 176.25,
824
+ "eval_steps_per_second": 2.776,
825
+ "learning_rate": 1.0000000000000002e-06,
826
+ "step": 11271
827
+ },
828
+ {
829
+ "epoch": 52.0,
830
+ "eval_explained_variance": 0.667234480381012,
831
+ "eval_kl_divergence": 0.12364839017391205,
832
+ "eval_loss": 0.45421910285949707,
833
+ "eval_mae": 0.06233237311244011,
834
+ "eval_rmse": 0.08589440584182739,
835
+ "eval_runtime": 25.8544,
836
+ "eval_samples_per_second": 181.71,
837
+ "eval_steps_per_second": 2.862,
838
+ "learning_rate": 1.0000000000000002e-06,
839
+ "step": 11492
840
+ },
841
+ {
842
+ "epoch": 52.036199095022624,
843
+ "grad_norm": 0.08442794531583786,
844
+ "learning_rate": 1.0000000000000002e-06,
845
+ "loss": 0.4718,
846
+ "step": 11500
847
+ },
848
+ {
849
+ "epoch": 53.0,
850
+ "eval_explained_variance": 0.6671742796897888,
851
+ "eval_kl_divergence": 0.08869530260562897,
852
+ "eval_loss": 0.4542272686958313,
853
+ "eval_mae": 0.06253313273191452,
854
+ "eval_rmse": 0.08594661206007004,
855
+ "eval_runtime": 25.9744,
856
+ "eval_samples_per_second": 180.871,
857
+ "eval_steps_per_second": 2.849,
858
+ "learning_rate": 1.0000000000000002e-06,
859
+ "step": 11713
860
+ },
861
+ {
862
+ "epoch": 54.0,
863
+ "eval_explained_variance": 0.6653165221214294,
864
+ "eval_kl_divergence": 0.09171402454376221,
865
+ "eval_loss": 0.4543103575706482,
866
+ "eval_mae": 0.0623968206346035,
867
+ "eval_rmse": 0.08615261316299438,
868
+ "eval_runtime": 26.0699,
869
+ "eval_samples_per_second": 180.208,
870
+ "eval_steps_per_second": 2.839,
871
+ "learning_rate": 1.0000000000000002e-06,
872
+ "step": 11934
873
+ },
874
+ {
875
+ "epoch": 54.29864253393665,
876
+ "grad_norm": 0.08775485306978226,
877
+ "learning_rate": 1.0000000000000002e-06,
878
+ "loss": 0.4716,
879
+ "step": 12000
880
+ },
881
+ {
882
+ "epoch": 55.0,
883
+ "eval_explained_variance": 0.6649713516235352,
884
+ "eval_kl_divergence": 0.07737051695585251,
885
+ "eval_loss": 0.45456644892692566,
886
+ "eval_mae": 0.06305743753910065,
887
+ "eval_rmse": 0.0865490511059761,
888
+ "eval_runtime": 26.0104,
889
+ "eval_samples_per_second": 180.62,
890
+ "eval_steps_per_second": 2.845,
891
+ "learning_rate": 1.0000000000000002e-06,
892
+ "step": 12155
893
+ },
894
+ {
895
+ "epoch": 56.0,
896
+ "eval_explained_variance": 0.6649186611175537,
897
+ "eval_kl_divergence": 0.04731013998389244,
898
+ "eval_loss": 0.45458319783210754,
899
+ "eval_mae": 0.06328658014535904,
900
+ "eval_rmse": 0.08663744479417801,
901
+ "eval_runtime": 25.8104,
902
+ "eval_samples_per_second": 182.019,
903
+ "eval_steps_per_second": 2.867,
904
+ "learning_rate": 1.0000000000000002e-06,
905
+ "step": 12376
906
+ },
907
+ {
908
+ "epoch": 56.56108597285068,
909
+ "grad_norm": 0.0692247599363327,
910
+ "learning_rate": 1.0000000000000002e-06,
911
+ "loss": 0.4717,
912
+ "step": 12500
913
+ },
914
+ {
915
+ "epoch": 57.0,
916
+ "eval_explained_variance": 0.6657507419586182,
917
+ "eval_kl_divergence": -0.004581684246659279,
918
+ "eval_loss": 0.4548773169517517,
919
+ "eval_mae": 0.0639243796467781,
920
+ "eval_rmse": 0.0871059000492096,
921
+ "eval_runtime": 25.4962,
922
+ "eval_samples_per_second": 184.262,
923
+ "eval_steps_per_second": 2.902,
924
+ "learning_rate": 1.0000000000000002e-06,
925
+ "step": 12597
926
+ },
927
+ {
928
+ "epoch": 58.0,
929
+ "eval_explained_variance": 0.6655800342559814,
930
+ "eval_kl_divergence": 0.0553017221391201,
931
+ "eval_loss": 0.45440155267715454,
932
+ "eval_mae": 0.06271661818027496,
933
+ "eval_rmse": 0.08635643124580383,
934
+ "eval_runtime": 26.1057,
935
+ "eval_samples_per_second": 179.961,
936
+ "eval_steps_per_second": 2.835,
937
+ "learning_rate": 1.0000000000000002e-06,
938
+ "step": 12818
939
+ },
940
+ {
941
+ "epoch": 58.8235294117647,
942
+ "grad_norm": 0.07922232896089554,
943
+ "learning_rate": 1.0000000000000002e-07,
944
+ "loss": 0.4716,
945
+ "step": 13000
946
+ },
947
+ {
948
+ "epoch": 59.0,
949
+ "eval_explained_variance": 0.6654148101806641,
950
+ "eval_kl_divergence": 0.03675610199570656,
951
+ "eval_loss": 0.45448538661003113,
952
+ "eval_mae": 0.06308572739362717,
953
+ "eval_rmse": 0.08650225400924683,
954
+ "eval_runtime": 25.8122,
955
+ "eval_samples_per_second": 182.007,
956
+ "eval_steps_per_second": 2.867,
957
+ "learning_rate": 1.0000000000000002e-07,
958
+ "step": 13039
959
+ },
960
+ {
961
+ "epoch": 60.0,
962
+ "eval_explained_variance": 0.6660366058349609,
963
+ "eval_kl_divergence": 0.047148581594228745,
964
+ "eval_loss": 0.4544091522693634,
965
+ "eval_mae": 0.06294982880353928,
966
+ "eval_rmse": 0.08633282780647278,
967
+ "eval_runtime": 26.4937,
968
+ "eval_samples_per_second": 177.325,
969
+ "eval_steps_per_second": 2.793,
970
+ "learning_rate": 1.0000000000000002e-07,
971
+ "step": 13260
972
+ },
973
+ {
974
+ "epoch": 61.0,
975
+ "eval_explained_variance": 0.6669723987579346,
976
+ "eval_kl_divergence": 0.09280110895633698,
977
+ "eval_loss": 0.4542348086833954,
978
+ "eval_mae": 0.062441930174827576,
979
+ "eval_rmse": 0.08595842123031616,
980
+ "eval_runtime": 26.0483,
981
+ "eval_samples_per_second": 180.357,
982
+ "eval_steps_per_second": 2.841,
983
+ "learning_rate": 1.0000000000000002e-07,
984
+ "step": 13481
985
+ },
986
+ {
987
+ "epoch": 61.085972850678736,
988
+ "grad_norm": 0.07845129072666168,
989
+ "learning_rate": 1.0000000000000002e-07,
990
+ "loss": 0.4718,
991
+ "step": 13500
992
+ },
993
+ {
994
+ "epoch": 62.0,
995
+ "eval_explained_variance": 0.6661055088043213,
996
+ "eval_kl_divergence": 0.028626998886466026,
997
+ "eval_loss": 0.4545469284057617,
998
+ "eval_mae": 0.06315190345048904,
999
+ "eval_rmse": 0.0865735188126564,
1000
+ "eval_runtime": 25.8503,
1001
+ "eval_samples_per_second": 181.739,
1002
+ "eval_steps_per_second": 2.863,
1003
+ "learning_rate": 1.0000000000000002e-07,
1004
+ "step": 13702
1005
+ },
1006
+ {
1007
+ "epoch": 62.0,
1008
+ "learning_rate": 1.0000000000000002e-07,
1009
+ "step": 13702,
1010
+ "total_flos": 9.42369297866869e+19,
1011
+ "train_loss": 0.4754439868851833,
1012
+ "train_runtime": 8961.4221,
1013
+ "train_samples_per_second": 235.894,
1014
+ "train_steps_per_second": 3.699
1015
+ }
1016
+ ],
1017
+ "logging_steps": 500,
1018
+ "max_steps": 33150,
1019
+ "num_input_tokens_seen": 0,
1020
+ "num_train_epochs": 150,
1021
+ "save_steps": 500,
1022
+ "stateful_callbacks": {
1023
+ "EarlyStoppingCallback": {
1024
+ "args": {
1025
+ "early_stopping_patience": 10,
1026
+ "early_stopping_threshold": 0.0
1027
+ },
1028
+ "attributes": {
1029
+ "early_stopping_patience_counter": 0
1030
+ }
1031
+ },
1032
+ "TrainerControl": {
1033
+ "args": {
1034
+ "should_epoch_stop": false,
1035
+ "should_evaluate": false,
1036
+ "should_log": false,
1037
+ "should_save": true,
1038
+ "should_training_stop": true
1039
+ },
1040
+ "attributes": {}
1041
+ }
1042
+ },
1043
+ "total_flos": 9.42369297866869e+19,
1044
+ "train_batch_size": 64,
1045
+ "trial_name": null,
1046
+ "trial_params": null
1047
+ }