2023-10-17 15:20:20,697 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,699 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): ElectraModel(
      (embeddings): ElectraEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): ElectraEncoder(
        (layer): ModuleList(
          (0-11): 12 x ElectraLayer(
            (attention): ElectraAttention(
              (self): ElectraSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): ElectraSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): ElectraIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): ElectraOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-17 15:20:20,699 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,699 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-17 15:20:20,699 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,699 Train:  3575 sentences
2023-10-17 15:20:20,699         (train_with_dev=False, train_with_test=False)
2023-10-17 15:20:20,699 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,699 Training Params:
2023-10-17 15:20:20,700  - learning_rate: "5e-05" 
2023-10-17 15:20:20,700  - mini_batch_size: "8"
2023-10-17 15:20:20,700  - max_epochs: "10"
2023-10-17 15:20:20,700  - shuffle: "True"
2023-10-17 15:20:20,700 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,700 Plugins:
2023-10-17 15:20:20,700  - TensorboardLogger
2023-10-17 15:20:20,700  - LinearScheduler | warmup_fraction: '0.1'
2023-10-17 15:20:20,700 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,700 Final evaluation on model from best epoch (best-model.pt)
2023-10-17 15:20:20,700  - metric: "('micro avg', 'f1-score')"
2023-10-17 15:20:20,700 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,700 Computation:
2023-10-17 15:20:20,700  - compute on device: cuda:0
2023-10-17 15:20:20,700  - embedding storage: none
2023-10-17 15:20:20,701 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,701 Model training base path: "hmbench-hipe2020/de-hmteams/teams-base-historic-multilingual-discriminator-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-17 15:20:20,701 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,701 ----------------------------------------------------------------------------------------------------
2023-10-17 15:20:20,701 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-17 15:20:25,296 epoch 1 - iter 44/447 - loss 3.29322406 - time (sec): 4.59 - samples/sec: 1935.59 - lr: 0.000005 - momentum: 0.000000
2023-10-17 15:20:29,526 epoch 1 - iter 88/447 - loss 2.20404796 - time (sec): 8.82 - samples/sec: 1932.03 - lr: 0.000010 - momentum: 0.000000
2023-10-17 15:20:33,812 epoch 1 - iter 132/447 - loss 1.66693701 - time (sec): 13.11 - samples/sec: 1943.12 - lr: 0.000015 - momentum: 0.000000
2023-10-17 15:20:37,841 epoch 1 - iter 176/447 - loss 1.34300581 - time (sec): 17.14 - samples/sec: 2013.07 - lr: 0.000020 - momentum: 0.000000
2023-10-17 15:20:42,001 epoch 1 - iter 220/447 - loss 1.13494006 - time (sec): 21.30 - samples/sec: 2034.17 - lr: 0.000024 - momentum: 0.000000
2023-10-17 15:20:46,118 epoch 1 - iter 264/447 - loss 0.99498347 - time (sec): 25.42 - samples/sec: 2037.60 - lr: 0.000029 - momentum: 0.000000
2023-10-17 15:20:50,011 epoch 1 - iter 308/447 - loss 0.90014844 - time (sec): 29.31 - samples/sec: 2045.24 - lr: 0.000034 - momentum: 0.000000
2023-10-17 15:20:54,208 epoch 1 - iter 352/447 - loss 0.82692832 - time (sec): 33.51 - samples/sec: 2030.01 - lr: 0.000039 - momentum: 0.000000
2023-10-17 15:20:58,668 epoch 1 - iter 396/447 - loss 0.75333912 - time (sec): 37.97 - samples/sec: 2037.44 - lr: 0.000044 - momentum: 0.000000
2023-10-17 15:21:02,777 epoch 1 - iter 440/447 - loss 0.70068186 - time (sec): 42.07 - samples/sec: 2029.61 - lr: 0.000049 - momentum: 0.000000
2023-10-17 15:21:03,549 ----------------------------------------------------------------------------------------------------
2023-10-17 15:21:03,549 EPOCH 1 done: loss 0.6939 - lr: 0.000049
2023-10-17 15:21:10,111 DEV : loss 0.1627715528011322 - f1-score (micro avg)  0.6637
2023-10-17 15:21:10,167 saving best model
2023-10-17 15:21:10,755 ----------------------------------------------------------------------------------------------------
2023-10-17 15:21:14,984 epoch 2 - iter 44/447 - loss 0.18574466 - time (sec): 4.23 - samples/sec: 2020.81 - lr: 0.000049 - momentum: 0.000000
2023-10-17 15:21:18,869 epoch 2 - iter 88/447 - loss 0.17587291 - time (sec): 8.11 - samples/sec: 2043.93 - lr: 0.000049 - momentum: 0.000000
2023-10-17 15:21:22,901 epoch 2 - iter 132/447 - loss 0.18074319 - time (sec): 12.14 - samples/sec: 2045.51 - lr: 0.000048 - momentum: 0.000000
2023-10-17 15:21:27,252 epoch 2 - iter 176/447 - loss 0.16716021 - time (sec): 16.49 - samples/sec: 2035.17 - lr: 0.000048 - momentum: 0.000000
2023-10-17 15:21:31,318 epoch 2 - iter 220/447 - loss 0.16088597 - time (sec): 20.56 - samples/sec: 2037.01 - lr: 0.000047 - momentum: 0.000000
2023-10-17 15:21:35,411 epoch 2 - iter 264/447 - loss 0.15627115 - time (sec): 24.65 - samples/sec: 2064.17 - lr: 0.000047 - momentum: 0.000000
2023-10-17 15:21:39,518 epoch 2 - iter 308/447 - loss 0.15212403 - time (sec): 28.76 - samples/sec: 2071.36 - lr: 0.000046 - momentum: 0.000000
2023-10-17 15:21:43,871 epoch 2 - iter 352/447 - loss 0.14647911 - time (sec): 33.11 - samples/sec: 2068.26 - lr: 0.000046 - momentum: 0.000000
2023-10-17 15:21:47,774 epoch 2 - iter 396/447 - loss 0.14381048 - time (sec): 37.02 - samples/sec: 2063.23 - lr: 0.000045 - momentum: 0.000000
2023-10-17 15:21:52,104 epoch 2 - iter 440/447 - loss 0.13930114 - time (sec): 41.35 - samples/sec: 2061.40 - lr: 0.000045 - momentum: 0.000000
2023-10-17 15:21:52,733 ----------------------------------------------------------------------------------------------------
2023-10-17 15:21:52,733 EPOCH 2 done: loss 0.1388 - lr: 0.000045
2023-10-17 15:22:03,664 DEV : loss 0.13203084468841553 - f1-score (micro avg)  0.7453
2023-10-17 15:22:03,719 saving best model
2023-10-17 15:22:04,360 ----------------------------------------------------------------------------------------------------
2023-10-17 15:22:08,773 epoch 3 - iter 44/447 - loss 0.07462311 - time (sec): 4.41 - samples/sec: 2144.45 - lr: 0.000044 - momentum: 0.000000
2023-10-17 15:22:12,960 epoch 3 - iter 88/447 - loss 0.07763574 - time (sec): 8.60 - samples/sec: 2161.08 - lr: 0.000043 - momentum: 0.000000
2023-10-17 15:22:16,914 epoch 3 - iter 132/447 - loss 0.07588462 - time (sec): 12.55 - samples/sec: 2119.86 - lr: 0.000043 - momentum: 0.000000
2023-10-17 15:22:20,965 epoch 3 - iter 176/447 - loss 0.07957562 - time (sec): 16.60 - samples/sec: 2065.06 - lr: 0.000042 - momentum: 0.000000
2023-10-17 15:22:24,832 epoch 3 - iter 220/447 - loss 0.07641351 - time (sec): 20.47 - samples/sec: 2080.13 - lr: 0.000042 - momentum: 0.000000
2023-10-17 15:22:28,692 epoch 3 - iter 264/447 - loss 0.07516708 - time (sec): 24.33 - samples/sec: 2077.44 - lr: 0.000041 - momentum: 0.000000
2023-10-17 15:22:32,638 epoch 3 - iter 308/447 - loss 0.07530165 - time (sec): 28.28 - samples/sec: 2075.02 - lr: 0.000041 - momentum: 0.000000
2023-10-17 15:22:36,871 epoch 3 - iter 352/447 - loss 0.07665390 - time (sec): 32.51 - samples/sec: 2087.45 - lr: 0.000040 - momentum: 0.000000
2023-10-17 15:22:40,961 epoch 3 - iter 396/447 - loss 0.07644784 - time (sec): 36.60 - samples/sec: 2091.93 - lr: 0.000040 - momentum: 0.000000
2023-10-17 15:22:45,586 epoch 3 - iter 440/447 - loss 0.07596108 - time (sec): 41.22 - samples/sec: 2072.90 - lr: 0.000039 - momentum: 0.000000
2023-10-17 15:22:46,244 ----------------------------------------------------------------------------------------------------
2023-10-17 15:22:46,244 EPOCH 3 done: loss 0.0779 - lr: 0.000039
2023-10-17 15:22:57,932 DEV : loss 0.15669210255146027 - f1-score (micro avg)  0.7361
2023-10-17 15:22:57,990 ----------------------------------------------------------------------------------------------------
2023-10-17 15:23:01,950 epoch 4 - iter 44/447 - loss 0.11113508 - time (sec): 3.96 - samples/sec: 2079.82 - lr: 0.000038 - momentum: 0.000000
2023-10-17 15:23:06,160 epoch 4 - iter 88/447 - loss 0.07727697 - time (sec): 8.17 - samples/sec: 2009.83 - lr: 0.000038 - momentum: 0.000000
2023-10-17 15:23:10,270 epoch 4 - iter 132/447 - loss 0.07094192 - time (sec): 12.28 - samples/sec: 1949.87 - lr: 0.000037 - momentum: 0.000000
2023-10-17 15:23:14,604 epoch 4 - iter 176/447 - loss 0.06225710 - time (sec): 16.61 - samples/sec: 1963.87 - lr: 0.000037 - momentum: 0.000000
2023-10-17 15:23:19,493 epoch 4 - iter 220/447 - loss 0.06035341 - time (sec): 21.50 - samples/sec: 1996.90 - lr: 0.000036 - momentum: 0.000000
2023-10-17 15:23:23,666 epoch 4 - iter 264/447 - loss 0.05722586 - time (sec): 25.67 - samples/sec: 1995.76 - lr: 0.000036 - momentum: 0.000000
2023-10-17 15:23:27,927 epoch 4 - iter 308/447 - loss 0.05924836 - time (sec): 29.93 - samples/sec: 1999.94 - lr: 0.000035 - momentum: 0.000000
2023-10-17 15:23:32,357 epoch 4 - iter 352/447 - loss 0.05871739 - time (sec): 34.36 - samples/sec: 1997.60 - lr: 0.000035 - momentum: 0.000000
2023-10-17 15:23:36,757 epoch 4 - iter 396/447 - loss 0.05849539 - time (sec): 38.77 - samples/sec: 1987.97 - lr: 0.000034 - momentum: 0.000000
2023-10-17 15:23:40,809 epoch 4 - iter 440/447 - loss 0.05804840 - time (sec): 42.82 - samples/sec: 1989.71 - lr: 0.000033 - momentum: 0.000000
2023-10-17 15:23:41,474 ----------------------------------------------------------------------------------------------------
2023-10-17 15:23:41,474 EPOCH 4 done: loss 0.0579 - lr: 0.000033
2023-10-17 15:23:52,260 DEV : loss 0.15689311921596527 - f1-score (micro avg)  0.7574
2023-10-17 15:23:52,319 saving best model
2023-10-17 15:23:54,139 ----------------------------------------------------------------------------------------------------
2023-10-17 15:23:58,437 epoch 5 - iter 44/447 - loss 0.03896987 - time (sec): 4.29 - samples/sec: 2061.67 - lr: 0.000033 - momentum: 0.000000
2023-10-17 15:24:02,519 epoch 5 - iter 88/447 - loss 0.04068401 - time (sec): 8.38 - samples/sec: 2066.50 - lr: 0.000032 - momentum: 0.000000
2023-10-17 15:24:06,827 epoch 5 - iter 132/447 - loss 0.03466089 - time (sec): 12.68 - samples/sec: 2079.00 - lr: 0.000032 - momentum: 0.000000
2023-10-17 15:24:10,950 epoch 5 - iter 176/447 - loss 0.03234991 - time (sec): 16.81 - samples/sec: 2029.72 - lr: 0.000031 - momentum: 0.000000
2023-10-17 15:24:15,467 epoch 5 - iter 220/447 - loss 0.03486824 - time (sec): 21.32 - samples/sec: 2031.61 - lr: 0.000031 - momentum: 0.000000
2023-10-17 15:24:19,494 epoch 5 - iter 264/447 - loss 0.03514322 - time (sec): 25.35 - samples/sec: 2028.49 - lr: 0.000030 - momentum: 0.000000
2023-10-17 15:24:23,617 epoch 5 - iter 308/447 - loss 0.03414630 - time (sec): 29.47 - samples/sec: 2030.22 - lr: 0.000030 - momentum: 0.000000
2023-10-17 15:24:27,949 epoch 5 - iter 352/447 - loss 0.03379714 - time (sec): 33.81 - samples/sec: 2023.06 - lr: 0.000029 - momentum: 0.000000
2023-10-17 15:24:32,314 epoch 5 - iter 396/447 - loss 0.03258252 - time (sec): 38.17 - samples/sec: 2009.56 - lr: 0.000028 - momentum: 0.000000
2023-10-17 15:24:37,011 epoch 5 - iter 440/447 - loss 0.03419784 - time (sec): 42.87 - samples/sec: 1992.60 - lr: 0.000028 - momentum: 0.000000
2023-10-17 15:24:37,701 ----------------------------------------------------------------------------------------------------
2023-10-17 15:24:37,702 EPOCH 5 done: loss 0.0342 - lr: 0.000028
2023-10-17 15:24:48,404 DEV : loss 0.19426260888576508 - f1-score (micro avg)  0.7695
2023-10-17 15:24:48,467 saving best model
2023-10-17 15:24:49,151 ----------------------------------------------------------------------------------------------------
2023-10-17 15:24:53,549 epoch 6 - iter 44/447 - loss 0.01163168 - time (sec): 4.40 - samples/sec: 1887.17 - lr: 0.000027 - momentum: 0.000000
2023-10-17 15:24:58,088 epoch 6 - iter 88/447 - loss 0.01528518 - time (sec): 8.94 - samples/sec: 1878.66 - lr: 0.000027 - momentum: 0.000000
2023-10-17 15:25:02,258 epoch 6 - iter 132/447 - loss 0.01991735 - time (sec): 13.10 - samples/sec: 1941.94 - lr: 0.000026 - momentum: 0.000000
2023-10-17 15:25:06,592 epoch 6 - iter 176/447 - loss 0.02065023 - time (sec): 17.44 - samples/sec: 1960.45 - lr: 0.000026 - momentum: 0.000000
2023-10-17 15:25:10,877 epoch 6 - iter 220/447 - loss 0.01950593 - time (sec): 21.72 - samples/sec: 1980.94 - lr: 0.000025 - momentum: 0.000000
2023-10-17 15:25:14,822 epoch 6 - iter 264/447 - loss 0.01844720 - time (sec): 25.67 - samples/sec: 1977.88 - lr: 0.000025 - momentum: 0.000000
2023-10-17 15:25:19,281 epoch 6 - iter 308/447 - loss 0.01859815 - time (sec): 30.13 - samples/sec: 2007.45 - lr: 0.000024 - momentum: 0.000000
2023-10-17 15:25:23,292 epoch 6 - iter 352/447 - loss 0.01870907 - time (sec): 34.14 - samples/sec: 2019.09 - lr: 0.000023 - momentum: 0.000000
2023-10-17 15:25:27,828 epoch 6 - iter 396/447 - loss 0.01881535 - time (sec): 38.68 - samples/sec: 2000.02 - lr: 0.000023 - momentum: 0.000000
2023-10-17 15:25:31,845 epoch 6 - iter 440/447 - loss 0.01872424 - time (sec): 42.69 - samples/sec: 1997.24 - lr: 0.000022 - momentum: 0.000000
2023-10-17 15:25:32,465 ----------------------------------------------------------------------------------------------------
2023-10-17 15:25:32,466 EPOCH 6 done: loss 0.0187 - lr: 0.000022
2023-10-17 15:25:43,025 DEV : loss 0.19804143905639648 - f1-score (micro avg)  0.771
2023-10-17 15:25:43,090 saving best model
2023-10-17 15:25:44,546 ----------------------------------------------------------------------------------------------------
2023-10-17 15:25:49,064 epoch 7 - iter 44/447 - loss 0.01406927 - time (sec): 4.51 - samples/sec: 2113.97 - lr: 0.000022 - momentum: 0.000000
2023-10-17 15:25:53,177 epoch 7 - iter 88/447 - loss 0.01378757 - time (sec): 8.63 - samples/sec: 2067.06 - lr: 0.000021 - momentum: 0.000000
2023-10-17 15:25:57,437 epoch 7 - iter 132/447 - loss 0.01246763 - time (sec): 12.88 - samples/sec: 2063.50 - lr: 0.000021 - momentum: 0.000000
2023-10-17 15:26:01,968 epoch 7 - iter 176/447 - loss 0.01393283 - time (sec): 17.42 - samples/sec: 2068.23 - lr: 0.000020 - momentum: 0.000000
2023-10-17 15:26:05,992 epoch 7 - iter 220/447 - loss 0.01344518 - time (sec): 21.44 - samples/sec: 2076.84 - lr: 0.000020 - momentum: 0.000000
2023-10-17 15:26:10,031 epoch 7 - iter 264/447 - loss 0.01257828 - time (sec): 25.48 - samples/sec: 2061.24 - lr: 0.000019 - momentum: 0.000000
2023-10-17 15:26:14,132 epoch 7 - iter 308/447 - loss 0.01318801 - time (sec): 29.58 - samples/sec: 2051.47 - lr: 0.000018 - momentum: 0.000000
2023-10-17 15:26:18,294 epoch 7 - iter 352/447 - loss 0.01344369 - time (sec): 33.74 - samples/sec: 2051.06 - lr: 0.000018 - momentum: 0.000000
2023-10-17 15:26:22,814 epoch 7 - iter 396/447 - loss 0.01372082 - time (sec): 38.26 - samples/sec: 2025.99 - lr: 0.000017 - momentum: 0.000000
2023-10-17 15:26:27,255 epoch 7 - iter 440/447 - loss 0.01396394 - time (sec): 42.70 - samples/sec: 1996.18 - lr: 0.000017 - momentum: 0.000000
2023-10-17 15:26:27,962 ----------------------------------------------------------------------------------------------------
2023-10-17 15:26:27,963 EPOCH 7 done: loss 0.0139 - lr: 0.000017
2023-10-17 15:26:38,984 DEV : loss 0.22845597565174103 - f1-score (micro avg)  0.8003
2023-10-17 15:26:39,038 saving best model
2023-10-17 15:26:40,462 ----------------------------------------------------------------------------------------------------
2023-10-17 15:26:44,896 epoch 8 - iter 44/447 - loss 0.00921936 - time (sec): 4.43 - samples/sec: 1841.71 - lr: 0.000016 - momentum: 0.000000
2023-10-17 15:26:49,151 epoch 8 - iter 88/447 - loss 0.00912078 - time (sec): 8.68 - samples/sec: 1880.14 - lr: 0.000016 - momentum: 0.000000
2023-10-17 15:26:53,847 epoch 8 - iter 132/447 - loss 0.01106636 - time (sec): 13.38 - samples/sec: 1809.62 - lr: 0.000015 - momentum: 0.000000
2023-10-17 15:26:58,466 epoch 8 - iter 176/447 - loss 0.01077249 - time (sec): 18.00 - samples/sec: 1830.19 - lr: 0.000015 - momentum: 0.000000
2023-10-17 15:27:02,928 epoch 8 - iter 220/447 - loss 0.00985768 - time (sec): 22.46 - samples/sec: 1859.16 - lr: 0.000014 - momentum: 0.000000
2023-10-17 15:27:07,409 epoch 8 - iter 264/447 - loss 0.00933209 - time (sec): 26.94 - samples/sec: 1856.19 - lr: 0.000013 - momentum: 0.000000
2023-10-17 15:27:11,913 epoch 8 - iter 308/447 - loss 0.00948564 - time (sec): 31.45 - samples/sec: 1848.78 - lr: 0.000013 - momentum: 0.000000
2023-10-17 15:27:16,212 epoch 8 - iter 352/447 - loss 0.00988733 - time (sec): 35.75 - samples/sec: 1856.93 - lr: 0.000012 - momentum: 0.000000
2023-10-17 15:27:20,656 epoch 8 - iter 396/447 - loss 0.00993039 - time (sec): 40.19 - samples/sec: 1886.26 - lr: 0.000012 - momentum: 0.000000
2023-10-17 15:27:25,141 epoch 8 - iter 440/447 - loss 0.00960367 - time (sec): 44.67 - samples/sec: 1907.98 - lr: 0.000011 - momentum: 0.000000
2023-10-17 15:27:25,768 ----------------------------------------------------------------------------------------------------
2023-10-17 15:27:25,769 EPOCH 8 done: loss 0.0095 - lr: 0.000011
2023-10-17 15:27:37,592 DEV : loss 0.21943919360637665 - f1-score (micro avg)  0.7933
2023-10-17 15:27:37,660 ----------------------------------------------------------------------------------------------------
2023-10-17 15:27:41,957 epoch 9 - iter 44/447 - loss 0.00566894 - time (sec): 4.30 - samples/sec: 1769.19 - lr: 0.000011 - momentum: 0.000000
2023-10-17 15:27:45,995 epoch 9 - iter 88/447 - loss 0.00637583 - time (sec): 8.33 - samples/sec: 1919.87 - lr: 0.000010 - momentum: 0.000000
2023-10-17 15:27:50,104 epoch 9 - iter 132/447 - loss 0.00607137 - time (sec): 12.44 - samples/sec: 1954.16 - lr: 0.000010 - momentum: 0.000000
2023-10-17 15:27:54,416 epoch 9 - iter 176/447 - loss 0.00515606 - time (sec): 16.75 - samples/sec: 1948.74 - lr: 0.000009 - momentum: 0.000000
2023-10-17 15:28:00,031 epoch 9 - iter 220/447 - loss 0.00413243 - time (sec): 22.37 - samples/sec: 1922.67 - lr: 0.000008 - momentum: 0.000000
2023-10-17 15:28:04,550 epoch 9 - iter 264/447 - loss 0.00424171 - time (sec): 26.89 - samples/sec: 1929.66 - lr: 0.000008 - momentum: 0.000000
2023-10-17 15:28:09,004 epoch 9 - iter 308/447 - loss 0.00528104 - time (sec): 31.34 - samples/sec: 1914.50 - lr: 0.000007 - momentum: 0.000000
2023-10-17 15:28:13,713 epoch 9 - iter 352/447 - loss 0.00636826 - time (sec): 36.05 - samples/sec: 1888.42 - lr: 0.000007 - momentum: 0.000000
2023-10-17 15:28:18,158 epoch 9 - iter 396/447 - loss 0.00660453 - time (sec): 40.50 - samples/sec: 1892.15 - lr: 0.000006 - momentum: 0.000000
2023-10-17 15:28:22,745 epoch 9 - iter 440/447 - loss 0.00609731 - time (sec): 45.08 - samples/sec: 1883.24 - lr: 0.000006 - momentum: 0.000000
2023-10-17 15:28:23,448 ----------------------------------------------------------------------------------------------------
2023-10-17 15:28:23,449 EPOCH 9 done: loss 0.0060 - lr: 0.000006
2023-10-17 15:28:34,773 DEV : loss 0.24061691761016846 - f1-score (micro avg)  0.7905
2023-10-17 15:28:34,835 ----------------------------------------------------------------------------------------------------
2023-10-17 15:28:39,324 epoch 10 - iter 44/447 - loss 0.00465839 - time (sec): 4.49 - samples/sec: 1899.47 - lr: 0.000005 - momentum: 0.000000
2023-10-17 15:28:43,753 epoch 10 - iter 88/447 - loss 0.00384135 - time (sec): 8.92 - samples/sec: 1850.06 - lr: 0.000005 - momentum: 0.000000
2023-10-17 15:28:48,498 epoch 10 - iter 132/447 - loss 0.00456888 - time (sec): 13.66 - samples/sec: 1924.22 - lr: 0.000004 - momentum: 0.000000
2023-10-17 15:28:52,819 epoch 10 - iter 176/447 - loss 0.00488622 - time (sec): 17.98 - samples/sec: 1899.89 - lr: 0.000003 - momentum: 0.000000
2023-10-17 15:28:57,213 epoch 10 - iter 220/447 - loss 0.00398089 - time (sec): 22.38 - samples/sec: 1920.45 - lr: 0.000003 - momentum: 0.000000
2023-10-17 15:29:01,484 epoch 10 - iter 264/447 - loss 0.00334650 - time (sec): 26.65 - samples/sec: 1942.12 - lr: 0.000002 - momentum: 0.000000
2023-10-17 15:29:05,562 epoch 10 - iter 308/447 - loss 0.00323001 - time (sec): 30.72 - samples/sec: 1952.19 - lr: 0.000002 - momentum: 0.000000
2023-10-17 15:29:09,820 epoch 10 - iter 352/447 - loss 0.00363184 - time (sec): 34.98 - samples/sec: 1967.59 - lr: 0.000001 - momentum: 0.000000
2023-10-17 15:29:13,771 epoch 10 - iter 396/447 - loss 0.00344856 - time (sec): 38.93 - samples/sec: 1965.60 - lr: 0.000001 - momentum: 0.000000
2023-10-17 15:29:18,176 epoch 10 - iter 440/447 - loss 0.00328569 - time (sec): 43.34 - samples/sec: 1969.55 - lr: 0.000000 - momentum: 0.000000
2023-10-17 15:29:18,816 ----------------------------------------------------------------------------------------------------
2023-10-17 15:29:18,816 EPOCH 10 done: loss 0.0032 - lr: 0.000000
2023-10-17 15:29:29,893 DEV : loss 0.23245255649089813 - f1-score (micro avg)  0.8025
2023-10-17 15:29:29,948 saving best model
2023-10-17 15:29:32,004 ----------------------------------------------------------------------------------------------------
2023-10-17 15:29:32,006 Loading model from best epoch ...
2023-10-17 15:29:35,209 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-17 15:29:41,022 
Results:
- F-score (micro) 0.762
- F-score (macro) 0.6737
- Accuracy 0.6314

By class:
              precision    recall  f1-score   support

         loc     0.8550    0.8607    0.8579       596
        pers     0.6850    0.7838    0.7311       333
         org     0.5431    0.4773    0.5081       132
        prod     0.6071    0.5152    0.5574        66
        time     0.7143    0.7143    0.7143        49

   micro avg     0.7537    0.7704    0.7620      1176
   macro avg     0.6809    0.6702    0.6737      1176
weighted avg     0.7521    0.7704    0.7599      1176

2023-10-17 15:29:41,023 ----------------------------------------------------------------------------------------------------
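
As a quick sanity check (not part of the original log), the aggregate F-scores can be recomputed from the per-class table above: macro F1 is the unweighted mean of the per-class F1 values, and micro F1 is the harmonic mean of the reported micro precision and recall. The helper names below are my own; the numbers are copied from the log.

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# (precision, recall, f1) per class, copied from the "By class" table
per_class = {
    "loc":  (0.8550, 0.8607, 0.8579),
    "pers": (0.6850, 0.7838, 0.7311),
    "org":  (0.5431, 0.4773, 0.5081),
    "prod": (0.6071, 0.5152, 0.5574),
    "time": (0.7143, 0.7143, 0.7143),
}

# macro F1 = unweighted mean of the per-class F1 scores
macro_f1 = sum(f for _, _, f in per_class.values()) / len(per_class)

# micro F1 follows from the reported micro-avg precision/recall
micro_f1 = f1(0.7537, 0.7704)

print(f"macro F1 ~ {macro_f1:.4f}, micro F1 ~ {micro_f1:.4f}")
```

Both values land on the figures reported in the results block (0.6737 macro, 0.7620 micro) up to rounding.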