2023-10-17 11:07:44,037 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,040 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): ElectraModel(
      (embeddings): ElectraEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): ElectraEncoder(
        (layer): ModuleList(
          (0-11): 12 x ElectraLayer(
            (attention): ElectraAttention(
              (self): ElectraSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): ElectraSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): ElectraIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): ElectraOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-17 11:07:44,040 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,040 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-17 11:07:44,040 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,041 Train:  20847 sentences
2023-10-17 11:07:44,041         (train_with_dev=False, train_with_test=False)
2023-10-17 11:07:44,041 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,041 Training Params:
2023-10-17 11:07:44,041  - learning_rate: "3e-05" 
2023-10-17 11:07:44,041  - mini_batch_size: "4"
2023-10-17 11:07:44,041  - max_epochs: "10"
2023-10-17 11:07:44,041  - shuffle: "True"
2023-10-17 11:07:44,041 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,041 Plugins:
2023-10-17 11:07:44,041  - TensorboardLogger
2023-10-17 11:07:44,042  - LinearScheduler | warmup_fraction: '0.1'
2023-10-17 11:07:44,042 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,042 Final evaluation on model from best epoch (best-model.pt)
2023-10-17 11:07:44,042  - metric: "('micro avg', 'f1-score')"
2023-10-17 11:07:44,042 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,042 Computation:
2023-10-17 11:07:44,042  - compute on device: cuda:0
2023-10-17 11:07:44,042  - embedding storage: none
2023-10-17 11:07:44,042 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,042 Model training base path: "hmbench-newseye/de-hmteams/teams-base-historic-multilingual-discriminator-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-17 11:07:44,042 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,043 ----------------------------------------------------------------------------------------------------
2023-10-17 11:07:44,043 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-17 11:08:26,363 epoch 1 - iter 521/5212 - loss 1.89163776 - time (sec): 42.32 - samples/sec: 798.01 - lr: 0.000003 - momentum: 0.000000
2023-10-17 11:09:08,855 epoch 1 - iter 1042/5212 - loss 1.13479704 - time (sec): 84.81 - samples/sec: 817.76 - lr: 0.000006 - momentum: 0.000000
2023-10-17 11:09:54,477 epoch 1 - iter 1563/5212 - loss 0.84671118 - time (sec): 130.43 - samples/sec: 826.16 - lr: 0.000009 - momentum: 0.000000
2023-10-17 11:10:37,574 epoch 1 - iter 2084/5212 - loss 0.69980888 - time (sec): 173.53 - samples/sec: 838.22 - lr: 0.000012 - momentum: 0.000000
2023-10-17 11:11:21,695 epoch 1 - iter 2605/5212 - loss 0.60610850 - time (sec): 217.65 - samples/sec: 848.09 - lr: 0.000015 - momentum: 0.000000
2023-10-17 11:12:04,464 epoch 1 - iter 3126/5212 - loss 0.54043037 - time (sec): 260.42 - samples/sec: 857.81 - lr: 0.000018 - momentum: 0.000000
2023-10-17 11:12:47,850 epoch 1 - iter 3647/5212 - loss 0.49949347 - time (sec): 303.81 - samples/sec: 851.51 - lr: 0.000021 - momentum: 0.000000
2023-10-17 11:13:31,805 epoch 1 - iter 4168/5212 - loss 0.46823322 - time (sec): 347.76 - samples/sec: 842.57 - lr: 0.000024 - momentum: 0.000000
2023-10-17 11:14:16,895 epoch 1 - iter 4689/5212 - loss 0.43915469 - time (sec): 392.85 - samples/sec: 842.42 - lr: 0.000027 - momentum: 0.000000
2023-10-17 11:15:00,022 epoch 1 - iter 5210/5212 - loss 0.41555785 - time (sec): 435.98 - samples/sec: 842.27 - lr: 0.000030 - momentum: 0.000000
2023-10-17 11:15:00,190 ----------------------------------------------------------------------------------------------------
2023-10-17 11:15:00,190 EPOCH 1 done: loss 0.4153 - lr: 0.000030
2023-10-17 11:15:07,630 DEV : loss 0.11844930797815323 - f1-score (micro avg)  0.2469
2023-10-17 11:15:07,684 saving best model
2023-10-17 11:15:08,240 ----------------------------------------------------------------------------------------------------
2023-10-17 11:15:51,077 epoch 2 - iter 521/5212 - loss 0.18677525 - time (sec): 42.84 - samples/sec: 893.85 - lr: 0.000030 - momentum: 0.000000
2023-10-17 11:16:34,185 epoch 2 - iter 1042/5212 - loss 0.18467584 - time (sec): 85.94 - samples/sec: 867.54 - lr: 0.000029 - momentum: 0.000000
2023-10-17 11:17:17,438 epoch 2 - iter 1563/5212 - loss 0.18459225 - time (sec): 129.20 - samples/sec: 868.65 - lr: 0.000029 - momentum: 0.000000
2023-10-17 11:18:00,970 epoch 2 - iter 2084/5212 - loss 0.18949291 - time (sec): 172.73 - samples/sec: 854.13 - lr: 0.000029 - momentum: 0.000000
2023-10-17 11:18:45,977 epoch 2 - iter 2605/5212 - loss 0.18960565 - time (sec): 217.74 - samples/sec: 842.99 - lr: 0.000028 - momentum: 0.000000
2023-10-17 11:19:29,280 epoch 2 - iter 3126/5212 - loss 0.18834557 - time (sec): 261.04 - samples/sec: 840.64 - lr: 0.000028 - momentum: 0.000000
2023-10-17 11:20:11,948 epoch 2 - iter 3647/5212 - loss 0.18442261 - time (sec): 303.71 - samples/sec: 854.40 - lr: 0.000028 - momentum: 0.000000
2023-10-17 11:20:55,872 epoch 2 - iter 4168/5212 - loss 0.18307206 - time (sec): 347.63 - samples/sec: 851.42 - lr: 0.000027 - momentum: 0.000000
2023-10-17 11:21:38,580 epoch 2 - iter 4689/5212 - loss 0.17918821 - time (sec): 390.34 - samples/sec: 844.82 - lr: 0.000027 - momentum: 0.000000
2023-10-17 11:22:20,682 epoch 2 - iter 5210/5212 - loss 0.17650046 - time (sec): 432.44 - samples/sec: 849.48 - lr: 0.000027 - momentum: 0.000000
2023-10-17 11:22:20,832 ----------------------------------------------------------------------------------------------------
2023-10-17 11:22:20,833 EPOCH 2 done: loss 0.1765 - lr: 0.000027
2023-10-17 11:22:32,839 DEV : loss 0.23894941806793213 - f1-score (micro avg)  0.3469
2023-10-17 11:22:32,893 saving best model
2023-10-17 11:22:34,316 ----------------------------------------------------------------------------------------------------
2023-10-17 11:23:15,325 epoch 3 - iter 521/5212 - loss 0.11437386 - time (sec): 41.01 - samples/sec: 925.03 - lr: 0.000026 - momentum: 0.000000
2023-10-17 11:23:57,365 epoch 3 - iter 1042/5212 - loss 0.12459624 - time (sec): 83.04 - samples/sec: 902.84 - lr: 0.000026 - momentum: 0.000000
2023-10-17 11:24:39,204 epoch 3 - iter 1563/5212 - loss 0.12923542 - time (sec): 124.88 - samples/sec: 889.63 - lr: 0.000026 - momentum: 0.000000
2023-10-17 11:25:20,547 epoch 3 - iter 2084/5212 - loss 0.13146091 - time (sec): 166.23 - samples/sec: 882.27 - lr: 0.000025 - momentum: 0.000000
2023-10-17 11:26:01,412 epoch 3 - iter 2605/5212 - loss 0.12864210 - time (sec): 207.09 - samples/sec: 886.34 - lr: 0.000025 - momentum: 0.000000
2023-10-17 11:26:42,942 epoch 3 - iter 3126/5212 - loss 0.13399268 - time (sec): 248.62 - samples/sec: 875.92 - lr: 0.000025 - momentum: 0.000000
2023-10-17 11:27:25,033 epoch 3 - iter 3647/5212 - loss 0.13321294 - time (sec): 290.71 - samples/sec: 874.05 - lr: 0.000024 - momentum: 0.000000
2023-10-17 11:28:06,996 epoch 3 - iter 4168/5212 - loss 0.13283195 - time (sec): 332.68 - samples/sec: 874.06 - lr: 0.000024 - momentum: 0.000000
2023-10-17 11:28:50,011 epoch 3 - iter 4689/5212 - loss 0.13502035 - time (sec): 375.69 - samples/sec: 878.62 - lr: 0.000024 - momentum: 0.000000
2023-10-17 11:29:31,516 epoch 3 - iter 5210/5212 - loss 0.13202538 - time (sec): 417.20 - samples/sec: 880.09 - lr: 0.000023 - momentum: 0.000000
2023-10-17 11:29:31,671 ----------------------------------------------------------------------------------------------------
2023-10-17 11:29:31,671 EPOCH 3 done: loss 0.1319 - lr: 0.000023
2023-10-17 11:29:43,652 DEV : loss 0.24874247610569 - f1-score (micro avg)  0.351
2023-10-17 11:29:43,706 saving best model
2023-10-17 11:29:45,126 ----------------------------------------------------------------------------------------------------
2023-10-17 11:30:29,111 epoch 4 - iter 521/5212 - loss 0.09645922 - time (sec): 43.98 - samples/sec: 847.79 - lr: 0.000023 - momentum: 0.000000
2023-10-17 11:31:14,228 epoch 4 - iter 1042/5212 - loss 0.09575057 - time (sec): 89.10 - samples/sec: 827.29 - lr: 0.000023 - momentum: 0.000000
2023-10-17 11:31:56,651 epoch 4 - iter 1563/5212 - loss 0.09440810 - time (sec): 131.52 - samples/sec: 826.16 - lr: 0.000022 - momentum: 0.000000
2023-10-17 11:32:38,800 epoch 4 - iter 2084/5212 - loss 0.09200648 - time (sec): 173.67 - samples/sec: 830.24 - lr: 0.000022 - momentum: 0.000000
2023-10-17 11:33:22,254 epoch 4 - iter 2605/5212 - loss 0.09507978 - time (sec): 217.12 - samples/sec: 824.50 - lr: 0.000022 - momentum: 0.000000
2023-10-17 11:34:03,455 epoch 4 - iter 3126/5212 - loss 0.09574149 - time (sec): 258.33 - samples/sec: 827.19 - lr: 0.000021 - momentum: 0.000000
2023-10-17 11:34:45,413 epoch 4 - iter 3647/5212 - loss 0.09612564 - time (sec): 300.28 - samples/sec: 839.08 - lr: 0.000021 - momentum: 0.000000
2023-10-17 11:35:28,560 epoch 4 - iter 4168/5212 - loss 0.09673298 - time (sec): 343.43 - samples/sec: 846.54 - lr: 0.000021 - momentum: 0.000000
2023-10-17 11:36:11,639 epoch 4 - iter 4689/5212 - loss 0.09605759 - time (sec): 386.51 - samples/sec: 853.19 - lr: 0.000020 - momentum: 0.000000
2023-10-17 11:36:54,277 epoch 4 - iter 5210/5212 - loss 0.09435830 - time (sec): 429.15 - samples/sec: 855.98 - lr: 0.000020 - momentum: 0.000000
2023-10-17 11:36:54,441 ----------------------------------------------------------------------------------------------------
2023-10-17 11:36:54,441 EPOCH 4 done: loss 0.0943 - lr: 0.000020
2023-10-17 11:37:06,587 DEV : loss 0.2750011384487152 - f1-score (micro avg)  0.3813
2023-10-17 11:37:06,641 saving best model
2023-10-17 11:37:08,118 ----------------------------------------------------------------------------------------------------
2023-10-17 11:37:52,422 epoch 5 - iter 521/5212 - loss 0.05806410 - time (sec): 44.30 - samples/sec: 850.98 - lr: 0.000020 - momentum: 0.000000
2023-10-17 11:38:35,833 epoch 5 - iter 1042/5212 - loss 0.05768321 - time (sec): 87.71 - samples/sec: 816.42 - lr: 0.000019 - momentum: 0.000000
2023-10-17 11:39:21,545 epoch 5 - iter 1563/5212 - loss 0.06370432 - time (sec): 133.42 - samples/sec: 813.72 - lr: 0.000019 - momentum: 0.000000
2023-10-17 11:40:04,393 epoch 5 - iter 2084/5212 - loss 0.06198607 - time (sec): 176.27 - samples/sec: 810.85 - lr: 0.000019 - momentum: 0.000000
2023-10-17 11:40:50,143 epoch 5 - iter 2605/5212 - loss 0.06397044 - time (sec): 222.02 - samples/sec: 820.57 - lr: 0.000018 - momentum: 0.000000
2023-10-17 11:41:34,615 epoch 5 - iter 3126/5212 - loss 0.06343104 - time (sec): 266.49 - samples/sec: 834.78 - lr: 0.000018 - momentum: 0.000000
2023-10-17 11:42:17,862 epoch 5 - iter 3647/5212 - loss 0.06370732 - time (sec): 309.74 - samples/sec: 834.87 - lr: 0.000018 - momentum: 0.000000
2023-10-17 11:43:00,416 epoch 5 - iter 4168/5212 - loss 0.06361989 - time (sec): 352.29 - samples/sec: 841.76 - lr: 0.000017 - momentum: 0.000000
2023-10-17 11:43:41,207 epoch 5 - iter 4689/5212 - loss 0.06404810 - time (sec): 393.08 - samples/sec: 842.08 - lr: 0.000017 - momentum: 0.000000
2023-10-17 11:44:23,171 epoch 5 - iter 5210/5212 - loss 0.06343811 - time (sec): 435.05 - samples/sec: 844.47 - lr: 0.000017 - momentum: 0.000000
2023-10-17 11:44:23,319 ----------------------------------------------------------------------------------------------------
2023-10-17 11:44:23,320 EPOCH 5 done: loss 0.0635 - lr: 0.000017
2023-10-17 11:44:34,163 DEV : loss 0.34400203824043274 - f1-score (micro avg)  0.3937
2023-10-17 11:44:34,220 saving best model
2023-10-17 11:44:35,623 ----------------------------------------------------------------------------------------------------
2023-10-17 11:45:19,252 epoch 6 - iter 521/5212 - loss 0.05472897 - time (sec): 43.62 - samples/sec: 855.22 - lr: 0.000016 - momentum: 0.000000
2023-10-17 11:46:00,081 epoch 6 - iter 1042/5212 - loss 0.05370031 - time (sec): 84.45 - samples/sec: 855.73 - lr: 0.000016 - momentum: 0.000000
2023-10-17 11:46:41,972 epoch 6 - iter 1563/5212 - loss 0.04710872 - time (sec): 126.34 - samples/sec: 849.29 - lr: 0.000016 - momentum: 0.000000
2023-10-17 11:47:27,063 epoch 6 - iter 2084/5212 - loss 0.04824334 - time (sec): 171.44 - samples/sec: 855.69 - lr: 0.000015 - momentum: 0.000000
2023-10-17 11:48:09,332 epoch 6 - iter 2605/5212 - loss 0.04632066 - time (sec): 213.70 - samples/sec: 868.35 - lr: 0.000015 - momentum: 0.000000
2023-10-17 11:48:50,366 epoch 6 - iter 3126/5212 - loss 0.04608767 - time (sec): 254.74 - samples/sec: 874.91 - lr: 0.000015 - momentum: 0.000000
2023-10-17 11:49:32,284 epoch 6 - iter 3647/5212 - loss 0.04516211 - time (sec): 296.66 - samples/sec: 871.68 - lr: 0.000014 - momentum: 0.000000
2023-10-17 11:50:13,897 epoch 6 - iter 4168/5212 - loss 0.04805567 - time (sec): 338.27 - samples/sec: 869.50 - lr: 0.000014 - momentum: 0.000000
2023-10-17 11:50:56,045 epoch 6 - iter 4689/5212 - loss 0.04758346 - time (sec): 380.42 - samples/sec: 871.32 - lr: 0.000014 - momentum: 0.000000
2023-10-17 11:51:37,913 epoch 6 - iter 5210/5212 - loss 0.04826135 - time (sec): 422.29 - samples/sec: 869.98 - lr: 0.000013 - momentum: 0.000000
2023-10-17 11:51:38,067 ----------------------------------------------------------------------------------------------------
2023-10-17 11:51:38,068 EPOCH 6 done: loss 0.0483 - lr: 0.000013
2023-10-17 11:51:49,240 DEV : loss 0.2987309396266937 - f1-score (micro avg)  0.3914
2023-10-17 11:51:49,296 ----------------------------------------------------------------------------------------------------
2023-10-17 11:52:31,138 epoch 7 - iter 521/5212 - loss 0.03418427 - time (sec): 41.84 - samples/sec: 904.38 - lr: 0.000013 - momentum: 0.000000
2023-10-17 11:53:13,452 epoch 7 - iter 1042/5212 - loss 0.03052773 - time (sec): 84.15 - samples/sec: 895.56 - lr: 0.000013 - momentum: 0.000000
2023-10-17 11:53:57,642 epoch 7 - iter 1563/5212 - loss 0.03183996 - time (sec): 128.34 - samples/sec: 877.20 - lr: 0.000012 - momentum: 0.000000
2023-10-17 11:54:40,075 epoch 7 - iter 2084/5212 - loss 0.03089690 - time (sec): 170.78 - samples/sec: 876.39 - lr: 0.000012 - momentum: 0.000000
2023-10-17 11:55:22,339 epoch 7 - iter 2605/5212 - loss 0.03370678 - time (sec): 213.04 - samples/sec: 867.96 - lr: 0.000012 - momentum: 0.000000
2023-10-17 11:56:03,952 epoch 7 - iter 3126/5212 - loss 0.03259067 - time (sec): 254.65 - samples/sec: 864.70 - lr: 0.000011 - momentum: 0.000000
2023-10-17 11:56:47,386 epoch 7 - iter 3647/5212 - loss 0.03324339 - time (sec): 298.09 - samples/sec: 858.90 - lr: 0.000011 - momentum: 0.000000
2023-10-17 11:57:31,079 epoch 7 - iter 4168/5212 - loss 0.03204679 - time (sec): 341.78 - samples/sec: 865.26 - lr: 0.000011 - momentum: 0.000000
2023-10-17 11:58:13,042 epoch 7 - iter 4689/5212 - loss 0.03270349 - time (sec): 383.74 - samples/sec: 866.76 - lr: 0.000010 - momentum: 0.000000
2023-10-17 11:58:56,293 epoch 7 - iter 5210/5212 - loss 0.03210456 - time (sec): 426.99 - samples/sec: 860.40 - lr: 0.000010 - momentum: 0.000000
2023-10-17 11:58:56,458 ----------------------------------------------------------------------------------------------------
2023-10-17 11:58:56,458 EPOCH 7 done: loss 0.0321 - lr: 0.000010
2023-10-17 11:59:07,929 DEV : loss 0.4514279067516327 - f1-score (micro avg)  0.3873
2023-10-17 11:59:08,000 ----------------------------------------------------------------------------------------------------
2023-10-17 11:59:50,456 epoch 8 - iter 521/5212 - loss 0.02287542 - time (sec): 42.45 - samples/sec: 845.18 - lr: 0.000010 - momentum: 0.000000
2023-10-17 12:00:35,464 epoch 8 - iter 1042/5212 - loss 0.02032053 - time (sec): 87.46 - samples/sec: 818.29 - lr: 0.000009 - momentum: 0.000000
2023-10-17 12:01:18,589 epoch 8 - iter 1563/5212 - loss 0.02247712 - time (sec): 130.59 - samples/sec: 814.40 - lr: 0.000009 - momentum: 0.000000
2023-10-17 12:02:01,597 epoch 8 - iter 2084/5212 - loss 0.02124931 - time (sec): 173.59 - samples/sec: 821.16 - lr: 0.000009 - momentum: 0.000000
2023-10-17 12:02:44,007 epoch 8 - iter 2605/5212 - loss 0.02199294 - time (sec): 216.00 - samples/sec: 826.62 - lr: 0.000008 - momentum: 0.000000
2023-10-17 12:03:28,062 epoch 8 - iter 3126/5212 - loss 0.02206727 - time (sec): 260.06 - samples/sec: 829.98 - lr: 0.000008 - momentum: 0.000000
2023-10-17 12:04:11,244 epoch 8 - iter 3647/5212 - loss 0.02198914 - time (sec): 303.24 - samples/sec: 835.71 - lr: 0.000008 - momentum: 0.000000
2023-10-17 12:04:53,240 epoch 8 - iter 4168/5212 - loss 0.02133287 - time (sec): 345.24 - samples/sec: 846.62 - lr: 0.000007 - momentum: 0.000000
2023-10-17 12:05:35,938 epoch 8 - iter 4689/5212 - loss 0.02089455 - time (sec): 387.93 - samples/sec: 851.73 - lr: 0.000007 - momentum: 0.000000
2023-10-17 12:06:18,629 epoch 8 - iter 5210/5212 - loss 0.02139060 - time (sec): 430.63 - samples/sec: 853.09 - lr: 0.000007 - momentum: 0.000000
2023-10-17 12:06:18,786 ----------------------------------------------------------------------------------------------------
2023-10-17 12:06:18,787 EPOCH 8 done: loss 0.0214 - lr: 0.000007
2023-10-17 12:06:31,446 DEV : loss 0.42482689023017883 - f1-score (micro avg)  0.4045
2023-10-17 12:06:31,520 saving best model
2023-10-17 12:06:33,030 ----------------------------------------------------------------------------------------------------
2023-10-17 12:07:14,963 epoch 9 - iter 521/5212 - loss 0.01446293 - time (sec): 41.93 - samples/sec: 811.46 - lr: 0.000006 - momentum: 0.000000
2023-10-17 12:07:57,416 epoch 9 - iter 1042/5212 - loss 0.01733620 - time (sec): 84.38 - samples/sec: 845.43 - lr: 0.000006 - momentum: 0.000000
2023-10-17 12:08:38,808 epoch 9 - iter 1563/5212 - loss 0.01496632 - time (sec): 125.78 - samples/sec: 836.07 - lr: 0.000006 - momentum: 0.000000
2023-10-17 12:09:20,895 epoch 9 - iter 2084/5212 - loss 0.01584419 - time (sec): 167.86 - samples/sec: 834.70 - lr: 0.000005 - momentum: 0.000000
2023-10-17 12:10:02,618 epoch 9 - iter 2605/5212 - loss 0.01574388 - time (sec): 209.59 - samples/sec: 843.01 - lr: 0.000005 - momentum: 0.000000
2023-10-17 12:10:44,627 epoch 9 - iter 3126/5212 - loss 0.01792002 - time (sec): 251.59 - samples/sec: 847.62 - lr: 0.000005 - momentum: 0.000000
2023-10-17 12:11:27,348 epoch 9 - iter 3647/5212 - loss 0.01713778 - time (sec): 294.32 - samples/sec: 851.35 - lr: 0.000004 - momentum: 0.000000
2023-10-17 12:12:10,707 epoch 9 - iter 4168/5212 - loss 0.01688059 - time (sec): 337.68 - samples/sec: 855.37 - lr: 0.000004 - momentum: 0.000000
2023-10-17 12:12:53,136 epoch 9 - iter 4689/5212 - loss 0.01654568 - time (sec): 380.10 - samples/sec: 860.95 - lr: 0.000004 - momentum: 0.000000
2023-10-17 12:13:36,164 epoch 9 - iter 5210/5212 - loss 0.01654318 - time (sec): 423.13 - samples/sec: 868.21 - lr: 0.000003 - momentum: 0.000000
2023-10-17 12:13:36,312 ----------------------------------------------------------------------------------------------------
2023-10-17 12:13:36,313 EPOCH 9 done: loss 0.0165 - lr: 0.000003
2023-10-17 12:13:48,791 DEV : loss 0.41042855381965637 - f1-score (micro avg)  0.416
2023-10-17 12:13:48,867 saving best model
2023-10-17 12:13:50,323 ----------------------------------------------------------------------------------------------------
2023-10-17 12:14:33,294 epoch 10 - iter 521/5212 - loss 0.00781776 - time (sec): 42.96 - samples/sec: 865.88 - lr: 0.000003 - momentum: 0.000000
2023-10-17 12:15:15,785 epoch 10 - iter 1042/5212 - loss 0.00793401 - time (sec): 85.46 - samples/sec: 852.16 - lr: 0.000003 - momentum: 0.000000
2023-10-17 12:15:58,787 epoch 10 - iter 1563/5212 - loss 0.00880507 - time (sec): 128.46 - samples/sec: 831.97 - lr: 0.000002 - momentum: 0.000000
2023-10-17 12:16:41,351 epoch 10 - iter 2084/5212 - loss 0.00972971 - time (sec): 171.02 - samples/sec: 843.50 - lr: 0.000002 - momentum: 0.000000
2023-10-17 12:17:24,604 epoch 10 - iter 2605/5212 - loss 0.01024931 - time (sec): 214.27 - samples/sec: 848.53 - lr: 0.000002 - momentum: 0.000000
2023-10-17 12:18:05,735 epoch 10 - iter 3126/5212 - loss 0.00986422 - time (sec): 255.41 - samples/sec: 849.94 - lr: 0.000001 - momentum: 0.000000
2023-10-17 12:18:49,745 epoch 10 - iter 3647/5212 - loss 0.00950244 - time (sec): 299.42 - samples/sec: 845.97 - lr: 0.000001 - momentum: 0.000000
2023-10-17 12:19:32,787 epoch 10 - iter 4168/5212 - loss 0.00935274 - time (sec): 342.46 - samples/sec: 847.08 - lr: 0.000001 - momentum: 0.000000
2023-10-17 12:20:15,120 epoch 10 - iter 4689/5212 - loss 0.00939995 - time (sec): 384.79 - samples/sec: 855.40 - lr: 0.000000 - momentum: 0.000000
2023-10-17 12:20:57,845 epoch 10 - iter 5210/5212 - loss 0.00928220 - time (sec): 427.52 - samples/sec: 859.13 - lr: 0.000000 - momentum: 0.000000
2023-10-17 12:20:58,002 ----------------------------------------------------------------------------------------------------
2023-10-17 12:20:58,002 EPOCH 10 done: loss 0.0093 - lr: 0.000000
2023-10-17 12:21:11,081 DEV : loss 0.4939973056316376 - f1-score (micro avg)  0.3985
2023-10-17 12:21:11,724 ----------------------------------------------------------------------------------------------------
2023-10-17 12:21:11,727 Loading model from best epoch ...
2023-10-17 12:21:14,296 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-17 12:21:35,379 
Results:
- F-score (micro) 0.4827
- F-score (macro) 0.3288
- Accuracy 0.3215

By class:
              precision    recall  f1-score   support

         LOC     0.5341    0.5939    0.5624      1214
         PER     0.4248    0.4542    0.4390       808
         ORG     0.3102    0.3173    0.3137       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4648    0.5021    0.4827      2390
   macro avg     0.3173    0.3413    0.3288      2390
weighted avg     0.4607    0.5021    0.4804      2390

2023-10-17 12:21:35,379 ----------------------------------------------------------------------------------------------------