initial commit

- README.md +155 -0
- loss.tsv +21 -0
- pytorch_model.bin +3 -0
- training.log +915 -0

README.md (ADDED)
@@ -0,0 +1,155 @@
---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- conll2003
inference: false
---

## English NER in Flair (large model)

This is the large 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **94.36** (corrected CoNLL-03)

Predicts 4 tags:

| **tag** | **meaning**       |
|---------|-------------------|
| PER     | person name       |
| LOC     | location name     |
| ORG     | organization name |
| MISC    | other name        |

Based on [document-level XLM-R embeddings](https://www.aclweb.org/anthology/C18-1139/).

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-large")

# make example sentence
sentence = Sentence("George Washington went to Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:
```
Span [1,2]: "George Washington"   [− Labels: PER (0.9968)]
Span [5]: "Washington"   [− Labels: LOC (0.9994)]
```

So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington went to Washington*".
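
If you need the entity text, label, and confidence programmatically rather than the printed representation, the span objects expose these fields directly. A minimal sketch, assuming the Flair 0.8-era `Span.tag` and `Span.score` attributes (newer releases expose the same data via `entity.get_label('ner')`):

```python
# access the span fields directly instead of printing the span object
for entity in sentence.get_spans('ner'):
    # entity.text is the surface string; tag/score are the predicted
    # label and its confidence (attribute names assume Flair 0.8-era API)
    print(f"{entity.text}\t{entity.tag}\t{entity.score:.4f}")
```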

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
import torch

# 1. get the corpus
from flair.datasets import CONLL_03

corpus = CONLL_03()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize fine-tuneable transformer embeddings WITH document context
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# 5. initialize bare-bones sequence tagger (no CRF, no RNN, no reprojection)
from flair.models import SequenceTagger

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type='ner',
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# 6. initialize trainer with AdamW optimizer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# 7. run training with XLM parameters (20 epochs, small LR)
from torch.optim.lr_scheduler import OneCycleLR

trainer.train('resources/taggers/ner-english-large',
              learning_rate=5.0e-6,
              mini_batch_size=4,
              mini_batch_chunk_size=1,
              max_epochs=20,
              scheduler=OneCycleLR,
              embeddings_storage_mode='none',
              weight_decay=0.,
              )
```

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
loss.tsv (ADDED)
@@ -0,0 +1,21 @@
EPOCH	TIMESTAMP	BAD_EPOCHS	LEARNING_RATE	TRAIN_LOSS	TEST_LOSS	TEST_PRECISION	TEST_RECALL	TEST_F1
1	12:33:21	4	0.0000	0.3592169054972083	0.17636922001838684	0.9072	0.9043	0.9057
2	13:10:20	4	0.0000	0.2211920055868173	0.11124755442142487	0.9254	0.9305	0.9279
3	13:47:17	4	0.0000	0.19381283097644833	0.11028687655925751	0.9294	0.9402	0.9348
4	14:24:09	4	0.0000	0.1925625054863614	0.10681818425655365	0.9321	0.9445	0.9383
5	15:00:56	4	0.0000	0.18075492875338123	0.11206260323524475	0.9368	0.9406	0.9387
6	15:37:42	4	0.0000	0.17524857581343897	0.11003755778074265	0.9335	0.9410	0.9372
7	16:14:31	4	0.0000	0.15931038153566326	0.1253410428762436	0.9304	0.9358	0.9331
8	16:51:24	4	0.0000	0.16053336656099176	0.12391051650047302	0.9396	0.9426	0.9411
9	17:28:05	4	0.0000	0.1590428160623491	0.1257738471031189	0.9339	0.9473	0.9406
10	18:04:53	4	0.0000	0.14964490167298533	0.1382586508989334	0.9302	0.9408	0.9355
11	18:41:33	4	0.0000	0.14682254513923948	0.13701947033405304	0.9351	0.9424	0.9387
12	19:18:12	4	0.0000	0.15043419943368025	0.15095502138137817	0.9359	0.9418	0.9388
13	19:54:57	4	0.0000	0.14460110974684232	0.14258751273155212	0.9374	0.9424	0.9399
14	20:31:42	4	0.0000	0.14146839202254796	0.16016331315040588	0.9372	0.9420	0.9396
15	21:08:34	4	0.0000	0.14678317207995362	0.15258659422397614	0.9380	0.9420	0.9400
16	21:45:16	4	0.0000	0.152214311589488	0.14317740499973297	0.9405	0.9465	0.9434
17	22:21:56	4	0.0000	0.1459472061536186	0.14864514768123627	0.9411	0.9459	0.9435
18	22:58:37	4	0.0000	0.1397127115109613	0.1518455296754837	0.9409	0.9465	0.9437
19	23:35:20	4	0.0000	0.14249197369562547	0.15170469880104065	0.9406	0.9461	0.9433
20	00:12:06	4	0.0000	0.13975302390449296	0.15191785991191864	0.9408	0.9465	0.9436
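
For quick inspection, the metrics above parse with the standard library alone. A minimal sketch, assuming the file is saved locally as `loss.tsv`:

```python
import csv

# read the tab-separated training metrics and report the best test F1
with open("loss.tsv", newline="") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

best = max(rows, key=lambda r: float(r["TEST_F1"]))
print(f"best epoch: {best['EPOCH']} (test F1 = {best['TEST_F1']})")
```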
pytorch_model.bin (ADDED)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f59c05bbd3db05518b632f212b1aac7de1ff0b3914d6c0d587b6a68e214a287
size 2239866761
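
This is a Git LFS pointer rather than the weights themselves; the actual file is fetched on checkout (e.g. with `git lfs pull`). Once downloaded, the file can be checked against the `oid` above. A minimal sketch:

```python
import hashlib

# verify the downloaded weights against the sha256 oid in the LFS pointer
expected = "1f59c05bbd3db05518b632f212b1aac7de1ff0b3914d6c0d587b6a68e214a287"

sha = hashlib.sha256()
with open("pytorch_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        sha.update(chunk)

print("checksum OK" if sha.hexdigest() == expected else "checksum MISMATCH")
```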
training.log (ADDED)
@@ -0,0 +1,915 @@
2021-02-20 11:56:18,090 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,093 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(250002, 1024, padding_idx=1)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (1)-(23): RobertaLayer( ... )  [layers (1) through (23) are identical in structure to layer (0); repeated blocks elided]
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=20, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor) None
)"
2021-02-20 11:56:18,094 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Corpus: "MultiCorpus: 16744 train + 3449 dev + 3658 test sentences
 - CONLL_03 Corpus: 14903 train + 3449 dev + 3658 test sentences
 - WIKIGOLD_NER Corpus: 1841 train + 0 dev + 0 test sentences"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Parameters:
2021-02-20 11:56:18,095  - learning_rate: "5e-06"
2021-02-20 11:56:18,095  - mini_batch_size: "4"
2021-02-20 11:56:18,095  - patience: "3"
2021-02-20 11:56:18,095  - anneal_factor: "0.5"
2021-02-20 11:56:18,095  - max_epochs: "20"
2021-02-20 11:56:18,095  - shuffle: "True"
2021-02-20 11:56:18,095  - train_with_dev: "True"
2021-02-20 11:56:18,095  - batch_growth_annealing: "False"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Model training base path: "resources/contextdrop/d-flert-en_release-ft+dev-xlm-roberta-large-context+drop-64-True-42"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Device: cuda:1
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Embeddings storage mode: none
2021-02-20 11:56:18,104 ----------------------------------------------------------------------------------------------------
2021-02-20 11:59:49,493 epoch 1 - iter 504/5049 - loss 0.84988712 - samples/sec: 9.54 - lr: 0.000005
2021-02-20 12:03:17,203 epoch 1 - iter 1008/5049 - loss 0.64131590 - samples/sec: 9.71 - lr: 0.000005
2021-02-20 12:06:42,427 epoch 1 - iter 1512/5049 - loss 0.54315957 - samples/sec: 9.82 - lr: 0.000005
2021-02-20 12:10:12,872 epoch 1 - iter 2016/5049 - loss 0.48025516 - samples/sec: 9.58 - lr: 0.000005
2021-02-20 12:13:43,522 epoch 1 - iter 2520/5049 - loss 0.46057764 - samples/sec: 9.57 - lr: 0.000005
2021-02-20 12:17:12,894 epoch 1 - iter 3024/5049 - loss 0.42570537 - samples/sec: 9.63 - lr: 0.000005
2021-02-20 12:20:41,525 epoch 1 - iter 3528/5049 - loss 0.39857695 - samples/sec: 9.66 - lr: 0.000005
2021-02-20 12:24:14,564 epoch 1 - iter 4032/5049 - loss 0.38416717 - samples/sec: 9.46 - lr: 0.000005
2021-02-20 12:27:45,615 epoch 1 - iter 4536/5049 - loss 0.37032747 - samples/sec: 9.55 - lr: 0.000005
2021-02-20 12:31:13,574 epoch 1 - iter 5040/5049 - loss 0.35966340 - samples/sec: 9.70 - lr: 0.000005
2021-02-20 12:31:17,124 ----------------------------------------------------------------------------------------------------
2021-02-20 12:31:17,124 EPOCH 1 done: loss 0.3592 - lr 0.0000050
2021-02-20 12:33:21,019 TEST : loss 0.17636922001838684 - score 0.9057
2021-02-20 12:33:21,046 BAD EPOCHS (no improvement): 4
2021-02-20 12:33:21,047 ----------------------------------------------------------------------------------------------------
2021-02-20 12:36:49,271 epoch 2 - iter 504/5049 - loss 0.25564826 - samples/sec: 9.68 - lr: 0.000005
2021-02-20 12:40:20,219 epoch 2 - iter 1008/5049 - loss 0.25560543 - samples/sec: 9.56 - lr: 0.000005
2021-02-20 12:43:48,750 epoch 2 - iter 1512/5049 - loss 0.24306949 - samples/sec: 9.67 - lr: 0.000005
2021-02-20 12:47:18,010 epoch 2 - iter 2016/5049 - loss 0.23918902 - samples/sec: 9.63 - lr: 0.000005
2021-02-20 12:50:49,034 epoch 2 - iter 2520/5049 - loss 0.23745494 - samples/sec: 9.55 - lr: 0.000005
2021-02-20 12:54:18,224 epoch 2 - iter 3024/5049 - loss 0.23599522 - samples/sec: 9.64 - lr: 0.000005
2021-02-20 12:57:46,500 epoch 2 - iter 3528/5049 - loss 0.22758435 - samples/sec: 9.68 - lr: 0.000005
2021-02-20 13:01:14,137 epoch 2 - iter 4032/5049 - loss 0.22602197 - samples/sec: 9.71 - lr: 0.000005
2021-02-20 13:04:43,356 epoch 2 - iter 4536/5049 - loss 0.22365802 - samples/sec: 9.64 - lr: 0.000005
2021-02-20 13:08:14,129 epoch 2 - iter 5040/5049 - loss 0.22152549 - samples/sec: 9.57 - lr: 0.000005
2021-02-20 13:08:17,630 ----------------------------------------------------------------------------------------------------
2021-02-20 13:08:17,630 EPOCH 2 done: loss 0.2212 - lr 0.0000049
2021-02-20 13:10:20,643 TEST : loss 0.11124755442142487 - score 0.9279
2021-02-20 13:10:20,675 BAD EPOCHS (no improvement): 4
2021-02-20 13:10:20,680 ----------------------------------------------------------------------------------------------------
2021-02-20 13:13:50,443 epoch 3 - iter 504/5049 - loss 0.17266852 - samples/sec: 9.61 - lr: 0.000005
2021-02-20 13:17:19,023 epoch 3 - iter 1008/5049 - loss 0.18002962 - samples/sec: 9.67 - lr: 0.000005
2021-02-20 13:20:49,199 epoch 3 - iter 1512/5049 - loss 0.18510266 - samples/sec: 9.59 - lr: 0.000005
2021-02-20 13:24:19,385 epoch 3 - iter 2016/5049 - loss 0.19983503 - samples/sec: 9.59 - lr: 0.000005
2021-02-20 13:27:48,348 epoch 3 - iter 2520/5049 - loss 0.20190812 - samples/sec: 9.65 - lr: 0.000005
2021-02-20 13:31:15,582 epoch 3 - iter 3024/5049 - loss 0.19944912 - samples/sec: 9.73 - lr: 0.000005
2021-02-20 13:34:43,944 epoch 3 - iter 3528/5049 - loss 0.19932389 - samples/sec: 9.68 - lr: 0.000005
2021-02-20 13:38:13,075 epoch 3 - iter 4032/5049 - loss 0.19547160 - samples/sec: 9.64 - lr: 0.000005
2021-02-20 13:41:42,971 epoch 3 - iter 4536/5049 - loss 0.19618987 - samples/sec: 9.61 - lr: 0.000005
2021-02-20 13:45:10,066 epoch 3 - iter 5040/5049 - loss 0.19343864 - samples/sec: 9.74 - lr: 0.000005
2021-02-20 13:45:13,621 ----------------------------------------------------------------------------------------------------
2021-02-20 13:45:13,622 EPOCH 3 done: loss 0.1938 - lr 0.0000047
2021-02-20 13:47:17,651 TEST : loss 0.11028687655925751 - score 0.9348
2021-02-20 13:47:17,678 BAD EPOCHS (no improvement): 4
2021-02-20 13:47:17,680 ----------------------------------------------------------------------------------------------------
2021-02-20 13:50:48,046 epoch 4 - iter 504/5049 - loss 0.19022199 - samples/sec: 9.58 - lr: 0.000005
2021-02-20 13:54:14,852 epoch 4 - iter 1008/5049 - loss 0.17976050 - samples/sec: 9.75 - lr: 0.000005
2021-02-20 13:57:44,871 epoch 4 - iter 1512/5049 - loss 0.17729127 - samples/sec: 9.60 - lr: 0.000005
2021-02-20 14:01:14,307 epoch 4 - iter 2016/5049 - loss 0.17812706 - samples/sec: 9.63 - lr: 0.000005
2021-02-20 14:04:41,981 epoch 4 - iter 2520/5049 - loss 0.18816455 - samples/sec: 9.71 - lr: 0.000005
2021-02-20 14:08:10,238 epoch 4 - iter 3024/5049 - loss 0.18990221 - samples/sec: 9.68 - lr: 0.000005
2021-02-20 14:11:38,151 epoch 4 - iter 3528/5049 - loss 0.19181303 - samples/sec: 9.70 - lr: 0.000005
2021-02-20 14:15:03,479 epoch 4 - iter 4032/5049 - loss 0.19180866 - samples/sec: 9.82 - lr: 0.000005
2021-02-20 14:18:32,995 epoch 4 - iter 4536/5049 - loss 0.19160628 - samples/sec: 9.62 - lr: 0.000005
2021-02-20 14:22:00,977 epoch 4 - iter 5040/5049 - loss 0.19256281 - samples/sec: 9.69 - lr: 0.000005
2021-02-20 14:22:04,481 ----------------------------------------------------------------------------------------------------
2021-02-20 14:22:04,482 EPOCH 4 done: loss 0.1926 - lr 0.0000045
2021-02-20 14:24:09,809 TEST : loss 0.10681818425655365 - score 0.9383
2021-02-20 14:24:09,842 BAD EPOCHS (no improvement): 4
2021-02-20 14:24:09,844 ----------------------------------------------------------------------------------------------------
2021-02-20 14:27:37,280 epoch 5 - iter 504/5049 - loss 0.16645148 - samples/sec: 9.72 - lr: 0.000004
2021-02-20 14:31:05,862 epoch 5 - iter 1008/5049 - loss 0.17264234 - samples/sec: 9.67 - lr: 0.000004
2021-02-20 14:34:31,375 epoch 5 - iter 1512/5049 - loss 0.18603685 - samples/sec: 9.81 - lr: 0.000004
2021-02-20 14:37:57,695 epoch 5 - iter 2016/5049 - loss 0.18245931 - samples/sec: 9.77 - lr: 0.000004
2021-02-20 14:41:25,198 epoch 5 - iter 2520/5049 - loss 0.19293042 - samples/sec: 9.72 - lr: 0.000004
2021-02-20 14:44:53,631 epoch 5 - iter 3024/5049 - loss 0.19454820 - samples/sec: 9.67 - lr: 0.000004
2021-02-20 14:48:21,579 epoch 5 - iter 3528/5049 - loss 0.18990338 - samples/sec: 9.70 - lr: 0.000004
2021-02-20 14:51:51,276 epoch 5 - iter 4032/5049 - loss 0.18768864 - samples/sec: 9.61 - lr: 0.000004
2021-02-20 14:55:18,914 epoch 5 - iter 4536/5049 - loss 0.18508693 - samples/sec: 9.71 - lr: 0.000004
2021-02-20 14:58:47,195 epoch 5 - iter 5040/5049 - loss 0.18082235 - samples/sec: 9.68 - lr: 0.000004
2021-02-20 14:58:50,697 ----------------------------------------------------------------------------------------------------
2021-02-20 14:58:50,697 EPOCH 5 done: loss 0.1808 - lr 0.0000043
2021-02-20 15:00:56,633 TEST : loss 0.11206260323524475 - score 0.9387
2021-02-20 15:00:56,668 BAD EPOCHS (no improvement): 4
2021-02-20 15:00:56,672 ----------------------------------------------------------------------------------------------------
2021-02-20 15:04:25,586 epoch 6 - iter 504/5049 - loss 0.15912418 - samples/sec: 9.65 - lr: 0.000004
2021-02-20 15:07:53,476 epoch 6 - iter 1008/5049 - loss 0.14931369 - samples/sec: 9.70 - lr: 0.000004
2021-02-20 15:11:20,667 epoch 6 - iter 1512/5049 - loss 0.15761230 - samples/sec: 9.73 - lr: 0.000004
2021-02-20 15:14:47,624 epoch 6 - iter 2016/5049 - loss 0.16075756 - samples/sec: 9.74 - lr: 0.000004
2021-02-20 15:18:15,842 epoch 6 - iter 2520/5049 - loss 0.16126459 - samples/sec: 9.68 - lr: 0.000004
2021-02-20 15:21:44,174 epoch 6 - iter 3024/5049 - loss 0.16137015 - samples/sec: 9.68 - lr: 0.000004
2021-02-20 15:25:11,675 epoch 6 - iter 3528/5049 - loss 0.16742578 - samples/sec: 9.72 - lr: 0.000004
2021-02-20 15:28:38,600 epoch 6 - iter 4032/5049 - loss 0.17104120 - samples/sec: 9.74 - lr: 0.000004
2021-02-20 15:32:04,821 epoch 6 - iter 4536/5049 - loss 0.17299492 - samples/sec: 9.78 - lr: 0.000004
2021-02-20 15:35:33,611 epoch 6 - iter 5040/5049 - loss 0.17502829 - samples/sec: 9.66 - lr: 0.000004
2021-02-20 15:35:37,145 ----------------------------------------------------------------------------------------------------
2021-02-20 15:35:37,146 EPOCH 6 done: loss 0.1752 - lr 0.0000040
2021-02-20 15:37:42,922 TEST : loss 0.11003755778074265 - score 0.9372
2021-02-20 15:37:42,957 BAD EPOCHS (no improvement): 4
2021-02-20 15:37:42,959 ----------------------------------------------------------------------------------------------------
2021-02-20 15:41:11,469 epoch 7 - iter 504/5049 - loss 0.15970022 - samples/sec: 9.67 - lr: 0.000004
2021-02-20 15:44:38,687 epoch 7 - iter 1008/5049 - loss 0.16257612 - samples/sec: 9.73 - lr: 0.000004
2021-02-20 15:48:07,772 epoch 7 - iter 1512/5049 - loss 0.15637818 - samples/sec: 9.64 - lr: 0.000004
2021-02-20 15:51:34,834 epoch 7 - iter 2016/5049 - loss 0.15584222 - samples/sec: 9.74 - lr: 0.000004
2021-02-20 15:55:02,825 epoch 7 - iter 2520/5049 - loss 0.15669211 - samples/sec: 9.69 - lr: 0.000004
2021-02-20 15:58:30,698 epoch 7 - iter 3024/5049 - loss 0.15856211 - samples/sec: 9.70 - lr: 0.000004
2021-02-20 16:01:58,633 epoch 7 - iter 3528/5049 - loss 0.15671081 - samples/sec: 9.70 - lr: 0.000004
2021-02-20 16:05:28,295 epoch 7 - iter 4032/5049 - loss 0.15648069 - samples/sec: 9.62 - lr: 0.000004
2021-02-20 16:08:56,407 epoch 7 - iter 4536/5049 - loss 0.16071403 - samples/sec: 9.69 - lr: 0.000004
2021-02-20 16:12:23,980 epoch 7 - iter 5040/5049 - loss 0.15912073 - samples/sec: 9.71 - lr: 0.000004
2021-02-20 16:12:27,258 ----------------------------------------------------------------------------------------------------
2021-02-20 16:12:27,258 EPOCH 7 done: loss 0.1593 - lr 0.0000036
2021-02-20 16:14:31,752 TEST : loss 0.1253410428762436 - score 0.9331
2021-02-20 16:14:31,787 BAD EPOCHS (no improvement): 4
2021-02-20 16:14:31,791 ----------------------------------------------------------------------------------------------------
2021-02-20 16:18:01,243 epoch 8 - iter 504/5049 - loss 0.14515327 - samples/sec: 9.63 - lr: 0.000004
2021-02-20 16:21:29,154 epoch 8 - iter 1008/5049 - loss 0.15844524 - samples/sec: 9.70 - lr: 0.000004
2021-02-20 16:24:57,953 epoch 8 - iter 1512/5049 - loss 0.15855560 - samples/sec: 9.66 - lr: 0.000004
2021-02-20 16:28:25,738 epoch 8 - iter 2016/5049 - loss 0.15470104 - samples/sec: 9.70 - lr: 0.000003
2021-02-20 16:31:54,212 epoch 8 - iter 2520/5049 - loss 0.15710933 - samples/sec: 9.67 - lr: 0.000003
2021-02-20 16:35:23,560 epoch 8 - iter 3024/5049 - loss 0.15654992 - samples/sec: 9.63 - lr: 0.000003
2021-02-20 16:38:51,123 epoch 8 - iter 3528/5049 - loss 0.15659144 - samples/sec: 9.71 - lr: 0.000003
2021-02-20 16:42:19,109 epoch 8 - iter 4032/5049 - loss 0.15848049 - samples/sec: 9.69 - lr: 0.000003
2021-02-20 16:45:47,760 epoch 8 - iter 4536/5049 - loss 0.15995362 - samples/sec: 9.66 - lr: 0.000003
2021-02-20 16:49:16,138 epoch 8 - iter 5040/5049 - loss 0.16040715 - samples/sec: 9.68 - lr: 0.000003
2021-02-20 16:49:19,652 ----------------------------------------------------------------------------------------------------
2021-02-20 16:49:19,652 EPOCH 8 done: loss 0.1605 - lr 0.0000033
2021-02-20 16:51:24,065 TEST : loss 0.12391051650047302 - score 0.9411
2021-02-20 16:51:24,100 BAD EPOCHS (no improvement): 4
2021-02-20 16:51:24,104 ----------------------------------------------------------------------------------------------------
2021-02-20 16:54:50,947 epoch 9 - iter 504/5049 - loss 0.14319218 - samples/sec: 9.75 - lr: 0.000003
2021-02-20 16:58:17,610 epoch 9 - iter 1008/5049 - loss 0.14626190 - samples/sec: 9.76 - lr: 0.000003
2021-02-20 17:01:45,887 epoch 9 - iter 1512/5049 - loss 0.14569758 - samples/sec: 9.68 - lr: 0.000003
2021-02-20 17:05:13,774 epoch 9 - iter 2016/5049 - loss 0.15481491 - samples/sec: 9.70 - lr: 0.000003
2021-02-20 17:08:40,875 epoch 9 - iter 2520/5049 - loss 0.15113900 - samples/sec: 9.74 - lr: 0.000003
2021-02-20 17:12:07,457 epoch 9 - iter 3024/5049 - loss 0.15237128 - samples/sec: 9.76 - lr: 0.000003
2021-02-20 17:15:34,821 epoch 9 - iter 3528/5049 - loss 0.15264122 - samples/sec: 9.72 - lr: 0.000003
2021-02-20 17:19:02,407 epoch 9 - iter 4032/5049 - loss 0.15553964 - samples/sec: 9.71 - lr: 0.000003
2021-02-20 17:22:30,994 epoch 9 - iter 4536/5049 - loss 0.15608309 - samples/sec: 9.67 - lr: 0.000003
2021-02-20 17:25:57,168 epoch 9 - iter 5040/5049 - loss 0.15908414 - samples/sec: 9.78 - lr: 0.000003
2021-02-20 17:26:00,585 ----------------------------------------------------------------------------------------------------
2021-02-20 17:26:00,585 EPOCH 9 done: loss 0.1590 - lr 0.0000029
2021-02-20 17:28:05,552 TEST : loss 0.1257738471031189 - score 0.9406
2021-02-20 17:28:05,583 BAD EPOCHS (no improvement): 4
2021-02-20 17:28:05,587 ----------------------------------------------------------------------------------------------------
2021-02-20 17:31:34,037 epoch 10 - iter 504/5049 - loss 0.16538340 - samples/sec: 9.67 - lr: 0.000003
2021-02-20 17:35:01,686 epoch 10 - iter 1008/5049 - loss 0.16480578 - samples/sec: 9.71 - lr: 0.000003
2021-02-20 17:38:30,133 epoch 10 - iter 1512/5049 - loss 0.15934007 - samples/sec: 9.67 - lr: 0.000003
2021-02-20 17:41:57,567 epoch 10 - iter 2016/5049 - loss 0.15438570 - samples/sec: 9.72 - lr: 0.000003
2021-02-20 17:45:26,625 epoch 10 - iter 2520/5049 - loss 0.14967620 - samples/sec: 9.64 - lr: 0.000003
2021-02-20 17:48:54,021 epoch 10 - iter 3024/5049 - loss 0.14847286 - samples/sec: 9.72 - lr: 0.000003
2021-02-20 17:52:21,779 epoch 10 - iter 3528/5049 - loss 0.15086106 - samples/sec: 9.70 - lr: 0.000003
2021-02-20 17:55:47,985 epoch 10 - iter 4032/5049 - loss 0.14921308 - samples/sec: 9.78 - lr: 0.000003
2021-02-20 17:59:16,097 epoch 10 - iter 4536/5049 - loss 0.15006289 - samples/sec: 9.69 - lr: 0.000003
2021-02-20 18:02:43,316 epoch 10 - iter 5040/5049 - loss 0.14961823 - samples/sec: 9.73 - lr: 0.000003
2021-02-20 18:02:46,866 ----------------------------------------------------------------------------------------------------
2021-02-20 18:02:46,866 EPOCH 10 done: loss 0.1496 - lr 0.0000025
2021-02-20 18:04:53,002 TEST : loss 0.1382586508989334 - score 0.9355
2021-02-20 18:04:53,034 BAD EPOCHS (no improvement): 4
2021-02-20 18:04:53,040 ----------------------------------------------------------------------------------------------------
2021-02-20 18:08:21,528 epoch 11 - iter 504/5049 - loss 0.15655231 - samples/sec: 9.67 - lr: 0.000002
2021-02-20 18:11:49,866 epoch 11 - iter 1008/5049 - loss 0.15351701 - samples/sec: 9.68 - lr: 0.000002
2021-02-20 18:15:15,360 epoch 11 - iter 1512/5049 - loss 0.16074115 - samples/sec: 9.81 - lr: 0.000002
2021-02-20 18:18:41,580 epoch 11 - iter 2016/5049 - loss 0.15942462 - samples/sec: 9.78 - lr: 0.000002
2021-02-20 18:22:09,414 epoch 11 - iter 2520/5049 - loss 0.15244022 - samples/sec: 9.70 - lr: 0.000002
2021-02-20 18:25:37,073 epoch 11 - iter 3024/5049 - loss 0.15098374 - samples/sec: 9.71 - lr: 0.000002
2021-02-20 18:29:04,540 epoch 11 - iter 3528/5049 - loss 0.14850464 - samples/sec: 9.72 - lr: 0.000002
2021-02-20 18:32:31,548 epoch 11 - iter 4032/5049 - loss 0.14682730 - samples/sec: 9.74 - lr: 0.000002
2021-02-20 18:35:57,985 epoch 11 - iter 4536/5049 - loss 0.14759185 - samples/sec: 9.77 - lr: 0.000002
2021-02-20 18:39:25,816 epoch 11 - iter 5040/5049 - loss 0.14698340 - samples/sec: 9.70 - lr: 0.000002
2021-02-20 18:39:29,260 ----------------------------------------------------------------------------------------------------
2021-02-20 18:39:29,260 EPOCH 11 done: loss 0.1468 - lr 0.0000021
2021-02-20 18:41:33,245 TEST : loss 0.13701947033405304 - score 0.9387
2021-02-20 18:41:33,275 BAD EPOCHS (no improvement): 4
2021-02-20 18:41:33,280 ----------------------------------------------------------------------------------------------------
2021-02-20 18:45:02,899 epoch 12 - iter 504/5049 - loss 0.14915151 - samples/sec: 9.62 - lr: 0.000002
2021-02-20 18:48:30,072 epoch 12 - iter 1008/5049 - loss 0.13316084 - samples/sec: 9.73 - lr: 0.000002
2021-02-20 18:51:53,567 epoch 12 - iter 1512/5049 - loss 0.13759726 - samples/sec: 9.91 - lr: 0.000002
2021-02-20 18:55:21,958 epoch 12 - iter 2016/5049 - loss 0.14573488 - samples/sec: 9.68 - lr: 0.000002
2021-02-20 18:58:50,123 epoch 12 - iter 2520/5049 - loss 0.14529516 - samples/sec: 9.69 - lr: 0.000002
2021-02-20 19:02:16,173 epoch 12 - iter 3024/5049 - loss 0.14807294 - samples/sec: 9.78 - lr: 0.000002
2021-02-20 19:05:43,697 epoch 12 - iter 3528/5049 - loss 0.15232340 - samples/sec: 9.72 - lr: 0.000002
2021-02-20 19:09:08,910 epoch 12 - iter 4032/5049 - loss 0.15379466 - samples/sec: 9.82 - lr: 0.000002
2021-02-20 19:12:36,683 epoch 12 - iter 4536/5049 - loss 0.15073956 - samples/sec: 9.70 - lr: 0.000002
2021-02-20 19:16:04,449 epoch 12 - iter 5040/5049 - loss 0.15045583 - samples/sec: 9.70 - lr: 0.000002
2021-02-20 19:16:08,082 ----------------------------------------------------------------------------------------------------
2021-02-20 19:16:08,082 EPOCH 12 done: loss 0.1504 - lr 0.0000017
2021-02-20 19:18:12,918 TEST : loss 0.15095502138137817 - score 0.9388
2021-02-20 19:18:12,953 BAD EPOCHS (no improvement): 4
2021-02-20 19:18:12,959 ----------------------------------------------------------------------------------------------------
2021-02-20 19:21:40,048 epoch 13 - iter 504/5049 - loss 0.12902688 - samples/sec: 9.74 - lr: 0.000002
2021-02-20 19:25:08,962 epoch 13 - iter 1008/5049 - loss 0.13949844 - samples/sec: 9.65 - lr: 0.000002
2021-02-20 19:28:34,327 epoch 13 - iter 1512/5049 - loss 0.14321999 - samples/sec: 9.82 - lr: 0.000002
2021-02-20 19:32:01,449 epoch 13 - iter 2016/5049 - loss 0.14469366 - samples/sec: 9.73 - lr: 0.000002
2021-02-20 19:35:30,176 epoch 13 - iter 2520/5049 - loss 0.14233070 - samples/sec: 9.66 - lr: 0.000002
2021-02-20 19:38:58,641 epoch 13 - iter 3024/5049 - loss 0.14131748 - samples/sec: 9.67 - lr: 0.000002
2021-02-20 19:42:27,447 epoch 13 - iter 3528/5049 - loss 0.14047840 - samples/sec: 9.66 - lr: 0.000001
2021-02-20 19:45:52,955 epoch 13 - iter 4032/5049 - loss 0.14627085 - samples/sec: 9.81 - lr: 0.000001
2021-02-20 19:49:18,859 epoch 13 - iter 4536/5049 - loss 0.14438495 - samples/sec: 9.79 - lr: 0.000001
2021-02-20 19:52:48,483 epoch 13 - iter 5040/5049 - loss 0.14466525 - samples/sec: 9.62 - lr: 0.000001
2021-02-20 19:52:51,977 ----------------------------------------------------------------------------------------------------
2021-02-20 19:52:51,977 EPOCH 13 done: loss 0.1446 - lr 0.0000014
2021-02-20 19:54:57,358 TEST : loss 0.14258751273155212 - score 0.9399
2021-02-20 19:54:57,388 BAD EPOCHS (no improvement): 4
|
796 |
+
2021-02-20 19:54:57,392 ----------------------------------------------------------------------------------------------------
|
797 |
+
2021-02-20 19:58:27,192 epoch 14 - iter 504/5049 - loss 0.15244849 - samples/sec: 9.61 - lr: 0.000001
|
798 |
+
2021-02-20 20:01:54,054 epoch 14 - iter 1008/5049 - loss 0.15439315 - samples/sec: 9.75 - lr: 0.000001
|
799 |
+
2021-02-20 20:05:20,574 epoch 14 - iter 1512/5049 - loss 0.15336394 - samples/sec: 9.76 - lr: 0.000001
|
800 |
+
2021-02-20 20:08:47,946 epoch 14 - iter 2016/5049 - loss 0.15177470 - samples/sec: 9.72 - lr: 0.000001
|
801 |
+
2021-02-20 20:12:16,402 epoch 14 - iter 2520/5049 - loss 0.14492786 - samples/sec: 9.67 - lr: 0.000001
|
802 |
+
2021-02-20 20:15:44,769 epoch 14 - iter 3024/5049 - loss 0.14722528 - samples/sec: 9.68 - lr: 0.000001
|
803 |
+
2021-02-20 20:19:11,969 epoch 14 - iter 3528/5049 - loss 0.14537507 - samples/sec: 9.73 - lr: 0.000001
|
804 |
+
2021-02-20 20:22:40,528 epoch 14 - iter 4032/5049 - loss 0.14247368 - samples/sec: 9.67 - lr: 0.000001
|
805 |
+
2021-02-20 20:26:06,304 epoch 14 - iter 4536/5049 - loss 0.14233014 - samples/sec: 9.80 - lr: 0.000001
|
806 |
+
2021-02-20 20:29:35,214 epoch 14 - iter 5040/5049 - loss 0.14141983 - samples/sec: 9.65 - lr: 0.000001
|
807 |
+
2021-02-20 20:29:38,745 ----------------------------------------------------------------------------------------------------
|
808 |
+
2021-02-20 20:29:38,746 EPOCH 14 done: loss 0.1415 - lr 0.0000010
|
809 |
+
2021-02-20 20:31:42,742 TEST : loss 0.16016331315040588 - score 0.9396
|
810 |
+
2021-02-20 20:31:42,776 BAD EPOCHS (no improvement): 4
|
811 |
+
2021-02-20 20:31:42,874 ----------------------------------------------------------------------------------------------------
|
812 |
+
2021-02-20 20:35:10,584 epoch 15 - iter 504/5049 - loss 0.16948716 - samples/sec: 9.71 - lr: 0.000001
|
813 |
+
2021-02-20 20:38:38,789 epoch 15 - iter 1008/5049 - loss 0.16114678 - samples/sec: 9.68 - lr: 0.000001
|
814 |
+
2021-02-20 20:42:08,608 epoch 15 - iter 1512/5049 - loss 0.15736098 - samples/sec: 9.61 - lr: 0.000001
|
815 |
+
2021-02-20 20:45:37,135 epoch 15 - iter 2016/5049 - loss 0.15347995 - samples/sec: 9.67 - lr: 0.000001
|
816 |
+
2021-02-20 20:49:06,383 epoch 15 - iter 2520/5049 - loss 0.15053243 - samples/sec: 9.64 - lr: 0.000001
|
817 |
+
2021-02-20 20:52:34,741 epoch 15 - iter 3024/5049 - loss 0.15367094 - samples/sec: 9.68 - lr: 0.000001
|
818 |
+
2021-02-20 20:56:02,251 epoch 15 - iter 3528/5049 - loss 0.15097795 - samples/sec: 9.72 - lr: 0.000001
|
819 |
+
2021-02-20 20:59:27,407 epoch 15 - iter 4032/5049 - loss 0.14762646 - samples/sec: 9.83 - lr: 0.000001
|
820 |
+
2021-02-20 21:02:55,468 epoch 15 - iter 4536/5049 - loss 0.14764760 - samples/sec: 9.69 - lr: 0.000001
|
821 |
+
2021-02-20 21:06:24,604 epoch 15 - iter 5040/5049 - loss 0.14664106 - samples/sec: 9.64 - lr: 0.000001
|
822 |
+
2021-02-20 21:06:28,160 ----------------------------------------------------------------------------------------------------
|
823 |
+
2021-02-20 21:06:28,160 EPOCH 15 done: loss 0.1468 - lr 0.0000007
|
824 |
+
2021-02-20 21:08:34,321 TEST : loss 0.15258659422397614 - score 0.94
|
825 |
+
2021-02-20 21:08:34,353 BAD EPOCHS (no improvement): 4
|
826 |
+
2021-02-20 21:08:34,355 ----------------------------------------------------------------------------------------------------
|
827 |
+
2021-02-20 21:12:02,633 epoch 16 - iter 504/5049 - loss 0.14775549 - samples/sec: 9.68 - lr: 0.000001
|
828 |
+
2021-02-20 21:15:29,663 epoch 16 - iter 1008/5049 - loss 0.15171173 - samples/sec: 9.74 - lr: 0.000001
|
829 |
+
2021-02-20 21:18:57,081 epoch 16 - iter 1512/5049 - loss 0.15467193 - samples/sec: 9.72 - lr: 0.000001
|
830 |
+
2021-02-20 21:22:22,530 epoch 16 - iter 2016/5049 - loss 0.15499647 - samples/sec: 9.81 - lr: 0.000001
|
831 |
+
2021-02-20 21:25:49,850 epoch 16 - iter 2520/5049 - loss 0.15723807 - samples/sec: 9.73 - lr: 0.000001
|
832 |
+
2021-02-20 21:29:15,774 epoch 16 - iter 3024/5049 - loss 0.15353327 - samples/sec: 9.79 - lr: 0.000001
|
833 |
+
2021-02-20 21:32:44,337 epoch 16 - iter 3528/5049 - loss 0.15530051 - samples/sec: 9.67 - lr: 0.000001
|
834 |
+
2021-02-20 21:36:13,762 epoch 16 - iter 4032/5049 - loss 0.15354102 - samples/sec: 9.63 - lr: 0.000001
|
835 |
+
2021-02-20 21:39:40,865 epoch 16 - iter 4536/5049 - loss 0.15328424 - samples/sec: 9.74 - lr: 0.000001
|
836 |
+
2021-02-20 21:43:07,866 epoch 16 - iter 5040/5049 - loss 0.15234921 - samples/sec: 9.74 - lr: 0.000000
|
837 |
+
2021-02-20 21:43:11,383 ----------------------------------------------------------------------------------------------------
|
838 |
+
2021-02-20 21:43:11,383 EPOCH 16 done: loss 0.1522 - lr 0.0000005
|
839 |
+
2021-02-20 21:45:16,386 TEST : loss 0.14317740499973297 - score 0.9434
|
840 |
+
2021-02-20 21:45:16,421 BAD EPOCHS (no improvement): 4
|
841 |
+
2021-02-20 21:45:16,435 ----------------------------------------------------------------------------------------------------
|
842 |
+
2021-02-20 21:48:44,324 epoch 17 - iter 504/5049 - loss 0.17996491 - samples/sec: 9.70 - lr: 0.000000
|
843 |
+
2021-02-20 21:52:11,485 epoch 17 - iter 1008/5049 - loss 0.15543252 - samples/sec: 9.73 - lr: 0.000000
|
844 |
+
2021-02-20 21:55:39,073 epoch 17 - iter 1512/5049 - loss 0.15122585 - samples/sec: 9.71 - lr: 0.000000
|
845 |
+
2021-02-20 21:59:05,347 epoch 17 - iter 2016/5049 - loss 0.14783825 - samples/sec: 9.77 - lr: 0.000000
|
846 |
+
2021-02-20 22:02:33,153 epoch 17 - iter 2520/5049 - loss 0.14858434 - samples/sec: 9.70 - lr: 0.000000
|
847 |
+
2021-02-20 22:06:00,594 epoch 17 - iter 3024/5049 - loss 0.14719342 - samples/sec: 9.72 - lr: 0.000000
|
848 |
+
2021-02-20 22:09:28,634 epoch 17 - iter 3528/5049 - loss 0.14664091 - samples/sec: 9.69 - lr: 0.000000
|
849 |
+
2021-02-20 22:12:55,588 epoch 17 - iter 4032/5049 - loss 0.14789258 - samples/sec: 9.74 - lr: 0.000000
|
850 |
+
2021-02-20 22:16:23,015 epoch 17 - iter 4536/5049 - loss 0.14772011 - samples/sec: 9.72 - lr: 0.000000
|
851 |
+
2021-02-20 22:19:48,689 epoch 17 - iter 5040/5049 - loss 0.14601221 - samples/sec: 9.80 - lr: 0.000000
|
852 |
+
2021-02-20 22:19:52,053 ----------------------------------------------------------------------------------------------------
|
853 |
+
2021-02-20 22:19:52,053 EPOCH 17 done: loss 0.1459 - lr 0.0000003
|
854 |
+
2021-02-20 22:21:56,595 TEST : loss 0.14864514768123627 - score 0.9435
|
855 |
+
2021-02-20 22:21:56,631 BAD EPOCHS (no improvement): 4
|
856 |
+
2021-02-20 22:21:56,633 ----------------------------------------------------------------------------------------------------
|
857 |
+
2021-02-20 22:25:22,139 epoch 18 - iter 504/5049 - loss 0.13554364 - samples/sec: 9.81 - lr: 0.000000
|
858 |
+
2021-02-20 22:28:49,994 epoch 18 - iter 1008/5049 - loss 0.14305913 - samples/sec: 9.70 - lr: 0.000000
|
859 |
+
2021-02-20 22:32:15,601 epoch 18 - iter 1512/5049 - loss 0.13788820 - samples/sec: 9.81 - lr: 0.000000
|
860 |
+
2021-02-20 22:35:43,508 epoch 18 - iter 2016/5049 - loss 0.13837578 - samples/sec: 9.70 - lr: 0.000000
|
861 |
+
2021-02-20 22:39:11,318 epoch 18 - iter 2520/5049 - loss 0.14012105 - samples/sec: 9.70 - lr: 0.000000
|
862 |
+
2021-02-20 22:42:39,481 epoch 18 - iter 3024/5049 - loss 0.13876418 - samples/sec: 9.69 - lr: 0.000000
|
863 |
+
2021-02-20 22:46:07,677 epoch 18 - iter 3528/5049 - loss 0.13934073 - samples/sec: 9.68 - lr: 0.000000
|
864 |
+
2021-02-20 22:49:36,353 epoch 18 - iter 4032/5049 - loss 0.14036170 - samples/sec: 9.66 - lr: 0.000000
|
865 |
+
2021-02-20 22:53:02,472 epoch 18 - iter 4536/5049 - loss 0.13826052 - samples/sec: 9.78 - lr: 0.000000
|
866 |
+
2021-02-20 22:56:29,133 epoch 18 - iter 5040/5049 - loss 0.13982791 - samples/sec: 9.76 - lr: 0.000000
|
867 |
+
2021-02-20 22:56:32,612 ----------------------------------------------------------------------------------------------------
|
868 |
+
2021-02-20 22:56:32,613 EPOCH 18 done: loss 0.1397 - lr 0.0000001
|
869 |
+
2021-02-20 22:58:37,314 TEST : loss 0.1518455296754837 - score 0.9437
|
870 |
+
2021-02-20 22:58:37,347 BAD EPOCHS (no improvement): 4
|
871 |
+
2021-02-20 22:58:37,349 ----------------------------------------------------------------------------------------------------
|
872 |
+
2021-02-20 23:02:03,828 epoch 19 - iter 504/5049 - loss 0.13900759 - samples/sec: 9.76 - lr: 0.000000
|
873 |
+
2021-02-20 23:05:30,296 epoch 19 - iter 1008/5049 - loss 0.14452024 - samples/sec: 9.77 - lr: 0.000000
|
874 |
+
2021-02-20 23:08:57,447 epoch 19 - iter 1512/5049 - loss 0.14064833 - samples/sec: 9.73 - lr: 0.000000
|
875 |
+
2021-02-20 23:12:23,953 epoch 19 - iter 2016/5049 - loss 0.13464772 - samples/sec: 9.76 - lr: 0.000000
|
876 |
+
2021-02-20 23:15:51,459 epoch 19 - iter 2520/5049 - loss 0.13777886 - samples/sec: 9.72 - lr: 0.000000
|
877 |
+
2021-02-20 23:19:17,489 epoch 19 - iter 3024/5049 - loss 0.13952515 - samples/sec: 9.79 - lr: 0.000000
|
878 |
+
2021-02-20 23:22:45,967 epoch 19 - iter 3528/5049 - loss 0.14131733 - samples/sec: 9.67 - lr: 0.000000
|
879 |
+
2021-02-20 23:26:13,407 epoch 19 - iter 4032/5049 - loss 0.13939496 - samples/sec: 9.72 - lr: 0.000000
|
880 |
+
2021-02-20 23:29:44,085 epoch 19 - iter 4536/5049 - loss 0.13930015 - samples/sec: 9.57 - lr: 0.000000
|
881 |
+
2021-02-20 23:33:12,190 epoch 19 - iter 5040/5049 - loss 0.14268221 - samples/sec: 9.69 - lr: 0.000000
|
882 |
+
2021-02-20 23:33:15,754 ----------------------------------------------------------------------------------------------------
|
883 |
+
2021-02-20 23:33:15,754 EPOCH 19 done: loss 0.1425 - lr 0.0000000
|
884 |
+
2021-02-20 23:35:20,374 TEST : loss 0.15170469880104065 - score 0.9433
|
885 |
+
2021-02-20 23:35:20,405 BAD EPOCHS (no improvement): 4
|
886 |
+
2021-02-20 23:35:20,408 ----------------------------------------------------------------------------------------------------
|
887 |
+
2021-02-20 23:38:48,797 epoch 20 - iter 504/5049 - loss 0.11983740 - samples/sec: 9.68 - lr: 0.000000
|
888 |
+
2021-02-20 23:42:16,401 epoch 20 - iter 1008/5049 - loss 0.12881478 - samples/sec: 9.71 - lr: 0.000000
|
889 |
+
2021-02-20 23:45:42,588 epoch 20 - iter 1512/5049 - loss 0.13435941 - samples/sec: 9.78 - lr: 0.000000
|
890 |
+
2021-02-20 23:49:09,566 epoch 20 - iter 2016/5049 - loss 0.13495553 - samples/sec: 9.74 - lr: 0.000000
|
891 |
+
2021-02-20 23:52:36,896 epoch 20 - iter 2520/5049 - loss 0.13517442 - samples/sec: 9.72 - lr: 0.000000
|
892 |
+
2021-02-20 23:56:06,234 epoch 20 - iter 3024/5049 - loss 0.13889997 - samples/sec: 9.63 - lr: 0.000000
|
893 |
+
2021-02-20 23:59:35,831 epoch 20 - iter 3528/5049 - loss 0.13720651 - samples/sec: 9.62 - lr: 0.000000
|
894 |
+
2021-02-21 00:03:03,594 epoch 20 - iter 4032/5049 - loss 0.13855230 - samples/sec: 9.70 - lr: 0.000000
|
895 |
+
2021-02-21 00:06:30,095 epoch 20 - iter 4536/5049 - loss 0.14032340 - samples/sec: 9.76 - lr: 0.000000
|
896 |
+
2021-02-21 00:09:58,484 epoch 20 - iter 5040/5049 - loss 0.13983281 - samples/sec: 9.68 - lr: 0.000000
|
897 |
+
2021-02-21 00:10:02,013 ----------------------------------------------------------------------------------------------------
|
898 |
+
2021-02-21 00:10:02,013 EPOCH 20 done: loss 0.1398 - lr 0.0000000
|
899 |
+
2021-02-21 00:12:06,767 TEST : loss 0.15191785991191864 - score 0.9436
|
900 |
+
2021-02-21 00:12:06,801 BAD EPOCHS (no improvement): 4
|
901 |
+
2021-02-21 00:12:53,129 ----------------------------------------------------------------------------------------------------
|
902 |
+
2021-02-21 00:12:53,129 Testing using best model ...
|
903 |
+
2021-02-21 00:15:03,989 0.9408 0.9465 0.9436
|
904 |
+
2021-02-21 00:15:03,989
|
905 |
+
Results:
|
906 |
+
- F1-score (micro) 0.9436
|
907 |
+
- F1-score (macro) 0.9374
|
908 |
+
|
909 |
+
By class:
|
910 |
+
LOC tp: 1445 - fp: 134 - fn: 69 - precision: 0.9151 - recall: 0.9544 - f1-score: 0.9344
|
911 |
+
MISC tp: 627 - fp: 96 - fn: 51 - precision: 0.8672 - recall: 0.9248 - f1-score: 0.8951
|
912 |
+
ORG tp: 1679 - fp: 98 - fn: 174 - precision: 0.9449 - recall: 0.9061 - f1-score: 0.9251
|
913 |
+
PER tp: 1587 - fp: 8 - fn: 8 - precision: 0.9950 - recall: 0.9950 - f1-score: 0.9950
|
914 |
+
2021-02-21 00:15:03,989 ----------------------------------------------------------------------------------------------------
|
915 |
+
2021-02-21 00:15:03,989 ----------------------------------------------------------------------------------------------------
|
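
For readers checking the final numbers: the three values on the last evaluation line (`0.9408 0.9465 0.9436`) are consistent with micro-averaged precision, recall, and F1 pooled over all four classes, and the macro F1 is the unweighted mean of the per-class F1 scores. Below is a minimal Python sanity-check sketch (not part of the training script; the tp/fp/fn counts are simply copied from the log above) that reproduces these scores from the standard definitions:

```python
# Sanity check (not part of the training run): recompute the final scores
# from the per-class tp/fp/fn counts reported in training.log.

counts = {
    "LOC":  (1445, 134, 69),
    "MISC": (627, 96, 51),
    "ORG":  (1679, 98, 174),
    "PER":  (1587, 8, 8),
}

def prf(tp, fp, fn):
    """Standard precision/recall/F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

per_class_f1 = []
for tag, (tp, fp, fn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    per_class_f1.append(f)
    print(f"{tag} precision: {p:.4f} - recall: {r:.4f} - f1-score: {f:.4f}")

# Micro average: pool counts over all classes -> 0.9408 / 0.9465 / 0.9436
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
p, r, f = prf(tp, fp, fn)
print(f"F1-score (micro) {f:.4f}  (precision {p:.4f}, recall {r:.4f})")

# Macro F1: unweighted mean of the per-class F1 scores -> 0.9374
print(f"F1-score (macro) {sum(per_class_f1) / len(per_class_f1):.4f}")
```

Because the micro average pools counts, it weights frequent classes (ORG, PER, LOC) more heavily than MISC, which is why the micro F1 (0.9436) sits above the macro F1 (0.9374) here.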