readme: add initial version
README.md
ADDED
---
language: en
license: mit
datasets:
- conll2003
widget:
- text: My name is Clara Clever and I live in Berkeley , California .
---

# T5 Base Model for Named Entity Recognition (NER, CoNLL-2003)

In this repository, we open source a T5 Base model that was fine-tuned on the official CoNLL-2003 NER dataset.

We use the great [TANL library](https://github.com/amazon-research/tanl) from Amazon for fine-tuning the model.

The exact fine-tuning approach is presented in the paper "TANL: Structured Prediction as Translation between Augmented Natural Languages"
by Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang and Stefano Soatto.

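The checkpoint can be loaded like any other text-to-text `transformers` model. Below is a minimal usage sketch, not taken from the original README: the model id is a placeholder for this repository, and it assumes the single-task checkpoint expects the raw sentence as input:

```python
# Minimal usage sketch (illustrative, not from the original README).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "<org>/<t5-base-conll03-model>"  # placeholder: substitute the actual Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

sentence = "My name is Clara Clever and I live in Berkeley , California ."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)

# TANL casts NER as text-to-text generation, so entities should come back
# inline in the augmented-language format, e.g. "[ Clara Clever | person ]".
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
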
# Fine-Tuning

We use the same hyper-parameter settings as in the official implementation, with one minor change: instead of training on 8 V100 GPUs, we train the model
on a single V100 GPU and use gradient accumulation. The slightly modified configuration file (`config.ini`) then looks like this:

```ini
[conll03]
datasets = conll03
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
max_seq_length_eval = 512
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
do_train = True
do_eval = True
do_predict = True
gradient_accumulation_steps = 8
```
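
Two notes on this configuration, neither spelled out above: with TANL's standard entry point, the `[conll03]` section should be runnable as `python run.py conll03` (assuming the upstream `run.py` script), and the gradient-accumulation setting preserves the effective batch size of the original recipe. A quick illustrative check of the latter, assuming the official 8-GPU run also used `per_device_train_batch_size = 4`:

```python
# Illustrative sanity check: one GPU with 8 accumulation steps sees the same
# number of sequences per optimizer step as 8 GPUs without accumulation.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32 == 8 GPUs * 4 sequences per device
```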

It took around 2 hours to fine-tune the model on the 14,041 training sentences of the CoNLL-2003 dataset.

# Evaluation

On the development set, the following evaluation results were achieved:

```json
{
  "entity_precision": 0.9536446086664427,
  "entity_recall": 0.9555705149781218,
  "entity_f1": 0.9546065904505716,
  "entity_precision_no_type": 0.9773261672824992,
  "entity_recall_no_type": 0.9792998990238977,
  "entity_f1_no_type": 0.9783120376597176
}
```

The evaluation results on the test set look as follows:

```json
{
  "entity_precision": 0.912182296231376,
  "entity_recall": 0.9213881019830028,
  "entity_f1": 0.9167620893155995,
  "entity_precision_no_type": 0.953900087642419,
  "entity_recall_no_type": 0.9635269121813032,
  "entity_f1_no_type": 0.9586893332158901
}
```
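
As a quick plausibility check, not part of the original evaluation output, each reported `entity_f1` value is the harmonic mean of the corresponding precision and recall:

```python
# Illustrative check: F1 as the harmonic mean of precision and recall,
# applied to the test-set figures above.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.912182296231376, 0.9213881019830028))
# ~0.91676, matching "entity_f1" above
```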

To summarize: the model achieves an F1-score of 95.46% on the development set and 91.68% on the test set. The paper reported an F1-score of 91.7%.

# License

The model is licensed under [MIT](https://choosealicense.com/licenses/mit/).

# Acknowledgments

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download both cased and uncased models from their S3 storage 🤗