---
language: en
license: mit
datasets:
- conll2003
widget:
- text: My name is Clara Clever and I live in Berkeley , California .
---

# T5 Base Model for Named Entity Recognition (NER, CoNLL-2003)

In this repository, we open-source a T5 Base model that was fine-tuned on the official CoNLL-2003 NER dataset.

We use the great [TANL library](https://github.com/amazon-research/tanl) from Amazon for fine-tuning the model.

The exact fine-tuning approach is presented in the paper "TANL: Structured Prediction as Translation between Augmented Natural Languages"
by Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang and Stefano Soatto.

# Fine-Tuning

We use the same hyper-parameter settings as in the official implementation, with one minor change: instead of using 8 V100 GPUs, we train the model
on a single V100 GPU and use gradient accumulation. The slightly modified configuration file (`config.ini`) then looks like this:

```ini
[conll03]
datasets = conll03
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
max_seq_length_eval = 512
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
do_train = True
do_eval = True
do_predict = True
gradient_accumulation_steps = 8
```
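With this configuration, the effective training batch size matches the original multi-GPU setup. A quick sanity check (the variable names below mirror the config keys; the 8-GPU figure comes from the official setup described above):

```python
# Gradient accumulation on a single GPU reproduces the effective batch
# size of the official 8-GPU setup (4 examples per device in both cases).
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
n_gpus_original = 8  # official implementation: 8 V100 GPUs, no accumulation

effective_single_gpu = per_device_train_batch_size * gradient_accumulation_steps
effective_original = per_device_train_batch_size * n_gpus_original

print(effective_single_gpu)  # 32
print(effective_original)    # 32
```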

It took around 2 hours to fine-tune the model on the 14,041 training sentences of the CoNLL-2003 dataset.

# Evaluation

On the development set, the following results were achieved:

```json
{
    "entity_precision": 0.9536446086664427,
    "entity_recall": 0.9555705149781218,
    "entity_f1": 0.9546065904505716,
    "entity_precision_no_type": 0.9773261672824992,
    "entity_recall_no_type": 0.9792998990238977,
    "entity_f1_no_type": 0.9783120376597176
}
```
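The `entity_f1` value is the harmonic mean of `entity_precision` and `entity_recall`, which can be verified directly from the numbers above:

```python
# Verify the reported entity_f1 on the development set as the
# harmonic mean of entity_precision and entity_recall.
precision = 0.9536446086664427
recall = 0.9555705149781218

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # ~0.954607, matching the reported entity_f1
```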

The evaluation results on the test set look like this:

```json
{
    "entity_precision": 0.912182296231376,
    "entity_recall": 0.9213881019830028,
    "entity_f1": 0.9167620893155995,
    "entity_precision_no_type": 0.953900087642419,
    "entity_recall_no_type": 0.9635269121813032,
    "entity_f1_no_type": 0.9586893332158901
}
```

To summarize: this model achieves an F1-score of 95.46% on the development set and 91.68% on the test set. The paper reports an F1-score of 91.7%.

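TANL frames NER as translation into an augmented natural language in which each entity mention is wrapped in brackets with its type, e.g. `[ Clara Clever | person ]` (per the TANL paper). A minimal sketch of turning such model output back into entity tuples — the bracket format and this parser are illustrative assumptions, not the library's own decoder:

```python
import re

def parse_tanl(output: str):
    """Return (mention, type) tuples from a TANL-style augmented sentence."""
    # Matches "[ mention | type ]" with flexible whitespace around the parts.
    return re.findall(r"\[\s*(.+?)\s*\|\s*(.+?)\s*\]", output)

augmented = "My name is [ Clara Clever | person ] and I live in [ Berkeley | location ] ."
print(parse_tanl(augmented))
# [('Clara Clever', 'person'), ('Berkeley', 'location')]
```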
# License

The model is licensed under [MIT](https://choosealicense.com/licenses/mit/).

# Acknowledgments

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download both cased and uncased models from their S3 storage 🤗