---
language: en
license: mit
datasets:
- conll2003
widget:
- text: My name is Clara Clever and I live in Berkeley , California .
---

# T5 Base Model for Named Entity Recognition (NER, CoNLL-2003)

In this repository, we open-source a T5 Base model that was fine-tuned on the official CoNLL-2003 NER dataset.

We use the great [TANL library](https://github.com/amazon-research/tanl) from Amazon for fine-tuning the model.

The exact fine-tuning approach is presented in the paper "TANL: Structured Prediction as Translation between Augmented Natural Languages"
by Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang and Stefano Soatto.

# Fine-Tuning

We use the same hyper-parameter settings as in the official implementation, with one minor change: instead of using 8 V100 GPUs, we train the model
on a single V100 GPU and use gradient accumulation. The slightly modified configuration file (`config.ini`) then looks like this:

```ini
[conll03]
datasets = conll03
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
max_seq_length_eval = 512
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
do_train = True
do_eval = True
do_predict = True
gradient_accumulation_steps = 8
```
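With this configuration, the effective training batch size matches the original multi-GPU setup. A quick sanity check (the variable names below mirror the config keys; the 8-GPU figure comes from the official setup described above):

```python
# Gradient accumulation on a single GPU reproduces the effective batch
# size of the official 8-GPU setup (4 examples per device in both cases).
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
n_gpus_original = 8  # official implementation: 8 V100 GPUs, no accumulation

effective_single_gpu = per_device_train_batch_size * gradient_accumulation_steps
effective_original = per_device_train_batch_size * n_gpus_original

print(effective_single_gpu)  # 32
print(effective_original)    # 32
```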

It took around 2 hours to fine-tune the model on the 14,041 training sentences of the CoNLL-2003 dataset.

# Evaluation

On the development set, the following results were achieved:

```json
{
    "entity_precision": 0.9536446086664427,
    "entity_recall": 0.9555705149781218,
    "entity_f1": 0.9546065904505716,
    "entity_precision_no_type": 0.9773261672824992,
    "entity_recall_no_type": 0.9792998990238977,
    "entity_f1_no_type": 0.9783120376597176
}
```
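The `entity_f1` value is the harmonic mean of `entity_precision` and `entity_recall`, which can be verified directly from the numbers above:

```python
# Verify the reported entity_f1 on the development set as the
# harmonic mean of entity_precision and entity_recall.
precision = 0.9536446086664427
recall = 0.9555705149781218

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # ~0.954607, matching the reported entity_f1
```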

The evaluation results on the test set look like this:

```json
{
    "entity_precision": 0.912182296231376,
    "entity_recall": 0.9213881019830028,
    "entity_f1": 0.9167620893155995,
    "entity_precision_no_type": 0.953900087642419,
    "entity_recall_no_type": 0.9635269121813032,
    "entity_f1_no_type": 0.9586893332158901
}
```

To summarize: this model achieves an F1-score of 95.46% on the development set and 91.68% on the test set. The paper reports an F1-score of 91.7%.

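TANL frames NER as translation into an augmented natural language in which each entity mention is wrapped in brackets with its type, e.g. `[ Clara Clever | person ]` (per the TANL paper). A minimal sketch of turning such model output back into entity tuples — the bracket format and this parser are illustrative assumptions, not the library's own decoder:

```python
import re

def parse_tanl(output: str):
    """Return (mention, type) tuples from a TANL-style augmented sentence."""
    # Matches "[ mention | type ]" with flexible whitespace around the parts.
    return re.findall(r"\[\s*(.+?)\s*\|\s*(.+?)\s*\]", output)

augmented = "My name is [ Clara Clever | person ] and I live in [ Berkeley | location ] ."
print(parse_tanl(augmented))
# [('Clara Clever', 'person'), ('Berkeley', 'location')]
```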
# License

The model is licensed under [MIT](https://choosealicense.com/licenses/mit/).

# Acknowledgments

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download both cased and uncased models from their S3 storage 🤗