Babelscape/wikineural-multilingual-ner


Simone Tedeschi committed on Jan 31, 2022

Commit 9b9168c • 1 Parent(s): 290a096

Create README.md

Files changed (1)

README.md +57 -0

README.md ADDED

	@@ -0,0 +1,57 @@

+---
+annotations_creators:
+- machine-generated
+language_creators:
+- machine-generated
+languages:
+- de
+- en
+- es
+- fr
+- it
+- nl
+- pl
+- pt
+- ru
+licenses:
+- cc-by-nc-sa-4.0
+pretty_name: wikineural-dataset
+source_datasets:
+- original
+task_categories:
+- structure-prediction
+task_ids:
+- named-entity-recognition
+---
+## Model Description
+- **Summary:** mBERT model fine-tuned on the recently introduced WikiNEuRal dataset for Multilingual NER.
+- **Official Repository:** [https://github.com/Babelscape/wikineural](https://github.com/Babelscape/wikineural)
+- **Paper:** [https://aclanthology.org/2021.findings-emnlp.215/](https://aclanthology.org/2021.findings-emnlp.215/)
+## Licensing Information
+The contents of this repository are restricted to non-commercial research purposes only, under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents belongs to the original copyright holders.
+## Citation Information
+```bibtex
+@inproceedings{tedeschi-etal-2021-wikineural-combined,
+    title = "{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}",
+    author = "Tedeschi, Simone  and
+      Maiorca, Valentino  and
+      Campolungo, Niccol{\`o}  and
+      Cecconi, Francesco  and
+      Navigli, Roberto",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
+    month = nov,
+    year = "2021",
+    address = "Punta Cana, Dominican Republic",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2021.findings-emnlp.215",
+    pages = "2521--2533",
+    abstract = "Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.",
+}
+```
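
The abstract in the citation reports gains in "span-based F1-score". This commit does not include the evaluation code, so the following is only a minimal sketch of the usual exact-match span metric over BIO-tagged sequences, not the authors' script. The function names `bio_to_spans` and `span_f1` and the `PER`/`LOC` labels in the example are illustrative assumptions.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index, end exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, kind = tag.partition("-")
        # An open span continues only on an I- tag of the same entity type.
        if label is not None and not (prefix == "I" and kind == label):
            spans.add((label, start, i))
            start, label = None, None
        # Open a span on B-, or on a stray I- with no open span (a lenient assumption).
        if label is None and prefix in ("B", "I"):
            start, label = i, kind
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a span counts only if type and boundaries both match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One sentence: the prediction finds the person span but misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(round(span_f1(gold, pred), 3))
```

In this example precision is 1.0 and recall is 0.5, so the printed F1 is 0.667. Because the score is computed over whole spans, a prediction that gets only part of a multi-token entity right earns no credit for it.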