Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/SZTAKI-HLT/hubert-base-cc/README.md
README.md
ADDED
@@ -0,0 +1,46 @@
---
language: hu
license: apache-2.0
datasets:
- common_crawl
- wikipedia
---

# huBERT base model (cased)

## Model description

Cased BERT model for Hungarian, trained on the (filtered, deduplicated) Hungarian subset of the Common Crawl and on a snapshot of the Hungarian Wikipedia.

## Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on the chunking and named entity recognition tasks, and it set a new state of the art on the former.
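
The checkpoint loads through the standard `transformers` API like any other BERT model; a minimal sketch (the repo name is taken from the file history link above, and the example sentence is only an illustration):

```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModel.from_pretrained("SZTAKI-HLT/hubert-base-cc")

# Encode a Hungarian sentence and extract contextual embeddings
inputs = tokenizer("Budapest Magyarország fővárosa.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch size, sequence length, hidden size)
```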

## Training

Details of the training data and procedure can be found in the PhD thesis linked below. (With the caveat that the thesis only contains preliminary results based on the Wikipedia subcorpus; an evaluation of the full model will appear in a future paper.)

## Eval results

When fine-tuned (via `BertForTokenClassification`) on chunking and NER, the model outperforms multilingual BERT, achieves state-of-the-art results on the former task and comes within 0.5% F1 of the SotA on the latter. The exact F1 scores are:

| NER    | Minimal NP | Maximal NP |
|--------|------------|------------|
| 97.62% | **97.14%** | **96.97%** |
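
A minimal sketch of attaching a token-classification head for such fine-tuning; the label set below is hypothetical, and the actual tag inventories, hyperparameters and data are those described in the thesis linked below:

```python
from transformers import AutoTokenizer, BertForTokenClassification

# Hypothetical NER label set, for illustration only
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-MISC", "I-MISC"]

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = BertForTokenClassification.from_pretrained(
    "SZTAKI-HLT/hubert-base-cc",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# The model can now be fine-tuned on token-labelled Hungarian data,
# e.g. with the Trainer API or a plain PyTorch training loop.
```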

### BibTeX entry and citation info

The training corpus, parameters and the evaluation methods are discussed in the
[following PhD thesis](https://hlt.bme.hu/en/publ/nemeskey_2020):

```bibtex
@PhDThesis{ Nemeskey:2020,
  author = {Nemeskey, Dávid Márk},
  title  = {Natural Language Processing Methods for Language Modeling},
  year   = {2020},
  school = {E\"otv\"os Lor\'and University}
}
```