---
language: da
widget:
- text: En trend, der kan blive ligeså hot som<mask>.
tags:
- roberta
- danish
- masked-lm
- pytorch
license: agpl-3.0
---
# DanskBERT
This is DanskBERT, a Danish language model. Note that you should not put a space before the mask token when querying the model directly (see the example below)!
The model is the best performing base-size model on the [ScandEval benchmark for Danish](https://scandeval.github.io/nlu-benchmark/).
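A minimal usage sketch with the Hugging Face `transformers` fill-mask pipeline is shown below; the model identifier `vesteinn/DanskBERT` is an assumption here, and note that there is no space before `<mask>`:

```python
from transformers import pipeline

# Model identifier assumed to be "vesteinn/DanskBERT"
unmasker = pipeline("fill-mask", model="vesteinn/DanskBERT")

# No space before <mask>, as noted above
predictions = unmasker("En trend, der kan blive ligeså hot som<mask>.")
for p in predictions:
    print(p["token_str"], p["score"])
```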
DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021).
DanskBERT was trained with fairseq using the RoBERTa-base configuration. Training used a batch size of 2k and ran to convergence for 500k steps on 16 V100 cards, taking approximately two weeks.
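Since the checkpoint follows the RoBERTa-base configuration, it can also be loaded through the generic `transformers` auto classes, e.g. for fine-tuning or inspection. A sketch, again assuming the `vesteinn/DanskBERT` identifier:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed model identifier; the architecture is RoBERTa-base
tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

inputs = tokenizer("En trend, der kan blive ligeså hot som<mask>.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```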
If you find this model useful, please cite:
```bibtex
@inproceedings{snaebjarnarson-etal-2023-transfer,
    title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese",
    author = "Snæbjarnarson, Vésteinn and
      Simonsen, Annika and
      Glavaš, Goran and
      Vulić, Ivan",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = "may 22--24",
    year = "2023",
    address = "Tórshavn, Faroe Islands",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},
}
```