Update README.md
README.md
CHANGED
@@ -6,3 +6,25 @@ language:
 widget:
 - text: "I cha etz au Schwiizerdütsch. <mask> zäme! 😊"
 ---
+
+The [**xlm-roberta-base**](https://huggingface.co/xlm-roberta-base) model ([Conneau et al., ACL 2020](https://aclanthology.org/2020.acl-main.747/)) trained on Swiss German text data via continued pre-training.
+
+## Training Data
+For continued pre-training, we used the following two datasets of written Swiss German:
+1. [SwissCrawl](https://icosys.ch/swisscrawl) ([Linder et al., LREC 2020](https://aclanthology.org/2020.lrec-1.329)), a collection of Swiss German web text (forum discussions, social media).
+2. A custom dataset of Swiss German tweets
+
+In addition, we trained the model on an equal amount of Standard German data. We used news articles retrieved from [Swissdox@LiRI](https://t.uzh.ch/1hI).
+
+## License
+Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
+
+## Citation
+```bibtex
+@inproceedings{vamvas-etal-2024-modular,
+title={Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect},
+author={Jannis Vamvas and No{\"e}mi Aepli and Rico Sennrich},
+booktitle={First Workshop on Modular and Open Multilingual NLP},
+year={2024},
+}
+```
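The widget text in this change is a masked-language-modeling prompt, so the model can presumably be queried the same way from Python. The sketch below assumes the `transformers` library is installed and uses its `fill-mask` pipeline. The helper name `top_fills` and the `model_id` argument are hypothetical: the README does not state the repository ID, so it has to be supplied by the caller.

```python
# The mask token and prompt are copied from the widget example in the README.
MASK_TOKEN = "<mask>"
WIDGET_TEXT = "I cha etz au Schwiizerdütsch. <mask> zäme! 😊"


def top_fills(model_id: str, text: str = WIDGET_TEXT, k: int = 5):
    """Return the k most likely fillers for the single mask in `text`.

    `model_id` is an assumption: pass the Hub repository ID (or a local
    path) of the checkpoint this README describes.
    """
    # Validate the prompt before loading anything heavy.
    if text.count(MASK_TOKEN) != 1:
        raise ValueError(f"expected exactly one {MASK_TOKEN} in the input")

    # Imported lazily so the check above works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    predictions = fill_mask(text, top_k=k)
    # Each prediction is a dict holding the filled token and its probability.
    return [(p["token_str"], p["score"]) for p in predictions]
```

Calling `top_fills(model_id)` with the actual repository ID downloads the checkpoint on first use and returns `(token, score)` pairs for the widget sentence. No expected output is shown here because it depends on the checkpoint.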