Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ license: cc-by-nc-4.0
 ---
-**EgyBERT** is a large language model focused exclusively on
+**EgyBERT** is a large language model focused exclusively on Egyptian dialectal texts. The model was pretrained on two large-scale corpora: the Egyptian Tweets Corpus (ETC), which contains +34 million tweets, and the Egyptian Forum Corpus, which includes +44 million sentences collected from various online forums. The datasets comprise **10.4GB of text**. The code files along with the results are available on [repo](https://github.com/FaisalQarah/EgyBERT).