# Model Description

This model adapts T5 to Arabic by pre-training T5 on Arabic Wikipedia, Marefa, Hindawi Books, and a collection of Arabic news. The total size of the corpora is 17GB. We restrict our corpora to news and encyclopedias to enhance the model's performance on informative tasks such as factoid question answering and on generative tasks that use Classical Arabic (الفصحى). This model uses an efficient implementation of T5 that reduces fine-tuning time and memory usage ([link](https://arxiv.org/abs/2109.10686)).
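
As a quick illustration, the sketch below loads the checkpoint for text-to-text generation with the Hugging Face `transformers` library. The repository ID `ArabicT5-Base` is a placeholder for this card's actual model path, and since this is a pre-trained (not task fine-tuned) checkpoint, you would normally fine-tune it on a downstream task such as factoid question answering before generating:

```python
# Minimal usage sketch with Hugging Face transformers.
# "ArabicT5-Base" is a placeholder repo ID; replace it with this
# card's actual model path before running.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "ArabicT5-Base"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Encode an Arabic prompt and generate a continuation.
inputs = tokenizer("ما هي عاصمة المملكة العربية السعودية؟", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```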

## Pre-training Settings and Results on the TyDi QA Development Dataset (the model in this card is highlighted in bold)

| Model             | Hidden Size | Attn. Heads | Attn. Layers | Vocab   | Hardware     | Training Steps | Batch Size | Train × Batch Factor | Corpora                          | TyDi QA EM/F1   |
|-------------------|-------------|-------------|--------------|---------|--------------|----------------|------------|----------------------|----------------------------------|-----------------|
| AraT5-Base        | 768         | 12          | 12           | 110K    | TPUv3-8      | 1M             | 128        | 1.0x                 | 248GB, 29B tokens (MSA + Tweets) | 69.16/82.82     |
| AraT5-Base-MSA    | 768         | 12          | 12           | 110K    | TPUv3-8      | 1M             | 128        | 1.0x                 | 70GB (MSA)                       | 68.51/82.66     |
| AraT5-Base-Tweets | 768         | 12          | 12           | 110K    | TPUv3-8      | 1M             | 128        | 1.0x                 | 178GB (Tweets)                   | 64.39/78.22     |
| mT5-Base          | 768         | 12          | 12           | 250K    | TPUv3-32     | 1M             | 1024       | 8.0x                 | 6.3T tokens (mC4)                | 71.55/83.78     |
| **ArabicT5-Base** | **512**     | **8**       | **20**       | **32K** | **TPUv3-32** | **256K**       | **256**    | **0.5x**             | **17GB (MSA)**                   | **72.75/85.49** |
| ArabicT5-Large    | 768         | 12          | 16           | 32K     | TPUv3-128    | 500K           | 512        | 2.0x                 | 17GB (MSA)                       | --/--           |
| ArabicT5-xLarge   | 768         | 12          | 36           | 32K     | TPUv3-128    | 500K           | 512        | 2.0x                 | 17GB (MSA)                       | --/--           |
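
For clarity on the last column, below is a minimal sketch of how EM (exact match) and token-level F1 are commonly computed in SQuAD-style QA evaluation; it illustrates the metrics only and is not the exact TyDi QA evaluation script used for these numbers:

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the whitespace-normalized strings match exactly, else 0.0.
    return float(prediction.strip() == reference.strip())

def token_f1(prediction: str, reference: str) -> float:
    # Token-overlap F1, as in SQuAD-style QA evaluation.
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # per-token min counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A partial-overlap answer scores 0 on EM but gets partial credit on F1.
print(exact_match("الرياض", "مدينة الرياض"))  # 0.0
print(token_f1("الرياض", "مدينة الرياض"))     # ≈ 0.667
```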
# Acknowledgment

We would like to acknowledge the support we received from the TensorFlow Research Cloud (TFRC) team, who granted us access to TPUv3 units.
# Citation

A paper will be shared soon.