sultan committed on
Commit 083d9e9
Parent: 13109f6

Update README.md

Files changed (1)
  1. README.md +25 -11
README.md CHANGED
@@ -7,17 +7,30 @@ This model adapt T5 on Arabic Language by pre-training T5 on ArabicWikipedia, Ma

 ## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )

-| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora | TyDi QA EM/F1|
-|------------------|--------------|-------------|---------------|-------|-----------|---------------|--------|-----------------------|------------------------|--------------|
-| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) | 69.16/82.82 |
-| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) | 68.51/82.66 |
-| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) | 64.39/78.22 |
-| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)| 72.53/85.04 |
-| **ArabicT5-Base** | **512** | **8** | **20** | **32K** |**TPUv3-32** | **256K** | **256** | **0.5x** |**17GB (MSA)** | **72.75/85.49** |
-| ArabicT5-Large | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) | 74.27/86.37 |
-| ArabicT5-xLarge | 768 | 12 | 36 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) | 74.38/86.60 |
-
-
+| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora |
+|------------------|--------------|-------------|---------------|-------|-----------|---------------|--------|-----------------------|------------------------|
+| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) |
+| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) |
+| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) |
+| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
+| ArabicT5-Base | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (MSA) |
+| ArabicT5-Large | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
+| **ArabicT5-xLarge** | **768** | **12** | **36** | **32K** |**TPUv3-128** | **500K** | **512** | **2.0x** |**17GB (MSA)** |
+
+
+## Results on TyDi QA, HARD, Sentiment Analysis, Sarcasm Detection ( Best Score is highlighted in bold )
+
+| Model | <center>TyDi QA (Dev) | <center>HARD (Hotel Review) | <center>ArSarcasm-v2 (Sentiment Analysis) | <center>ArSarcasm-v2 (Sarcasm Detection) |
+|----------------------|---------------|---------------------|-------------------------------------|----------------------------------|
+| AraT5-Base | <center>70.36/84.21 |<center>96.49|<center>69.7/72.63|<center>60.44|
+| AraT5-Base-MSA | <center>70.90/84.00 |<center>**96.52**|<center>70.03/72.73|<center>60.69|
+| AraT5-Base-Tweets | <center>65.14/79.00 |<center>96.26|<center>70.67/73.52|<center>61.11|
+| mT5-Base | <center>72.20/84.13 |<center>96.24|<center>67.33/68.78|<center>52.18|
+| ArabicT5-Base | <center>70.79/84.76 |<center>96.36|<center>68.93/71.20|<center>58.93|
+| ArabicT5-Large | <center>73.29/86.08 |<center>96.40|<center>70.4/73.01|<center>59.79|
+| ArabicT5-xLarge | <center>**75.46/87.12** |<center>96.50| <center>**72.23/75.17**|<center>**61.66**|
+
+Evaluation Metrics : TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic)

 # Paper

@@ -33,6 +46,7 @@ This model adapt T5 on Arabic Language by pre-training T5 on ArabicWikipedia, Ma

 https://github.com/salrowili/ArabicT5

+
 # Acknowledgment

 We would like to acknowledge the support we have from The TPU Research Cloud (TRC) team to grant us access to TPUv3 units.
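
The diff above does not define the "Train x Batch Factor" column. Assuming it is simply the total number of pre-training examples (training steps × batch size) expressed relative to the AraT5-Base baseline of 1M steps at batch 128, the short sketch below reproduces the values in the table; only the steps and batch sizes are taken from the table, the rest is illustrative.

```python
# Sketch (not part of the original README): reproduce the "Train x Batch Factor"
# column under the assumption that it equals steps * batch_size relative to the
# AraT5-Base baseline of 1,000,000 steps at batch size 128.
BASELINE = 1_000_000 * 128  # AraT5-Base: 1.0x by definition

settings = {
    "AraT5-Base":      (1_000_000, 128),
    "mT5-Base":        (1_000_000, 1024),
    "ArabicT5-Base":   (256_000, 256),
    "ArabicT5-Large":  (500_000, 512),
    "ArabicT5-xLarge": (500_000, 512),
}

for name, (steps, batch) in settings.items():
    print(f"{name}: {steps * batch / BASELINE:.1f}x")
# -> 1.0x, 8.0x, 0.5x, 2.0x, 2.0x, matching the table above
```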
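The TyDi QA numbers in both tables are reported as EM/F1. For reference, here is a generic sketch of the standard SQuAD-style span metrics (exact match and token-level F1); it illustrates the metric definitions only and is not the evaluation script behind the reported scores.

```python
# Generic SQuAD-style span metrics (illustrative only; not the official
# evaluation code used for the TyDi QA numbers above).
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted answer string matches the reference exactly."""
    return float(prediction.strip() == reference.strip())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between the predicted and reference answer spans."""
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```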
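Since the card describes a T5-style checkpoint, a minimal usage sketch with the transformers library is shown below. The hub ID is an assumption made here for illustration; see https://github.com/salrowili/ArabicT5 for the official checkpoints and fine-tuning scripts.

```python
# Minimal usage sketch, assuming the checkpoint on this card is published as a
# Hugging Face seq2seq model. The hub ID below is an assumption for
# illustration only; check https://github.com/salrowili/ArabicT5 for the
# official model names.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "sultan/ArabicT5-xLarge"  # assumed ID; replace with the real one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

inputs = tokenizer("مثال على نص عربي", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the raw pre-trained checkpoint is a denoising model; the TyDi QA, HARD, and ArSarcasm-v2 scores in the tables above come from fine-tuning it on each downstream task first.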