Update README.md
README.md
CHANGED
@@ -30,7 +30,28 @@ This model adapts T5 to the Arabic language by pre-training T5 on ArabicWikipedia, Ma
| ArabicT5-Large | <center>73.29/86.08 |<center>96.40|<center>70.4/73.01|<center>59.79|
| ArabicT5-xLarge | <center>**75.46/87.12** |<center>96.50| <center>**72.23/75.17**|<center>**61.66**|

-Evaluation Metrics
+Evaluation Metrics: TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic)
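For readers unfamiliar with the QA metrics, EM and F1 here are the standard SQuAD-style span metrics commonly used for TyDi QA. The sketch below is a simplified illustration only, with plain whitespace tokenization and no answer normalization; it is not the official TyDi QA evaluation script.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the answers match exactly after stripping whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision/recall over overlapping tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("1952", "1952"))                     # 1.0
print(round(token_f1("in the year 1952", "1952"), 2))  # 0.4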
+
+# Speedup Results
+
+Below are our speedup results on the TyDi QA dataset, where all models are fine-tuned for 13 epochs with a learning rate of 2e-4 and a batch size of 3 per device on a TPUv3-8 (batch = 3 x 8 -> 24).
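The exact fine-tuning script is not part of this diff; the sketch below only shows how the stated hyperparameters (13 epochs, learning rate 2e-4, batch size 3 per device) would map onto HuggingFace Transformers. The checkpoint ID and dataset wiring are placeholders.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Placeholder checkpoint ID: substitute the model you want to benchmark.
model_id = "sultan/ArabicT5-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

args = Seq2SeqTrainingArguments(
    output_dir="tydiqa-finetune",
    num_train_epochs=13,            # fixed across all models in this comparison
    learning_rate=2e-4,
    per_device_train_batch_size=3,  # 3 per device x 8 TPU cores -> effective batch of 24
    predict_with_generate=True,
)

# Dataset preprocessing (question/context -> answer text pairs) is omitted here;
# plug your tokenized TyDi QA split into the trainer:
# trainer = Seq2SeqTrainer(model=model, args=args,
#                          train_dataset=tokenized_train, tokenizer=tokenizer)
# trainer.train()
```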
+
+Please note that these results were obtained with fixed hyperparameters for all models. For the best results after a grid search, refer to the table above.
+
+| <center>Model | <center>Run Time (hh:mm:ss) | <center>TyDi QA (EM/F1) |
+|----------------------|---------------|---------------------|
+| AraT5-Base-MSA | <center>00:20:41 |<center>69.92/82.50|
+| AraT5-Base | <center>00:20:53 |<center>68.40/81.97|
+| AraT5-Base-Tweets | <center>00:21:17 |<center>61.67/75.96|
+| mT5-Base | <center>00:28:24 |<center>57.98/72.81|
+| ArabicT5-Base | <center>00:20:00 |<center>70.79/83.85|
+| ArabicT5-Large | <center>00:23:50 |<center>71.22/84.42|
+| ArabicT5-xLarge | <center>00:52:17 |<center>72.86/86.00|
+
+Please note that we can further speed up our ArabicT5-Base by increasing the batch size, since it can handle a larger batch size than other base-scale models due to its smaller hidden layer size (512).
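As a quick sanity check of the hidden-size claim, one could compare the model configs directly; this is an illustrative sketch, the model IDs are assumptions, and mT5-Base serves as the 768-dim reference.

```python
from transformers import AutoConfig

# Model IDs are assumptions; adjust to the checkpoints you are comparing.
for model_id in ["sultan/ArabicT5-Base", "google/mt5-base"]:
    config = AutoConfig.from_pretrained(model_id)
    # d_model is the hidden dimension of a T5-style config
    print(f"{model_id}: d_model = {config.d_model}")

# ArabicT5-Base's 512-dim hidden states take less activation memory per example
# than mT5-Base's 768, leaving headroom for a larger per-device batch size.
```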
# Paper