cointegrated committed
Commit 097e5df
Parent(s): 872779e
Update README.md
README.md CHANGED
@@ -77,36 +77,17 @@ Some datasets obtained from the original sources:
 
 ## Performance
 
-The table below shows ROC AUC for three models on small samples of the DEV sets:
+The table below shows ROC AUC (one class vs rest) for five models on the corresponding *dev* sets:
 - [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli): a small BERT predicting entailment vs not_entailment
 - [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway): a base-sized BERT predicting entailment vs not_entailment
 - [threeway](https://huggingface.co/cointegrated/rubert-base-cased-nli-threeway) (**this model**): a base-sized BERT predicting entailment vs contradiction vs neutral
+- [vicgalle-xlm](https://huggingface.co/vicgalle/xlm-roberta-large-xnli-anli): a large multilingual NLI model
+- [facebook-bart](https://huggingface.co/facebook/bart-large-mnli): a large multilingual NLI model
 
-
-
-|add_one_rte|0.82 |0.90 |0.92 | | |
-|anli_r1 |0.50 |0.68 |0.66 |0.70 |0.75 |
-|anli_r2 |0.55 |0.62 |0.62 |0.62 |0.69 |
-|anli_r3 |0.50 |0.63 |0.59 |0.62 |0.64 |
-|copa |0.55 |0.60 |0.62 | | |
-|fever |0.88 |0.94 |0.94 |0.91 |0.92 |
-|help |0.74 |0.87 |0.46 | | |
-|iie |0.79 |0.85 |0.54 | | |
-|imppres |0.94 |0.99 |0.99 |0.99 |0.99 |
-|joci |0.87 |0.93 |0.93 |0.85 |0.80 |
-|mnli |0.87 |0.92 |0.93 |0.89 |0.86 |
-|monli |0.94 |1.00 |0.67 | | |
-|mpe |0.82 |0.90 |0.90 |0.91 |0.80 |
-|scitail |0.80 |0.96 |0.85 | | |
-|sick |0.97 |0.99 |0.99 |0.98 |0.96 |
-|snli |0.95 |0.98 |0.98 |0.99 |0.97 |
-|terra |0.73 |0.93 |0.93 | | |
-
-
-|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|mean |
-|------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
-|n |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
+
+|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|total |
 |------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
+|n_observations |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
 |tiny/entailment |0.77 |0.59 |0.52 |0.53 |0.53|0.90 |0.81|0.78 |0.93 |0.81|0.82 |0.91 |0.81|0.78 |0.93|0.95|0.67 |0.77 |
 |twoway/entailment |0.89 |0.73 |0.61 |0.62 |0.58|0.96 |0.92|0.87 |0.99 |0.90|0.90 |0.99 |0.91|0.96 |0.97|0.97|0.87 |0.86 |
 |threeway/entailment |0.91 |0.75 |0.61 |0.61 |0.57|0.96 |0.56|0.61 |0.99 |0.90|0.91 |0.67 |0.92|0.84 |0.98|0.98|0.90 |0.80 |
@@ -115,3 +96,10 @@ The table below shows ROC AUC for three models on small samples of the DEV sets:
 |threeway/contradiction | |0.71 |0.64 |0.61 | |0.97 | | |1.00 |0.77|0.92 | |0.89| |0.99|0.98| |0.85 |
 |threeway/neutral | |0.79 |0.70 |0.62 | |0.91 | | |0.99 |0.68|0.86 | |0.79| |0.96|0.96| |0.83 |
 
+For evaluation (and for training of the [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli) and [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway) models), some extra datasets were used:
+[Add-one RTE](https://cs.brown.edu/people/epavlick/papers/ans.pdf),
+[CoPA](https://people.ict.usc.edu/~gordon/copa.html),
+[IIE](https://aclanthology.org/I17-1100), and
+[SCITAIL](https://allenai.org/data/scitail) taken from [the repo of Felipe Salvatore](https://github.com/felipessalvatore/NLI_datasets) and translated,
+[HELP](https://github.com/verypluming/HELP) and [MoNLI](https://github.com/atticusg/MoNLI) taken from the original sources and translated,
+and Russian [TERRa](https://russiansuperglue.com/ru/tasks/task_info/TERRa).
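For context, here is a minimal sketch (not part of the commit) of how the one-vs-rest ROC AUC figures in the table above could be reproduced for the threeway model. It uses only standard `transformers` and `scikit-learn` calls; the `pairs` and `gold_labels` variables are hypothetical stand-ins for one of the dev sets listed in the table, and label names are read from the model config rather than assumed.

```python
# Sketch: one-vs-rest ROC AUC for cointegrated/rubert-base-cased-nli-threeway.
# `pairs` and `gold_labels` below are illustrative placeholders, not real dev data.
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = 'cointegrated/rubert-base-cased-nli-threeway'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def predict_proba(premise, hypothesis):
    """Return {label: probability} for a single premise/hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors='pt',
                       truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    return {model.config.id2label[i]: p.item() for i, p in enumerate(probs)}

# Hypothetical dev-set sample: (premise, hypothesis) pairs with gold labels.
pairs = [
    ('Кошка спит на диване.', 'Животное отдыхает.'),
    ('Кошка спит на диване.', 'Кошка бежит по улице.'),
]
gold_labels = ['entailment', 'contradiction']

scores = [predict_proba(p, h) for p, h in pairs]

# One class vs rest, as in the table: each class probability is the score
# for a binary problem "this class" vs "all other classes".
for label in model.config.id2label.values():
    y_true = [int(g == label) for g in gold_labels]
    y_score = [s[label] for s in scores]
    if 0 < sum(y_true) < len(y_true):  # AUC is undefined for a single class
        print(label, roc_auc_score(y_true, y_score))
```

Aggregating such scores per dataset gives one row per model/class, which is why the twoway and threeway models remain directly comparable on the entailment rows even though they predict different numbers of classes.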