cointegrated committed
Commit 097e5df
Parent(s): 872779e
Update README.md
README.md CHANGED
@@ -77,36 +77,17 @@ Some datasets obtained from the original sources:
 
 ## Performance
 
-The table below shows ROC AUC for three models on small samples of the DEV sets:
+The table below shows ROC AUC (one class vs rest) for five models on the corresponding *dev* sets:
 - [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli): a small BERT predicting entailment vs not_entailment
 - [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway): a base-sized BERT predicting entailment vs not_entailment
 - [threeway](https://huggingface.co/cointegrated/rubert-base-cased-nli-threeway) (**this model**): a base-sized BERT predicting entailment vs contradiction vs neutral
+- [vicgalle-xlm](https://huggingface.co/vicgalle/xlm-roberta-large-xnli-anli): a large multilingual NLI model
+- [facebook-bart](https://huggingface.co/facebook/bart-large-mnli): a large multilingual NLI model
 
-
-
-|add_one_rte|0.82 |0.90 |0.92 | | |
-|anli_r1 |0.50 |0.68 |0.66 |0.70 |0.75 |
-|anli_r2 |0.55 |0.62 |0.62 |0.62 |0.69 |
-|anli_r3 |0.50 |0.63 |0.59 |0.62 |0.64 |
-|copa |0.55 |0.60 |0.62 | | |
-|fever |0.88 |0.94 |0.94 |0.91 |0.92 |
-|help |0.74 |0.87 |0.46 | | |
-|iie |0.79 |0.85 |0.54 | | |
-|imppres |0.94 |0.99 |0.99 |0.99 |0.99 |
-|joci |0.87 |0.93 |0.93 |0.85 |0.80 |
-|mnli |0.87 |0.92 |0.93 |0.89 |0.86 |
-|monli |0.94 |1.00 |0.67 | | |
-|mpe |0.82 |0.90 |0.90 |0.91 |0.80 |
-|scitail |0.80 |0.96 |0.85 | | |
-|sick |0.97 |0.99 |0.99 |0.98 |0.96 |
-|snli |0.95 |0.98 |0.98 |0.99 |0.97 |
-|terra |0.73 |0.93 |0.93 | | |
-
-
-|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|mean |
-|------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
-|n |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
+
+|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|total |
 |------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
+|n_observations |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
 |tiny/entailment |0.77 |0.59 |0.52 |0.53 |0.53|0.90 |0.81|0.78 |0.93 |0.81|0.82 |0.91 |0.81|0.78 |0.93|0.95|0.67 |0.77 |
 |twoway/entailment |0.89 |0.73 |0.61 |0.62 |0.58|0.96 |0.92|0.87 |0.99 |0.90|0.90 |0.99 |0.91|0.96 |0.97|0.97|0.87 |0.86 |
 |threeway/entailment |0.91 |0.75 |0.61 |0.61 |0.57|0.96 |0.56|0.61 |0.99 |0.90|0.91 |0.67 |0.92|0.84 |0.98|0.98|0.90 |0.80 |
@@ -115,3 +96,10 @@ The table below shows ROC AUC for three models on small samples of the DEV sets:
 |threeway/contradiction | |0.71 |0.64 |0.61 | |0.97 | | |1.00 |0.77|0.92 | |0.89| |0.99|0.98| |0.85 |
 |threeway/neutral | |0.79 |0.70 |0.62 | |0.91 | | |0.99 |0.68|0.86 | |0.79| |0.96|0.96| |0.83 |
 
+For evaluation (and for training of the [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli) and [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway) models), some extra datasets were used:
+[Add-one RTE](https://cs.brown.edu/people/epavlick/papers/ans.pdf),
+[CoPA](https://people.ict.usc.edu/~gordon/copa.html),
+[IIE](https://aclanthology.org/I17-1100), and
+[SCITAIL](https://allenai.org/data/scitail) taken from [the repo of Felipe Salvatore](https://github.com/felipessalvatore/NLI_datasets) and translated,
+[HELP](https://github.com/verypluming/HELP) and [MoNLI](https://github.com/atticusg/MoNLI) taken from the original sources and translated,
+and Russian [TERRa](https://russiansuperglue.com/ru/tasks/task_info/TERRa).
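For context, here is a minimal sketch (not part of the commit) of how the one-vs-rest ROC AUC figures in the table above could be reproduced for the threeway model. It uses only standard `transformers` and `scikit-learn` calls; the `pairs` and `gold_labels` variables are hypothetical stand-ins for one of the dev sets listed in the table, and label names are read from the model config rather than assumed.

```python
# Sketch: one-vs-rest ROC AUC for cointegrated/rubert-base-cased-nli-threeway.
# `pairs` and `gold_labels` below are illustrative placeholders, not real dev data.
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = 'cointegrated/rubert-base-cased-nli-threeway'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def predict_proba(premise, hypothesis):
    """Return {label: probability} for a single premise/hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors='pt',
                       truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    return {model.config.id2label[i]: p.item() for i, p in enumerate(probs)}

# Hypothetical dev-set sample: (premise, hypothesis) pairs with gold labels.
pairs = [
    ('Кошка спит на диване.', 'Животное отдыхает.'),
    ('Кошка спит на диване.', 'Кошка бежит по улице.'),
]
gold_labels = ['entailment', 'contradiction']

scores = [predict_proba(p, h) for p, h in pairs]

# One class vs rest, as in the table: each class probability is the score
# for a binary problem "this class" vs "all other classes".
for label in model.config.id2label.values():
    y_true = [int(g == label) for g in gold_labels]
    y_score = [s[label] for s in scores]
    if 0 < sum(y_true) < len(y_true):  # AUC is undefined for a single class
        print(label, roc_auc_score(y_true, y_score))
```

Aggregating such scores per dataset gives one row per model/class, which is why the twoway and threeway models remain directly comparable on the entailment rows even though they predict different numbers of classes.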