Added some simple evaluation results (#5)

- Added some simple evaluation results (d3ba1fa6c6c3c4b1664f27fdbede98d2ab9add4e)

Co-authored-by: kapllan <kapllan@users.noreply.huggingface.co>

Files changed (1)

README.md +20 -2

@@ -85,9 +85,27 @@ For further details see [Niklaus et al. 2023](https://arxiv.org/abs/2306.02069?u
 ## Evaluation
-For further insights into the evaluation, we refer to the [trainer state](https://huggingface.co/joelito/legal-swiss-roberta-base/blob/main/last-checkpoint/trainer_state.json). Additional information is available in the [tensorboard](https://huggingface.co/joelito/legal-swiss-roberta-base/tensorboard).
-For performance on downstream tasks, such as [LEXTREME](https://huggingface.co/datasets/joelito/lextreme) ([Niklaus et al. 2023](https://arxiv.org/abs/2301.13126)) or [LEXGLUE](https://huggingface.co/datasets/lex_glue) ([Chalkidis et al. 2021](https://arxiv.org/abs/2110.00976)), we refer to the results presented in Niklaus et al. (2023) [1](https://arxiv.org/abs/2306.02069), [2](https://arxiv.org/abs/2306.09237).
 ### Model Architecture and Objective

 ## Evaluation
+We compare joelito/legal-swiss-roberta-base with other multilingual models.
+The results are based on the text classification tasks presented in [Niklaus et al. (2023)](https://arxiv.org/abs/2306.09237), which are part of [LEXTREME](https://huggingface.co/datasets/joelito/lextreme).
+We report the arithmetic mean of the test-set macro-F1 score over three seeds (1, 2, 3).
+The highest value in each column is in bold.
+| Model                                                                                                   | SCP-BC    | SCP-BF    | SCP-CC    | SCP-CF    | SJPXL-C   | SJPXL-F   | SLAP-SC  | SLAP-SF   |
+| :------------------------------------------------------------------------------------------------------ | :-------- | :-------- | :-------- | :-------- | :-------- | :-------- | :------- | :-------- |
+| [ZurichNLP/swissbert-xlm-vocab](https://huggingface.co/ZurichNLP/swissbert-xlm-vocab)                   | 71.36     | 57.48     | 27.33     | 23.37     | 80.81     | 61.75     | 77.89    | 71.27     |
+| [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased)         | 66.56     | 56.58     | 22.67     | 21.31     | 77.26     | 60.79     | 73.54    | 72.24     |
+| [facebook/xmod-base](https://huggingface.co/facebook/xmod-base)                                         | 70.35     | 58.16     | 23.87     | 19.57     | 80.55     | 60.84     | 73.16    | 69.03     |
+| [joelito/legal-swiss-longformer-base](https://huggingface.co/joelito/legal-swiss-longformer-base)       | **73.25** | **60.06** | **28.68** | 24.39     | 87.46     | **65.23** | 83.84    | 77.96     |
+| [joelito/legal-swiss-roberta-base](https://huggingface.co/joelito/legal-swiss-roberta-base)             | 72.41     | 59.31     | 25.99     | 23.27     | 87.48     | 64.16     | **86.8** | **81.56** |
+| [joelito/legal-swiss-roberta-large](https://huggingface.co/joelito/legal-swiss-roberta-large)           | 70.95     | 57.59     | 27.86     | 23.48     | **88.33** | 62.92     | 82.1     | 78.62     |
+| [microsoft/Multilingual-MiniLM-L12-H384](https://huggingface.co/microsoft/Multilingual-MiniLM-L12-H384) | 67.29     | 56.56     | 24.23     | 14.9      | 79.52     | 58.29     | 63.03    | 67.57     |
+| [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)                         | 72.01     | 57.59     | 22.93     | **25.18** | 79.41     | 60.89     | 67.64    | 74.13     |
+| [xlm-roberta-base](https://huggingface.co/xlm-roberta-base)                                             | 68.55     | 58.48     | 25.66     | 21.52     | 80.98     | 61.45     | 79.3     | 74.47     |
+| [xlm-roberta-large](https://huggingface.co/xlm-roberta-large)                                           | 69.5      | 58.15     | 27.9      | 22.05     | 82.19     | 61.24     | 81.09    | 71.82     |
+For more detailed insights into the performance on downstream tasks, such as [LEXTREME](https://huggingface.co/datasets/joelito/lextreme) ([Niklaus et al. 2023](https://arxiv.org/abs/2301.13126)) or [LEXGLUE](https://huggingface.co/datasets/lex_glue) ([Chalkidis et al. 2021](https://arxiv.org/abs/2110.00976)), we refer to the results presented in Niklaus et al. (2023) [1](https://arxiv.org/abs/2306.02069), [2](https://arxiv.org/abs/2306.09237).
+For further insights into the evaluation, we refer to the [trainer state](https://huggingface.co/joelito/legal-swiss-roberta-base/blob/main/last-checkpoint/trainer_state.json). Additional information is available in the [tensorboard](https://huggingface.co/joelito/legal-swiss-roberta-base/tensorboard).
 ### Model Architecture and Objective
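
The aggregation described in the added Evaluation text (arithmetic mean of the test-set macro-F1 score over seeds 1, 2, 3) can be sketched as follows. This is a minimal illustration, not the evaluation code behind the table: the helper names `macro_f1` and `mean_macro_f1_over_seeds` are hypothetical, the labels are toy data, and the sketch assumes single-label classification.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return mean(f1_scores)


def mean_macro_f1_over_seeds(y_true, predictions_by_seed):
    """Return (mean macro-F1 in percent, per-seed macro-F1 as fractions)."""
    per_seed = {
        seed: macro_f1(y_true, y_pred)
        for seed, y_pred in predictions_by_seed.items()
    }
    return 100 * mean(list(per_seed.values())), per_seed


# Toy data (hypothetical): gold test labels and predictions from three runs.
y_true = [0, 0, 1, 1]
predictions_by_seed = {
    1: [0, 0, 1, 1],
    2: [0, 1, 1, 1],
    3: [1, 1, 0, 0],
}

score, per_seed = mean_macro_f1_over_seeds(y_true, predictions_by_seed)
print(f"mean macro-F1 over seeds {sorted(per_seed)}: {score:.2f}")
```

Macro averaging weights every class equally regardless of its frequency, which is why it is a common choice for imbalanced label distributions; averaging over seeds reduces the effect of run-to-run variance in fine-tuning.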