leaderboard-pr-bot committed
Commit f7efdd7
1 parent: 9a11e1f

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr. It adds evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them at https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
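For maintainers who want to audit these numbers rather than take the card update at face value, the scores are backed by a per-task details dataset on the Hub (linked at the end of the diff below). The following is a rough sketch using `huggingface_hub`; the file layout inside the details repo is an assumption here, so the sketch lists the repo's files instead of hard-coding a path.

```python
# Hedged sketch: pull the raw per-task results behind this PR.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
# The JSON filenames inside the details repo are an assumption, so we
# list the repo's files first rather than hard-coding one.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "open-llm-leaderboard/details_BioMistral__BioMistral-7B"
files = list_repo_files(repo_id, repo_type="dataset")

# Keep whatever JSON snapshots exist; the lexicographically last one
# should correspond to the most recent evaluation run.
snapshots = sorted(f for f in files if f.endswith(".json"))
print(snapshots)

if snapshots:
    local_path = hf_hub_download(repo_id, snapshots[-1], repo_type="dataset")
    print("Downloaded:", local_path)
```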

Files changed (1)
  1. README.md +119 -3
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: apache-2.0
  language:
  - fr
  - en
@@ -10,10 +9,114 @@ language:
  - pl
  - ro
  - it
- pipeline_tag: text-generation
+ license: apache-2.0
  tags:
  - medical
  - biology
+ pipeline_tag: text-generation
+ model-index:
+ - name: BioMistral-7B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 54.27
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 79.09
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 55.56
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 51.61
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 73.48
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BioMistral/BioMistral-7B
+       name: Open LLM Leaderboard
  ---


@@ -103,4 +206,17 @@ Arxiv : [https://arxiv.org/abs/2402.10373](https://arxiv.org/abs/2402.10373)
  }
  ```

- **CAUTION!** Both direct and downstream users need to be informed about the risks, biases, and constraints inherent in the model. While the model can produce natural language text, our exploration of its capabilities and limitations is just beginning. In fields such as medicine, comprehending these limitations is crucial. Hence, we strongly advise against deploying this model for natural language generation in production or for professional tasks in the realm of health and medicine.
+ **CAUTION!** Both direct and downstream users need to be informed about the risks, biases, and constraints inherent in the model. While the model can produce natural language text, our exploration of its capabilities and limitations is just beginning. In fields such as medicine, comprehending these limitations is crucial. Hence, we strongly advise against deploying this model for natural language generation in production or for professional tasks in the realm of health and medicine.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BioMistral__BioMistral-7B)
+ 
+ | Metric                            | Value |
+ |-----------------------------------|------:|
+ | Avg.                              | 52.33 |
+ | AI2 Reasoning Challenge (25-Shot) | 54.27 |
+ | HellaSwag (10-Shot)               | 79.09 |
+ | MMLU (5-Shot)                     | 55.56 |
+ | TruthfulQA (0-shot)               | 51.61 |
+ | Winogrande (5-shot)               | 73.48 |
+ | GSM8k (5-shot)                    |  0.00 |
+ 
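As a sanity check on the merged card: the Avg. row in the table is simply the arithmetic mean of the six benchmark scores, (54.27 + 79.09 + 55.56 + 51.61 + 73.48 + 0.00) / 6 = 52.335, reported as 52.33. Below is a minimal sketch that recomputes it from the model-index metadata this PR adds, assuming PyYAML is installed and README.md keeps the standard `---`-delimited front matter shown in the diff.

```python
# Minimal sketch: recompute the leaderboard average from the model-index
# block this PR adds to README.md. Assumes PyYAML (pip install pyyaml)
# and a standard "---"-delimited YAML front-matter block.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# Front matter sits between the first two "---" markers.
_, front_matter, _ = text.split("---", 2)
card = yaml.safe_load(front_matter)

scores = [
    metric["value"]
    for result in card["model-index"][0]["results"]
    for metric in result["metrics"]  # one headline metric per task here
]

mean = sum(scores) / len(scores)
print(f"Average over {len(scores)} benchmarks: {mean:.3f}")
# Prints ~52.335 for the six scores in this PR, i.e. the 52.33 Avg. above.
```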