leaderboard-pr-bot committed
Commit 6a18965
Parent: eb3dd73

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

This PR adds evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
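The bot's own source is not included in this PR, but the effect is straightforward to reproduce. As a minimal sketch (assuming the standard `huggingface_hub` workflow, not the bot's actual code), `metadata_update` merges a mapping into the model card's YAML front matter and, with `create_pr=True`, opens a pull request like this one:

```python
# Minimal sketch (not the bot's actual code): open a model-card metadata PR
# with huggingface_hub. `metadata_update` merges the mapping below into the
# README's YAML front matter; create_pr=True opens a PR instead of pushing
# to main. Requires a Hub token with write access.
from huggingface_hub import metadata_update

# One result entry, abbreviated; the real PR carries all six benchmarks.
metadata = {
    "model-index": [{
        "name": "bloom-1b1",
        "results": [{
            "task": {"type": "text-generation", "name": "Text Generation"},
            "dataset": {"name": "AI2 Reasoning Challenge (25-Shot)", "type": "ai2_arc"},
            "metrics": [{"type": "acc_norm", "value": 28.33, "name": "normalized accuracy"}],
        }],
    }]
}

pr_url = metadata_update("bigscience/bloom-1b1", metadata, create_pr=True)
print(pr_url)  # URL of the newly created pull request
```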

Files changed (1)
  1. README.md +118 -1
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: bigscience-bloom-rail-1.0
  language:
  - ak
  - ar
@@ -49,7 +48,111 @@ language:
  - zhs
  - zht
  - zu
+ license: bigscience-bloom-rail-1.0
  pipeline_tag: text-generation
+ model-index:
+ - name: bloom-1b1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 28.33
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 42.78
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 26.7
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 41.8
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 55.01
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.23
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=bigscience/bloom-1b1
+       name: Open LLM Leaderboard
  ---

  <h1 style='text-align: center '>BLOOM LM</h1>
@@ -555,3 +658,17 @@ Initial prompting experiments using interim checkpoints: https://huggingface.co/

  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff

+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bigscience__bloom-1b1)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |32.47|
+ |AI2 Reasoning Challenge (25-Shot)|28.33|
+ |HellaSwag (10-Shot)              |42.78|
+ |MMLU (5-Shot)                    |26.70|
+ |TruthfulQA (0-shot)              |41.80|
+ |Winogrande (5-shot)              |55.01|
+ |GSM8k (5-shot)                   | 0.23|
+
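Once merged, the `model-index` YAML above becomes machine-readable. A minimal sketch of reading it back, assuming `huggingface_hub` is installed and the updated card is live on the Hub: `ModelCard.load` fetches the card, and `data.eval_results` is the parsed form of that YAML. The table's "Avg." row is simply the arithmetic mean of the six task scores.

```python
# Minimal sketch: read the evaluation results back from the merged card.
from huggingface_hub import ModelCard

card = ModelCard.load("bigscience/bloom-1b1")
scores = {r.dataset_name: r.metric_value for r in card.data.eval_results}
for name, value in scores.items():
    print(f"{name:<33} {value:>6.2f}")

# "Avg." is the arithmetic mean of the six task scores; recomputing it from
# the rounded per-task values lands within rounding distance of the reported
# 32.47 (the leaderboard averages the unrounded scores).
print(f"Avg. ≈ {sum(scores.values()) / len(scores):.2f}")
```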
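The per-task scores come from EleutherAI's lm-evaluation-harness, run by the leaderboard with the few-shot counts recorded under `num_few_shot` above. A rough sketch of re-running one row locally (the harness version and prompt setup the leaderboard pins may differ from a fresh install, so expect some drift):

```python
# Rough sketch: re-run the ARC-Challenge row with lm-evaluation-harness
# (pip install lm-eval). Settings mirror the model-index above; exact
# leaderboard numbers are not guaranteed to reproduce locally.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # HF transformers backend
    model_args="pretrained=bigscience/bloom-1b1",  # the model this card describes
    tasks=["arc_challenge"],                       # AI2 Reasoning Challenge
    num_fewshot=25,                                # matches num_few_shot: 25
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # acc_norm corresponds to the 28.33 row
```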