Commit 08d0033, authored by leaderboard-pr-bot
1 parent: 4b44b5e

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
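In the diff below, each leaderboard score becomes one item under `model-index` → `results`, with `task`, `dataset` (including `args.num_few_shot`), `metrics` and `source` keys. The following sketch mirrors that layout with plain Python dicts so the nesting is easy to see. `make_result_entry` and `LEADERBOARD_URL` are hypothetical names chosen for illustration, not code from the bot's Space, and serialising the structure to YAML is left out.

```python
# Minimal sketch (assumption): mirrors the fields of one `model-index` result
# entry as they appear in this diff. `make_result_entry` is a hypothetical
# helper, not code taken from the leaderboard-pr-bot Space.
from typing import Any, Dict, Optional

LEADERBOARD_URL = (
    "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard"
    "?query=BEE-spoke-data/smol_llama-220M-GQA"
)


def make_result_entry(
    benchmark: str,
    dataset_type: str,
    split: str,
    num_few_shot: int,
    metric_type: str,
    value: float,
    config: Optional[str] = None,
    metric_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one `results` item for the model card's `model-index` metadata."""
    dataset: Dict[str, Any] = {"name": benchmark, "type": dataset_type}
    if config is not None:
        dataset["config"] = config  # e.g. "ARC-Challenge"; HellaSwag has none
    dataset["split"] = split
    dataset["args"] = {"num_few_shot": num_few_shot}

    metric: Dict[str, Any] = {"type": metric_type, "value": value}
    if metric_name is not None:
        metric["name"] = metric_name  # the TruthfulQA (mc2) entry omits this

    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": dataset,
        "metrics": [metric],
        "source": {"url": LEADERBOARD_URL, "name": "Open LLM Leaderboard"},
    }


arc = make_result_entry(
    "AI2 Reasoning Challenge (25-Shot)", "ai2_arc", "test", 25,
    "acc_norm", 24.83, config="ARC-Challenge", metric_name="normalized accuracy",
)
model_index = [{"name": "smol_llama-220M-GQA", "results": [arc]}]
print(model_index[0]["results"][0]["dataset"]["args"])  # {'num_few_shot': 25}
```

Optional keys (`config`, the metric `name`) are only added when given, which matches the diff: the HellaSwag entry has no `config`, and the TruthfulQA entry has no metric `name`.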

Files changed (1)
  1. README.md +136 -25
README.md CHANGED
@@ -1,7 +1,15 @@
  ---
- license: apache-2.0
  language:
  - en
  inference:
  parameters:
  max_new_tokens: 64
@@ -16,47 +24,136 @@ widget:
  example_title: El Microondas
  - text: Kennesaw State University is a public
  example_title: Kennesaw State University
- - text: >-
- Bungie Studios is an American video game developer. They are most famous for
- developing the award winning Halo series of video games. They also made
- Destiny. The studio was founded
  example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- - text: >-
- The Harry Potter series, written by J.K. Rowling, begins with the book
- titled
  example_title: Harry Potter Series
- - text: >-
- Question: I have cities, but no houses. I have mountains, but no trees. I
  have water, but no fish. What am I?

- Answer:
  example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- - text: >-
- Jane went to the store to buy some groceries. She picked up apples, oranges,
  and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- - text: >-
- Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
- another train leaves Station B at 10:00 AM and travels at 80 mph, when will
  they meet if the distance between the stations is 300 miles?

- To determine
  example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
  pipeline_tag: text-generation
- tags:
- - smol_llama
- - llama2
- datasets:
- - JeanKaddour/minipile
- - pszemraj/simple_wikipedia_LM
- - mattymchen/refinedweb-3m
- - BEE-spoke-data/knowledge-inoc-concat-v1
  ---

@@ -85,3 +182,17 @@ A small 220M param (total) decoder model. This is the first version of the model
  - full DPO - [link](https://huggingface.co/BEE-spoke-data/zephyr-220m-dpo-full)

  ---

  ---
  language:
  - en
+ license: apache-2.0
+ tags:
+ - smol_llama
+ - llama2
+ datasets:
+ - JeanKaddour/minipile
+ - pszemraj/simple_wikipedia_LM
+ - mattymchen/refinedweb-3m
+ - BEE-spoke-data/knowledge-inoc-concat-v1
  inference:
  parameters:
  max_new_tokens: 64
 
  example_title: El Microondas
  - text: Kennesaw State University is a public
  example_title: Kennesaw State University
+ - text: Bungie Studios is an American video game developer. They are most famous for
+ developing the award winning Halo series of video games. They also made Destiny.
+ The studio was founded
  example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
+ - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
  example_title: Harry Potter Series
+ - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
  have water, but no fish. What am I?

+ Answer:'
  example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
+ - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
  and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
+ - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+ and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
  they meet if the distance between the stations is 300 miles?

+ To determine'
  example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
  pipeline_tag: text-generation
+ model-index:
+ - name: smol_llama-220M-GQA
+ results:
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: AI2 Reasoning Challenge (25-Shot)
+ type: ai2_arc
+ config: ARC-Challenge
+ split: test
+ args:
+ num_few_shot: 25
+ metrics:
+ - type: acc_norm
+ value: 24.83
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: HellaSwag (10-Shot)
+ type: hellaswag
+ split: validation
+ args:
+ num_few_shot: 10
+ metrics:
+ - type: acc_norm
+ value: 29.76
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU (5-Shot)
+ type: cais/mmlu
+ config: all
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 25.85
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: TruthfulQA (0-shot)
+ type: truthful_qa
+ config: multiple_choice
+ split: validation
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: mc2
+ value: 44.55
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: Winogrande (5-shot)
+ type: winogrande
+ config: winogrande_xl
+ split: validation
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 50.99
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GSM8k (5-shot)
+ type: gsm8k
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 0.68
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+ name: Open LLM Leaderboard
  ---

 
  - full DPO - [link](https://huggingface.co/BEE-spoke-data/zephyr-220m-dpo-full)

  ---
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-GQA)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |29.44|
+ |AI2 Reasoning Challenge (25-Shot)|24.83|
+ |HellaSwag (10-Shot) |29.76|
+ |MMLU (5-Shot) |25.85|
+ |TruthfulQA (0-shot) |44.55|
+ |Winogrande (5-shot) |50.99|
+ |GSM8k (5-shot) | 0.68|
+
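The `Avg.` row in the results table is consistent with the unweighted arithmetic mean of the six benchmark scores: (24.83 + 29.76 + 25.85 + 44.55 + 50.99 + 0.68) / 6 = 176.66 / 6 ≈ 29.44. The sketch below recomputes that value and renders a table in the same layout. It assumes plain rounding to two decimals, since the leaderboard's exact averaging and rounding rule is not stated in this PR, and `render_table` is a hypothetical helper, not the bot's code.

```python
# Minimal sketch (assumption): "Avg." is taken to be the plain mean of the six
# scores in this diff, rounded to two decimals. Not the bot's actual code.
from statistics import mean

scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.83,
    "HellaSwag (10-Shot)": 29.76,
    "MMLU (5-Shot)": 25.85,
    "TruthfulQA (0-shot)": 44.55,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.68,
}


def render_table(results: dict) -> str:
    """Return a markdown table with an 'Avg.' row followed by each benchmark."""
    avg = round(mean(results.values()), 2)
    width = max(len(name) for name in results)  # widest benchmark label
    rows = [f"|{'Metric'.center(width)}|Value|", f"|{'-' * width}|----:|"]
    rows.append(f"|{'Avg.'.ljust(width)}|{avg:5.2f}|")
    for name, value in results.items():
        rows.append(f"|{name.ljust(width)}|{value:5.2f}|")
    return "\n".join(rows)


print(render_table(scores))
```

The `{value:5.2f}` format right-aligns each score in a five-character column (so 0.68 prints as ` 0.68`), matching the `----:` right-alignment marker in the separator row.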