Adding Evaluation Results (#12)
opened by leaderboard-pr-bot

README.md (CHANGED)
The diff touches two parts of README.md: the YAML front matter (old lines 1-157, new lines 1-266) and the end of the file, where a new section is appended.

In the front matter, `license: apache-2.0` now follows the `language:` list instead of preceding it. The seven existing widget prompts (Sentiment Analysis, Question Answering, Topic Classification, Paraphrasing, Text Summarization, Word Sense Disambiguation, Natural Language Inference) and the `inference.parameters` settings are carried over; the bot re-serializes the YAML, so those lines show as removed and re-added even though the prompt text is the same. The substantive addition is a `model-index:` block recording the model's Open LLM Leaderboard results.
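A `model-index:` block like the one added here is normally generated and pushed programmatically. The sketch below is only an assumption about tooling, not a description of how leaderboard-pr-bot is actually implemented: it uses the `huggingface_hub.metadata_update` helper, which merges a metadata dict into the README front matter and can propose the change as a pull request.

```python
# Hypothetical sketch: pushing one leaderboard result into the card's front matter as a PR.
# `metadata_update`, `overwrite`, and `create_pr` are huggingface_hub features; the overall
# workflow shown here is an assumption, not taken from this PR.
from huggingface_hub import metadata_update

model_index = {
    "model-index": [{
        "name": "RedPajama-INCITE-Instruct-7B-v0.1",
        "results": [{
            "task": {"type": "text-generation", "name": "Text Generation"},
            "dataset": {
                "name": "AI2 Reasoning Challenge (25-Shot)",
                "type": "ai2_arc",
                "config": "ARC-Challenge",
                "split": "test",
                "args": {"num_few_shot": 25},
            },
            "metrics": [{"type": "acc_norm", "value": 44.11, "name": "normalized accuracy"}],
            "source": {
                "url": "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1",
                "name": "Open LLM Leaderboard",
            },
        }],
        # ...the remaining five benchmark entries follow the same shape.
    }]
}

metadata_update(
    "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1",  # repo id as used in the leaderboard links
    model_index,
    overwrite=True,   # replace any existing model-index entry
    create_pr=True,   # open a pull request instead of committing to main
)
```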
The new front matter in full:

---
language:
- en
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T
- togethercomputer/RedPajama-Data-Instruct
widget:
- text: |-
    Label the sentences as either 'positive', 'negative', 'mixed', or 'neutral':

    Sentence: I can say that there isn't anything I would change.
    Label: positive

    Sentence: I'm not sure about this.
    Label: neutral

    Sentence: I liked some parts but I didn't like other parts.
    Label: mixed

    Sentence: I think the background image could have been better.
    Label: negative

    Sentence: I really like it.
    Label:
  example_title: Sentiment Analysis
- text: |-
    Please answer the following question:

    Question: What is the capital of Canada?
    Answer: Ottawa

    Question: What is the currency of Switzerland?
    Answer: Swiss franc

    Question: In which country is Wisconsin located?
    Answer:
  example_title: Question Answering
- text: |-
    Given a news article, classify its topic.

    Possible labels: 1. World 2. Sports 3. Business 4. Sci/Tech

    Article: A nearby star thought to harbor comets and asteroids now appears to be home to planets, too.
    Label: Sci/Tech

    Article: Soaring crude prices plus worries about the economy and the outlook for earnings are expected to hang over the stock market next week during the depth of the summer doldrums.
    Label: Business

    Article: Murtagh a stickler for success Northeastern field hockey coach Cheryl Murtagh doesn't want the glare of the spotlight that shines on her to detract from a team that has been the America East champion for the past three years and has been to the NCAA tournament 13 times.
    Label::
  example_title: Topic Classification
- text: |-
    Paraphrase the given sentence into a different sentence.

    Input: Can you recommend some upscale restaurants in New York?
    Output: What upscale restaurants do you recommend in New York?

    Input: What are the famous places we should not miss in Paris?
    Output: Recommend some of the best places to visit in Paris?

    Input: Could you recommend some hotels that have cheap price in Zurich?
    Output:
  example_title: Paraphrasing
- text: |-
    Given a review from Amazon's food products, the task is to generate a short summary of the given review in the input.

    Input: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
    Output: Good Quality Dog Food

    Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'.
    Output: Not as Advertised

    Input: My toddler loves this game to a point where he asks for it. That's a big thing for me. Secondly, no glitching unlike one of their competitors (PlayShifu). Any tech I don't have to reach out to support for help is a good tech for me. I even enjoy some of the games and activities in this. Overall, this is a product that shows that the developers took their time and made sure people would not be asking for refund. I've become bias regarding this product and honestly I look forward to buying more of this company's stuff. Please keep up the great work.
    Output:
  example_title: Text Summarization
- text: |-
    Identify which sense of a word is meant in a given context.

    Context: The river overflowed the bank.
    Word: bank
    Sense: river bank

    Context: A mouse takes much more room than a trackball.
    Word: mouse
    Sense: computer mouse

    Context: The bank will not be accepting cash on Saturdays.
    Word: bank
    Sense: commercial (finance) banks

    Context: Bill killed the project
    Word: kill
    Sense:
  example_title: Word Sense Disambiguation
- text: |-
    Given a pair of sentences, choose whether the two sentences agree (entailment)/disagree (contradiction) with each other.

    Possible labels: 1. entailment 2. contradiction

    Sentence 1: The skier was on the edge of the ramp. Sentence 2: The skier was dressed in winter clothes.
    Label: entailment

    Sentence 1: The boy skated down the staircase railing. Sentence 2: The boy is a newbie skater.
    Label: contradiction

    Sentence 1: Two middle-aged people stand by a golf hole. Sentence 2: A couple riding in a golf cart.
    Label:
  example_title: Natural Language Inference
inference:
  parameters:
    # (one parameter line falls between the diff hunks and is not shown)
    top_p: 0.7
    top_k: 50
    max_new_tokens: 128
model-index:
- name: RedPajama-INCITE-Instruct-7B-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 44.11
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 72.02
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 33.96
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.59
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
      name: Open LLM Leaderboard
---
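Since the `inference.parameters` above are plain sampling settings, the widget prompts can be tried locally. The following is a minimal sketch using the `transformers` generation API; the repo id is taken from the leaderboard links in the card, and the dtype and device choices are illustrative assumptions, not part of the PR.

```python
# Illustrative only: run the card's "Question Answering" widget prompt with the
# sampling settings from the front matter (top_p=0.7, top_k=50, max_new_tokens=128).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1"  # id used in the leaderboard links
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # assumed hardware setup
)

prompt = (
    "Please answer the following question:\n\n"
    "Question: What is the capital of Canada?\nAnswer: Ottawa\n\n"
    "Question: What is the currency of Switzerland?\nAnswer: Swiss franc\n\n"
    "Question: In which country is Wisconsin located?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.7,
    top_k=50,
    max_new_tokens=128,
    pad_token_id=tokenizer.eos_token_id,
)
# Print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```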
The body of the README is unchanged from `# RedPajama-INCITE-7B-Instruct` (new line 268) through `## Community` and its "Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)" line. The PR then appends the following section at the end of the file:

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_togethercomputer__RedPajama-INCITE-Instruct-7B-v0.1)

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 42.38 |
| AI2 Reasoning Challenge (25-Shot) | 44.11 |
| HellaSwag (10-Shot)               | 72.02 |
| MMLU (5-Shot)                     | 37.62 |
| TruthfulQA (0-shot)               | 33.96 |
| Winogrande (5-shot)               | 64.96 |
| GSM8k (5-shot)                    |  1.59 |
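The `Avg.` row is the unweighted mean of the six benchmark scores; a quick check of the arithmetic (not part of the PR):

```python
# Verify that the reported average matches the mean of the six benchmark scores above.
scores = [44.11, 72.02, 37.62, 33.96, 64.96, 1.59]
print(round(sum(scores) / len(scores), 2))  # 42.38, matching the "Avg." row
```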