ping0rr's Collections
gemma_knowledg_tree
Gemini: A Family of Highly Capable Multimodal Models
Paper
•
2312.11805
•
Published
•
45
Measuring Massive Multitask Language Understanding
Paper
•
2009.03300
•
Published
•
3
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper
•
1905.07830
•
Published
•
4
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper
•
1911.11641
•
Published
•
2
SocialIQA: Commonsense Reasoning about Social Interactions
Paper
•
1904.09728
•
Published
•
2
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper
•
1905.10044
•
Published
•
1
On the Measure of Intelligence
Paper
•
1911.01547
•
Published
Evaluating Large Language Models Trained on Code
Paper
•
2107.03374
•
Published
•
7
Program Synthesis with Large Language Models
Paper
•
2108.07732
•
Published
•
4
Training Verifiers to Solve Math Word Problems
Paper
•
2110.14168
•
Published
•
4
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Paper
•
2304.06364
•
Published
•
2
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Paper
•
2206.04615
•
Published
•
5
BBQ: A Hand-Built Bias Benchmark for Question Answering
Paper
•
2110.08193
•
Published
•
1
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Paper
•
2009.11462
•
Published
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Paper
•
2109.07958
•
Published
•
1
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Paper
•
2203.09509
•
Published
•
2