Experiment: can DUS (depth up-scaling) be taken one or more steps further?
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 62.48 |
| AI2 Reasoning Challenge (25-Shot) | 60.67 |
| HellaSwag (10-Shot) | 83.27 |
| MMLU (5-Shot) | 64.99 |
| TruthfulQA (0-shot) | 43.60 |
| Winogrande (5-shot) | 80.27 |
| GSM8k (5-shot) | 42.08 |
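
The "Avg." row appears to be the unweighted mean of the six benchmark scores in the table. A minimal sketch to check that arithmetic, assuming equal weighting (the variable names `scores` and `average` are illustrative, not part of any leaderboard tooling):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 60.67,
    "HellaSwag (10-Shot)": 83.27,
    "MMLU (5-Shot)": 64.99,
    "TruthfulQA (0-shot)": 43.60,
    "Winogrande (5-shot)": 80.27,
    "GSM8k (5-shot)": 42.08,
}

# Assumption: the reported average weights every benchmark equally.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")
```

Under that assumption the computed mean matches the reported 62.48.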