Update README.md

Add MMLU benchmark results
README.md (changed)
@@ -52,11 +52,11 @@ To get the expected features and performance for the chat versions, a specific L
 # Evaluation Results
 |Model | Size| hellaswag | arc_challenge | MMLU|
 |---|---|---|---|---|
-| Llama-2-chat | 7B | 78.55% | 52.9% | |
-| Llama-2-chat | 13B | 81.94% | 59.04% | |
-| Trurl 2.0 (with MMLU) | 13B | 80.09% | 59.30% |
-| Trurl 2.0 (no MMLU) | 13B | TO-DO | TO-DO | |
-| Trurl 2.0 | 7b | TO-DO | TO-DO |
+| Llama-2-chat | 7B | 78.55% | 52.9% | 48.32% |
+| Llama-2-chat | 13B | 81.94% | 59.04% | 54.64% |
+| Trurl 2.0 (with MMLU) | 13B | 80.09% | 59.30% | 78.35% |
+| Trurl 2.0 (no MMLU) | 13B | TO-DO | TO-DO | TO-DO|
+| Trurl 2.0 | 7b | TO-DO | TO-DO | TO-DO|
 
 <img src="https://voicelab.ai/wp-content/uploads/trurl-hero.webp" alt="trurl graphic" style="width:100px;"/>
 