SaveBertAndGpt
commited on
Commit
•
fc1085c
1
Parent(s):
cc5ca20
Update README
Browse filesUpdated Gemma numbers in result table, added "open weights" vs. "partially open" etc categorization
README.md
CHANGED
@@ -117,17 +117,20 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
|
|
117 |
## Evaluation
|
118 |
Core model results for OLMo 2 7B and 13B models are found below.
|
119 |
|
120 |
-
| Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k |
|
121 |
|-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
|
122 |
-
|
|
123 |
| Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
|
124 |
| Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
|
125 |
| Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
|
126 |
| Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
|
127 |
| Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
|
|
|
128 |
| Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
|
|
|
129 |
| StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
|
130 |
| Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
|
|
|
131 |
| Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
|
132 |
| OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
|
133 |
| MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
|
|
|
117 |
## Evaluation
|
118 |
Core model results for OLMo 2 7B and 13B models are found below.
|
119 |
|
120 |
+
| Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA |
|
121 |
|-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
|
122 |
+
| *Open weights models:* |
|
123 |
| Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
|
124 |
| Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
|
125 |
| Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
|
126 |
| Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
|
127 |
| Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
|
128 |
+
| Gemma-2-9B | 4.4·10²³ | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 |
|
129 |
| Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
|
130 |
+
| *Partially open models:* |
|
131 |
| StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
|
132 |
| Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
|
133 |
+
| *Fully open models:* |
|
134 |
| Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
|
135 |
| OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
|
136 |
| MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
|