Safetensors
English
olmo2
SaveBertAndGpt commited on
Commit
fc1085c
1 Parent(s): cc5ca20

Update README

Browse files

Updated Gemma numbers in result table, added "open weights" vs. "partially open" etc categorization

Files changed (1) hide show
  1. README.md +5 -2
README.md CHANGED
@@ -117,17 +117,20 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?
117
  ## Evaluation
118
  Core model results for OLMo 2 7B and 13B models are found below.
119
 
120
- | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMWLUPro | TriviaQA |
121
  |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
122
- | Gemma-2-9B | 4.4·10²³ | 52.9 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 1.1 | 42 | 0.9 |
123
  | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
124
  | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
125
  | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
126
  | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
127
  | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
 
128
  | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
 
129
  | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
130
  | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
 
131
  | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
132
  | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
133
  | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |
 
117
  ## Evaluation
118
  Core model results for OLMo 2 7B and 13B models are found below.
119
 
120
+ | Model | Train FLOPs | Average | ARC/C | HSwag | WinoG | MMLU | DROP | NQ | AGIEval | GSM8k | MMLUPro | TriviaQA |
121
  |-------------------|------------|---------|--------|--------|--------|-------|-------|-----|----------|--------|-----------|-----------|
122
+ | *Open weights models:* |
123
  | Llama-2-13B | 1.6·10²³ | 54.1 | 67.3 | 83.9 | 74.9 | 55.7 | 45.6 | 38.4 | 41.5 | 28.1 | 23.9 | 81.3 |
124
  | Mistral-7B-v0.3 | n/a | 58.8 | 78.3 | 83.1 | 77.7 | 63.5 | 51.8 | 37.2 | 47.3 | 40.1 | 30 | 79.3 |
125
  | Llama-3.1-8B | 7.2·10²³ | 61.8 | 79.5 | 81.6 | 76.6 | 66.9 | 56.4 | 33.9 | 51.3 | 56.5 | 34.7 | 80.3 |
126
  | Mistral-Nemo-12B | n/a | 66.9 | 85.2 | 85.6 | 81.5 | 69.5 | 69.2 | 39.7 | 54.7 | 62.1 | 36.7 | 84.6 |
127
  | Qwen-2.5-7B | 8.2·10²³ | 67.4 | 89.5 | 89.7 | 74.2 | 74.4 | 55.8 | 29.9 | 63.7 | 81.5 | 45.8 | 69.4 |
128
+ | Gemma-2-9B | 4.4·10²³ | 67.8 | 89.5 | 87.3 | 78.8 | 70.6 | 63 | 38 | 57.3 | 70.1 | 42 | 81.8 |
129
  | Qwen-2.5-14B | 16.0·10²³ | 72.2 | 94 | 94 | 80 | 79.3 | 51.5 | 37.3 | 71 | 83.4 | 52.8 | 79.1 |
130
+ | *Partially open models:* |
131
  | StableLM-2-12B | 2.9·10²³ | 62.2 | 81.9 | 84.5 | 77.7 | 62.4 | 55.5 | 37.6 | 50.9 | 62 | 29.3 | 79.9 |
132
  | Zamba-2-7B | n/c | 65.2 | 92.2 | 89.4 | 79.6 | 68.5 | 51.7 | 36.5 | 55.5 | 67.2 | 32.8 | 78.8 |
133
+ | *Fully open models:* |
134
  | Amber-7B | 0.5·10²³ | 35.2 | 44.9 | 74.5 | 65.5 | 24.7 | 26.1 | 18.7 | 21.8 | 4.8 | 11.7 | 59.3 |
135
  | OLMo-7B | 1.0·10²³ | 38.3 | 46.4 | 78.1 | 68.5 | 28.3 | 27.3 | 24.8 | 23.7 | 9.2 | 12.1 | 64.1 |
136
  | MAP-Neo-7B | 2.1·10²³ | 49.6 | 78.4 | 72.8 | 69.2 | 58 | 39.4 | 28.9 | 45.8 | 12.5 | 25.9 | 65.1 |