Update README.md

# OLMo-2-1124-13B-DPO

OLMo-2 13B DPO November 2024 is a post-trained variant of the [OLMo-2 13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) model, which has undergone supervised finetuning on the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix).
Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Check out the OLMo 2 paper (forthcoming) or the [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
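
For reference, the DPO stage optimizes the standard direct preference optimization objective over (prompt, chosen response, rejected response) triples from the preference mix linked above; in the usual setup the SFT checkpoint serves as the frozen reference policy. The notation below is a generic statement of that loss, not taken from this card (see the forthcoming paper for the exact training setup):

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the chosen and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ controls how far the policy may drift from the reference.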

OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
The core models released in this batch include the following:

| **Stage** | **OLMo 2 7B** | **OLMo 2 13B** |
|-----------|---------------|----------------|
| **Base Model** | [allenai/OLMo2-7B-1124](https://huggingface.co/allenai/OLMo2-7B-1124) | [allenai/OLMo-2-13B-1124](https://huggingface.co/allenai/OLMo-2-13B-1124) |
| **SFT** | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) |

- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/olmes
- Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** Coming soon!
- **Demo:** https://playground.allenai.org/

## Using the model
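
As a minimal sketch (assuming standard `AutoModelForCausalLM` support in Hugging Face `transformers`, the model's built-in chat template, and the repo id `allenai/OLMo-2-1124-13B-DPO` from the title above), loading the model and generating a chat response looks like this:

```python
# Minimal sketch: load the DPO checkpoint and generate a chat response.
# Assumes `pip install transformers torch accelerate` and enough GPU memory for a 13B model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-13B-DPO"  # repo id taken from the model card title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",
)

# Format the conversation with the model's chat template and generate.
messages = [{"role": "user", "content": "What is language modeling?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```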

### Bias, Risks, and Limitations

The OLMo 2 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses the way ChatGPT is, so they can produce problematic outputs (especially when prompted to do so).
See the Falcon 180B model card for an example of this.

## Performance

| Model | Average | AlpacaEval | BBH | DROP | GSM8K | IFEval | MATH | MMLU | Safety | PopQA | TruthfulQA |
|-------|---------|------------|-----|------|-------|--------|------|------|--------|-------|------------|
| **Open weights models** |
| Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 |
| Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
| Mistral-Nemo-Instruct-2407 | 51.1 | 45.8 | 56.0 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
| Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 |
| Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 |
| Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 |
| Qwen-2.5-14B-Instruct | 61.0 | 34.6 | 35.4 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 |
| **Fully open models** |
| OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 |
| OLMo-7B-0424-Instruct | 33.2 | 8.5 | 35.2 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 |
| OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 |
| MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 |
| *OLMo-2-7B-DPO* | 55.0 | 29.9 | 47.0 | 58.8 | 82.4 | 74.5 | 31.2 | 63.4 | 81.5 | 24.5 | 57.2 |
| *OLMo-2-7B-SFT* | 50.0 | 9.3 | 50.7 | 58.2 | 71.2 | 68.0 | 25.1 | 62.0 | 82.4 | 25.0 | 47.8 |
| *OLMo-2-13B-DPO* | 61.0 | 38.3 | 58.5 | 71.9 | 84.2 | 80.6 | 35.0 | 68.5 | 80.6 | 28.9 | 63.9 |
| *OLMo-2-13B-SFT* | 55.7 | 12.0 | 58.8 | 71.8 | 75.7 | 71.5 | 31.1 | 67.3 | 82.8 | 29.3 | 56.2 |
| **OLMo-2-7B-1124-Instruct** | 55.7 | 31.0 | 48.9 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 |
| **OLMo-2-13B-1124-Instruct** | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 |

## Hyperparameters

## License and use

OLMo 2 is licensed under the Apache 2.0 license.
OLMo 2 is intended for research and educational use.
For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
This model has been fine-tuned using a dataset mix with outputs generated from third-party models and is subject to additional terms: [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

## Citation

If OLMo 2 or any of the related materials were helpful to your work, please cite:
```
TODO
```