allenai/OLMo-2-1124-13B-DPO

@@ -1,215 +1,119 @@
 ---
-language: en
-model-index:
-- name: allenai/open_instruct_dev
-  results:
-  - task:
-      type: preference_evaluation
-    dataset:
-      name: reward-bench
-      type: allenai/reward-bench
-    metrics:
-    - type: accuracy
-      value: 1.0
-    - type: accuracy
-      value: 1.0
-    - type: accuracy
-      value: 1.0
-    - type: accuracy
-      value: 1.0
 ---
-# Model Card for allenai/open_instruct_dev
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** en
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+base_model:
+- allenai/OLMo-2-1124-13B-SFT
+library_name: transformers
 ---
+<img src="https://allenai.org/olmo/olmo-7b-animation.gif" alt="OLMo Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
+# OLMo-2-1124-13B-DPO
+OLMo-2 13B DPO November 2024 is a finetuned variant of the [OLMo-2 13B November 2024](https://huggingface.co/allenai/OLMo2-13B-1124) model: it was first trained with supervised finetuning on the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture) and then further trained with DPO.
+Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks beyond chat, such as MATH, GSM8K, and IFEval.
+Check out [the OLMo-2 paper](https://TODO) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
+OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
+The core models released in this batch include the following:
+| **Stage**           | **OLMo-2 7B**                                                                                          | **OLMo-2 13B**                                                                                        |
+|----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
+| **Base Model**       | [allenai/OLMo2-7B-1124](https://huggingface.co/allenai/OLMo2-7B-1124)                                | [allenai/OLMo-2-13B-1124](https://huggingface.co/allenai/OLMo-2-13B-1124)                             |
+| **SFT**              | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT)                | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT)              |
+| **DPO**              | [allenai/OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO)                | [allenai/OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO)              |
+| **Final Models (RLVR)**     | [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct)                        | [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct)                      |
+| **Reward Model (RM)**| [allenai/OLMo-2-1124-7B-RM](https://huggingface.co/allenai/OLMo-2-1124-7B-RM)                                                     | (Same as 7B)                                                     |
+## Model description
+- **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
+- **Language(s) (NLP):** Primarily English
+- **License:** Apache 2.0
+- **Finetuned from model:** allenai/OLMo-2-1124-13B-SFT
+### Model Sources
+- **Project Page:** https://allenai.org/olmo
+- **Repositories:**
+    - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
+    - Evaluation code: https://github.com/allenai/olmes
+    - Further fine-tuning code: https://github.com/allenai/open-instruct
+- **Paper:** Coming soon! TODO
+- **Demo:** https://playground.allenai.org/
+## Using the model
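+### Loading with Hugging Face
+To load the model with Hugging Face `transformers`, use the following snippet:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# The tokenizer also carries the chat template used by `apply_chat_template`.
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B-DPO")
+olmo_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-13B-DPO")
+```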
+### Chat template
+The chat template for our models is formatted as:
+```
+<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+```
+Or, with newlines expanded:
+```
+<|endoftext|><|user|>
+How are you doing?
+<|assistant|>
+I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
+```
+The template is also embedded in the tokenizer, so it can be applied with `tokenizer.apply_chat_template`.
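The layout above is simple enough to rebuild by hand, which can help when checking prompts outside `transformers`. The following is a minimal sketch that reproduces the single-turn example string. The `format_chat` name is hypothetical; only user and assistant roles are handled because no other role appears in the template shown; the `add_generation_prompt` behavior and any multi-turn spacing are assumptions. For real inputs, prefer `tokenizer.apply_chat_template`, which uses the template shipped with the tokenizer.

```python
# Both special tokens below are copied from the template example above.
BOS = "<|endoftext|>"  # opens the conversation
EOS = "<|endoftext|>"  # closes each assistant turn


def format_chat(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render user/assistant turns in the single-turn layout shown above (sketch)."""
    parts = [BOS]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            parts.append(f"<|assistant|>\n{content}{EOS}")
        else:
            # System and other roles are not shown in the template above, so reject them.
            raise ValueError(f"unsupported role: {role!r}")
    if add_generation_prompt:
        # Assumption: open an empty assistant turn for the model to continue.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program, so I don't have feelings, "
     "but I'm functioning as expected. How can I assist you today?"},
]
print(format_chat(example))
```

Printing `format_chat(example)` should give the expanded example above; comparing it against `tokenizer.apply_chat_template(example, tokenize=False)` is a quick way to confirm the sketch still matches the shipped template.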
+### System prompt
+In Ai2 demos, we use this system prompt by default:
+```
+You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
+```
+The model has not been trained with a specific system prompt in mind.
+## Bias, Risks, and Limitations
+The OLMo-2 models have limited safety training and, unlike ChatGPT, are not deployed with automatic in-the-loop filtering of responses, so they can produce problematic outputs (especially when prompted to do so).
+See the Falcon 180B model card for an example of this.
+## Performance
+TODO
+## Hyperparameters
+Note that we use a length-normalized variant of DPO for training.
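For readers unfamiliar with this variant, here is a minimal sketch of a per-pair length-normalized DPO loss. It assumes the normalization divides each response's summed policy-vs-reference log-probability ratio by that response's token count before scaling by β. This is our reading of "length-normalized", not a copy of the open-instruct training code. The function and argument names are hypothetical, and the default `beta=5.0` mirrors the Beta value listed below.

```python
import math


def length_normalized_dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_len: int,
    rejected_len: int,
    beta: float = 5.0,
) -> float:
    """Per-pair length-normalized DPO loss (illustrative sketch).

    Each *_logp argument is the summed token log-probability of a full
    response under the policy or the frozen reference model; *_len is
    that response's token count.
    """
    # Per-token log-ratio between policy and reference for each response.
    chosen_ratio = (policy_chosen_logp - ref_chosen_logp) / chosen_len
    rejected_ratio = (policy_rejected_logp - ref_rejected_logp) / rejected_len
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is zero.
print(length_normalized_dpo_loss(-42.0, -57.0, -42.0, -57.0, chosen_len=12, rejected_len=19))
```

When the policy still equals the reference model, the margin is zero and the loss is log 2 (about 0.693). Training lowers it by raising the per-token log-ratio of the chosen response relative to the rejected one. Dividing by length means a longer response does not get a larger ratio just by having more tokens.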
+DPO:
+- **Learning Rate**: 8E-7 (7B, 13B)
+- **Beta**: 5
+- **Effective Batch Size:** 128 (7B, 13B)
+- **Max. Sequence Length:** 2048
+- **Learning Rate Schedule:** Linear
+- **LR Warmup Ratio:** 0.1
+- **Num. Epochs:** 1
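The card lists a linear schedule with a 0.1 warmup ratio but does not spell out its shape. Below is a minimal sketch assuming linear warmup from zero over the first 10% of steps, followed by linear decay to zero. That is the shape of the `linear` schedule in Hugging Face `transformers`, but the card does not confirm the decay ends at zero. The function name is hypothetical, and the defaults simply mirror the values above.

```python
def linear_lr_with_warmup(
    step: int, total_steps: int, peak_lr: float = 8e-7, warmup_ratio: float = 0.1
) -> float:
    """Learning rate at `step` under linear warmup then linear decay to zero (sketch)."""
    # Assumption: warmup length is the truncated product of total steps and the ratio.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 toward peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

With one epoch at an effective batch size of 128, `total_steps` would be the number of preference pairs divided by 128, rounded up. The dataset size is not given in this card, so any concrete step count here would be hypothetical.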
+## License and use
+OLMo-2 is licensed under the Apache 2.0 license.
+OLMo-2 is intended for research and educational use.
+For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).
+## Citation
+If OLMo-2 or any of the related materials were helpful to your work, please cite:
+```
+TODO
+```