Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ base_model: allenai/OLMoE-1B-7B-0924-SFT
 
  # Model Summary
 
- > OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/OLMoE/OLMoE-1B-7B-0924). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
+ > OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
 
  This information and more can also be found on the [**OLMoE GitHub repository**](https://github.com/allenai/OLMoE).
  - **Paper**: https://arxiv.org/abs/2409.02060
@@ -52,7 +52,7 @@ Here's how it works: imagine you have a bunch of toys, and you want to
  ```
 
  Branches:
- - `main`: Preference tuned via DPO model of https://hf.co/OLMoE/OLMoE-1B-7B-0924-SFT (`main` branch)
+ - `main`: Preference tuned via DPO model of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT (`main` branch)
  - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT
  - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924)
  - `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
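
Both changed lines make the same substitution: a link under `https://hf.co/OLMoE/` becomes the same link under `https://hf.co/allenai/`. For anyone who wants to check other text for the same stale link, here is a minimal sketch of that substitution. The helper name `fix_org_links` and the regular expression are illustrative assumptions and are not part of this change; the sketch only rewrites links whose repository name starts with `OLMoE-`, as the two links touched here do.

```python
import re

# Assumption: the only links to rewrite are hf.co links that use "OLMoE" as
# the organization segment and point at a repository named "OLMoE-...".
WRONG_ORG_LINK = re.compile(r"https://hf\.co/OLMoE/(?=OLMoE-)")


def fix_org_links(text: str) -> str:
    """Rewrite https://hf.co/OLMoE/<repo> links to https://hf.co/allenai/<repo>."""
    return WRONG_ORG_LINK.sub("https://hf.co/allenai/", text)


# The removed `main` branch line from this diff, used as sample input.
old_line = (
    "- `main`: Preference tuned via DPO model of "
    "https://hf.co/OLMoE/OLMoE-1B-7B-0924-SFT (`main` branch)"
)
new_line = fix_org_links(old_line)
print(new_line)
```

Links that already use the `allenai` organization, such as the GitHub repository link, do not match the pattern and are returned unchanged.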