amanrangapur committed
Commit 43f5c98
1 Parent(s): 627e96b
Update README.md

README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- allenai/
+- allenai/dolmino-mix-1124
 language:
 - en
 ---
@@ -16,7 +16,7 @@ language:
 OLMo2 7B November 2024 is an updated version of the original [OLMo 7B](https://huggingface.co/allenai/OLMo-7B) model, rocking a ____ point increase in ____, among other evaluation improvements, from an improved version of the Dolma dataset and staged training.

 OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
-The OLMo models are trained on the [
+The OLMo models are trained on the [Dolmino](https://huggingface.co/datasets/allenai/dolmino-mix-1124) dataset.
 We release all code, checkpoints, logs (coming soon), and details involved in training these models.


@@ -27,6 +27,26 @@ The core models released in this batch are the following:
 | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo2-7B-1124) | 4 Trillion | 32 | 4096 | 32 | 4096 |
 | [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo2-13B-1124) | 5 Trillion | 40 | 5120 | 42 | 4096 |

+## Inference
+
+Proceed as usual with HuggingFace:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
+message = ["Language modeling is "]
+inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+# optional: move inputs and model to CUDA
+# inputs = {k: v.to('cuda') for k,v in inputs.items()}
+# olmo = olmo.to('cuda')
+response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
+print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
+>> 'Language modeling is the first step to build natural language generation...'
+```
+
+Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
+The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
+
 We have released checkpoints for these models, for every 1000 training steps.
 The naming convention is `stepXXX-tokensYYYB`.

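A note on the quantized loading tip in the Inference section added above: the following is a minimal end-to-end sketch, assuming `bitsandbytes` is installed and a CUDA device is available, and simply combining the calls already shown in this diff (prompt and sampling settings reused from the snippet above).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in 8-bit precision (requires the bitsandbytes package and a CUDA device).
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-7B-1124",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# The quantized model is more sensitive to input placement, so pass input_ids moved to CUDA.
response = olmo.generate(
    inputs.input_ids.to("cuda"),
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```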
@@ -42,6 +62,20 @@ out = list_repo_refs("allenai/OLMo2-7B-1124")
 branches = [b.name for b in out.branches]
 ```

+### Fine-tuning
+Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
+1. Fine-tune with the OLMo repository:
+```bash
+torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
+  --data.paths=[{path_to_data}/input_ids.npy] \
+  --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
+  --load_path={path_to_checkpoint} \
+  --reset_trainer_state
+```
+For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
+
+2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
+
 ### Model Description

 - **Developed by:** Allen Institute for AI (Ai2)
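The checkpoint branches listed by `list_repo_refs` above follow the `stepXXX-tokensYYYB` naming convention, and fine-tuning "from intermediate checkpoints" amounts to loading one of those branches. A minimal sketch of that step with `transformers` (the branch name `step1000-tokens5B` is purely illustrative, not a name confirmed by this commit; pick a real one from `branches`):

```python
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

# Enumerate the released checkpoint branches (naming convention: stepXXX-tokensYYYB).
out = list_repo_refs("allenai/OLMo2-7B-1124")
branches = [b.name for b in out.branches]
print(branches[:5])

# Load a specific intermediate checkpoint by passing its branch name as `revision`.
# "step1000-tokens5B" is a hypothetical example; substitute an entry from `branches`.
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", revision="step1000-tokens5B")
```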
@@ -65,42 +99,6 @@ branches = [b.name for b in out.branches]
 - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)


-## Uses
-
-### Inference
-
-Proceed as usual with HuggingFace:
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
-message = ["Language modeling is "]
-inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
-# optional verifying cuda
-# inputs = {k: v.to('cuda') for k,v in inputs.items()}
-# olmo = olmo.to('cuda')
-response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
-print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
->> 'Language modeling is the first step to build natural language generation...'
-```
-
-Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
-The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
-
-### Fine-tuning
-Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
-1. Fine-tune with the OLMo repository:
-```bash
-torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
-  --data.paths=[{path_to_data}/input_ids.npy] \
-  --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
-  --load_path={path_to_checkpoint} \
-  --reset_trainer_state
-```
-For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
-
-2. Further fine-tuning support is being developing in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
-
 <!-- TODO -->
 ## Evaluation

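The OLMo-repository fine-tuning command in the diff above points the trainer at `input_ids.npy` and `label_mask.npy` files. The sketch below only illustrates the general idea of producing such arrays with `numpy`; the exact dtypes, shapes, and packing the OLMo trainer expects are not specified in this commit, so treat every detail here as an assumption and defer to the linked GitHub readme for the authoritative format.

```python
# Illustrative only: the real OLMo data format may differ (see the OLMo GitHub readme).
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")

texts = [
    "Example fine-tuning document one.",
    "Example fine-tuning document two.",
]
seq_len = 32  # hypothetical sequence length for this sketch

input_ids = np.zeros((len(texts), seq_len), dtype=np.int32)  # dtype is an assumption
label_mask = np.zeros((len(texts), seq_len), dtype=bool)     # True where loss should be computed

for row, text in enumerate(texts):
    ids = tokenizer(text)["input_ids"][:seq_len]
    input_ids[row, : len(ids)] = ids
    label_mask[row, : len(ids)] = True

np.save("input_ids.npy", input_ids)
np.save("label_mask.npy", label_mask)
```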