loubnabnl (HF staff) committed 364637a (1 parent: 0e06c1a)

Update README.md

Files changed (1): README.md (+79, -1)

---
license: apache-2.0
datasets:
- HuggingFaceTB/finemath
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Model Card

## Model summary

This model is part of the 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) ablations, in which we continue pretraining the [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) base model on different math datasets for 60B tokens.
The model has 3.21B parameters and a context length of 4096 tokens. It was trained on **60B tokens** from the FineMath-4+ subset of 📐 [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath), tokenized with the `llama3` tokenizer.

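As a sanity check, the 3.21B figure can be recomputed from the architecture. This is a minimal sketch. It assumes the hyperparameters published in the base model's Llama-3.2-3B `config.json`, none of which are stated in this card: hidden size 3072, 28 layers, MLP width 8192, 24 query heads and 8 key/value heads of dimension 128, a 128,256-token vocabulary, tied input/output embeddings, and RMSNorm weights with no biases.

```python
# Hyperparameters ASSUMED from the public Llama-3.2-3B config (not stated in this card).
HIDDEN = 3072                 # model width
LAYERS = 28                   # transformer blocks
MLP_DIM = 8192                # SwiGLU intermediate size
N_HEADS = 24                  # query heads
N_KV_HEADS = 8                # key/value heads (grouped-query attention)
HEAD_DIM = HIDDEN // N_HEADS  # 128
VOCAB = 128_256               # llama3 tokenizer vocabulary size


def count_params() -> int:
    """Count weights in a Llama-style decoder with tied embeddings and no biases."""
    embed = VOCAB * HIDDEN                        # shared with the LM head (tied)
    q_and_o = 2 * HIDDEN * N_HEADS * HEAD_DIM     # query and output projections
    k_and_v = 2 * HIDDEN * N_KV_HEADS * HEAD_DIM  # key and value projections (GQA)
    mlp = 3 * HIDDEN * MLP_DIM                    # gate, up and down projections
    norms = 2 * HIDDEN                            # two RMSNorm weight vectors per block
    per_layer = q_and_o + k_and_v + mlp + norms
    final_norm = HIDDEN                           # RMSNorm before the LM head
    return embed + LAYERS * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# -> 3,212,749,824 parameters (~3.21B)
```

Under these assumptions the embedding matrix accounts for about 0.39B of the total. It is counted once because the input and output embeddings are tied.
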
- **License**: Apache 2.0
- **Languages**: English

## Use

### Intended use

This model was trained on English math data and is not instruction-tuned, so it is intended for English text completion with a focus on math.
Note that the primary intended use of this model is as a baseline for comparison with other models trained under the same conditions. It is not necessarily the best model achievable with this dataset.

### Generation

```python
# pip install -q transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "MODEL_HERE"  # placeholder: replace with this model's Hub repo id
device = "cuda"  # "cuda" for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Machine Learning is", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50)  # cap the length of the completion
print(tokenizer.decode(outputs[0]))
```

## Intermediate checkpoints

We release intermediate checkpoints for this model every 10,000 training steps (10B tokens). Each checkpoint is stored in a separate branch named after the number of training tokens, e.g. `10B`.

You can load a specific model revision with `transformers` using the `revision` argument:
```python
from transformers import AutoModelForCausalLM

# "MODEL_HERE" is a placeholder for this model's Hub repo id
model = AutoModelForCausalLM.from_pretrained("MODEL_HERE", revision="10B")
```
You can list all available revisions of the model with:
```python
from huggingface_hub import list_repo_refs

refs = list_repo_refs("MODEL_HERE")  # placeholder: this model's Hub repo id
print([branch.name for branch in refs.branches])
```
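
Each checkpoint branch name encodes a token count. This means the branch names returned by `list_repo_refs` can be ordered and mapped back to training steps. The sketch below is hypothetical and rests on three assumptions:

- Checkpoint branches are named `<N>B`, as in `10B`.
- Any other branch, such as `main`, is not a checkpoint.
- Each step is exactly 1M tokens, derived from 10B tokens per 10,000 steps.

It works on a plain list of names, so it runs without network access.

```python
import re

# ASSUMPTION: checkpoint branches are named "<billions of tokens>B", e.g. "10B".
CHECKPOINT_RE = re.compile(r"^(\d+)B$")
TOKENS_PER_STEP = 10_000_000_000 // 10_000  # 10B tokens per 10,000 steps -> 1M


def checkpoint_steps(branch_names):
    """Return (branch, tokens, step) for checkpoint branches, ordered by tokens seen."""
    rows = []
    for name in branch_names:
        match = CHECKPOINT_RE.match(name)
        if match is None:
            continue  # skip non-checkpoint branches such as "main"
        tokens = int(match.group(1)) * 1_000_000_000
        rows.append((name, tokens, tokens // TOKENS_PER_STEP))
    # Sort numerically; a plain string sort would put "100B" before "20B".
    return sorted(rows, key=lambda row: row[1])


# Hypothetical branch list; in practice pass [b.name for b in refs.branches].
for branch, tokens, step in checkpoint_steps(["main", "20B", "10B", "50B"]):
    print(f"{branch}: {tokens:,} tokens, step {step:,}")
```

Sorting on the parsed integer rather than the branch string keeps the order correct even if token counts reach three digits.
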

## Training
### Model
- **Architecture**: Llama3
- **Pretraining steps**: 60k
- **Pretraining tokens**: 60B
- **Precision**: bfloat16

### Hardware
- **GPUs**: 64 H100

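The step, token and GPU counts above imply a rough per-step budget. This is a back-of-the-envelope sketch. It assumes the headline 60B and 60k figures are exact, although they are probably rounded. It also assumes the batch is split evenly across all 64 GPUs, which ignores how nanotron actually shards the work.

```python
TOTAL_TOKENS = 60_000_000_000  # "Pretraining tokens: 60B" (assumed exact)
TOTAL_STEPS = 60_000           # "Pretraining steps: 60k"
NUM_GPUS = 64                  # "GPUs: 64 H100"
CONTEXT_LENGTH = 4096          # tokens per training sequence

tokens_per_step = TOTAL_TOKENS // TOTAL_STEPS        # global batch size, in tokens
tokens_per_gpu_step = tokens_per_step / NUM_GPUS     # average share per GPU (even split assumed)
sequences_per_step = tokens_per_step / CONTEXT_LENGTH

print(f"{tokens_per_step:,} tokens/step")
print(f"{tokens_per_gpu_step:,.0f} tokens/GPU/step")
print(f"{sequences_per_step:.1f} sequences of {CONTEXT_LENGTH} tokens per step")
```

The sequence count comes out non-integer (about 244.1), which suggests that at least one headline figure is rounded. Treat these values as estimates rather than the exact training configuration.
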
### Software
- [nanotron](https://github.com/huggingface/nanotron/) for training
- [datatrove](https://github.com/huggingface/datatrove) for tokenization
- [lighteval](https://github.com/huggingface/lighteval) for evaluation

## Evaluation
We used the SmolLM2 setup to evaluate all our ablation models with `lighteval`. You can find the details here: https://github.com/huggingface/smollm/tree/main/evaluation#smollm2-base-models

## Limitations
This model was predominantly trained on English math data, potentially limiting its performance in other languages. Furthermore, the model's behavior is influenced by the quality and diversity of its training data, which may include biases and harmful content.