omkarenator commited on
Commit
a1fb934
1 Parent(s): 58d76d5
README.md ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: transformers
7
+ tags:
8
+ - nlp
9
+ - llm
10
+ ---
11
+ # Amber
12
+
13
+ <center><img src="amber_logo.png" alt="amber logo" width="300"/></center>
14
+
15
+ We present Amber, the first model in the LLM360 family. Amber is an
16
+ 7B English language model with the LLaMA architecture.
17
+
18
+ ## About LLM360
19
+ LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
20
+ where all training details, model checkpoints, intermediate results, and
21
+ additional analyses are made available to the community. Our goal is to advance
22
+ the field by inviting the community to deepen the understanding of LLMs
23
+ together. As the first step of the project LLM360, we release all intermediate
24
+ model checkpoints, our fully-prepared pre-training dataset, all source code and
25
+ configurations, and training details. We are
26
+ committed to continually pushing the boundaries of LLMs through this open-source
27
+ effort.
28
+
29
+ Get access now at [LLM360 site](https://www.llm360.ai/)
30
+
31
+ ## Model Description
32
+
33
+ - **Model type:** Language model with the same architecture as LLaMA-7B
34
+ - **Language(s) (NLP):** English
35
+ - **License:** Apache 2.0
36
+ - **Resources for more information:**
37
+ - [Training Code](https://github.com/LLM360/amber-train)
38
+ - [Data Preparation](https://github.com/LLM360/amber-data-prep)
39
+ - [Metrics](https://github.com/LLM360/Analysis360)
40
+ - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
41
+
42
+
43
+ # Loading Amber
44
+
45
+ To load a specific checkpoint, simply pass a revision with a value between `"ckpt_000"` and `"ckpt_358"`. If no revision is provided, it will load `"ckpt_359"`, which is the final checkpoint.
46
+
47
+ ```python
48
+ from transformers import LlamaTokenizer, LlamaForCausalLM
49
+
50
+ tokenizer = LlamaTokenizer.from_pretrained("LLM360/Amber", revision="ckpt_356")
51
+ model = LlamaForCausalLM.from_pretrained("LLM360/Amber", revision="ckpt_356")
52
+
53
+ input_text = "translate English to German: How old are you?"
54
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
55
+
56
+ outputs = model.generate(input_ids)
57
+ print(tokenizer.decode(outputs[0]))
58
+
59
+ ```
60
+
61
+ # Amber Training Details
62
+
63
+ ## DataMix
64
+ | Subset | Tokens (Billion) |
65
+ | ----------- | ----------- |
66
+ | Arxiv | 30.00 |
67
+ | Book | 28.86 |
68
+ | C4 | 197.67 |
69
+ | Refined-Web | 665.01 |
70
+ | StarCoder | 291.92 |
71
+ | StackExchange | 21.75 |
72
+ | Wikipedia | 23.90 |
73
+ | Total | 1259.13 |
74
+
75
+ ## Hyperparameters
76
+ | Hyperparameter | Value |
77
+ | ----------- | ----------- |
78
+ | Total Parameters | 6.7B |
79
+ | Hidden Size | 4096 |
80
+ | Intermediate Size (MLPs) | 11008 |
81
+ | Number of Attention Heads | 32 |
82
+ | Number of Hidden Lyaers | 32 |
83
+ | RMSNorm ɛ | 1e^-6 |
84
+ | Max Seq Length | 2048 |
85
+ | Vocab Size | 32000 |
86
+
87
+ | Training Loss |
88
+ |------------------------------------------------------------|
89
+ | <img src="loss_curve.png" alt="loss curve" width="400"/> |
90
+
91
+
92
+ # Evaluation
93
+
94
+ Please refer to our [W&B project page](https://wandb.ai/llm360/CrystalCoder) for complete training logs and evaluation results.
95
+
96
+ | ARC | HellaSwag |
97
+ |--------------------------------------------------------|--------------------------------------------------------------------|
98
+ | <img src="amber-arc-curve.png" alt="arc" width="400"/> | <img src="amber-hellaswag-curve.png" alt="hellaswag" width="400"/> |
99
+
100
+ |MMLU | TruthfulQA |
101
+ |-----------------------------------------------------|-----------------------------------------------------------|
102
+ |<img src="amber-mmlu-curve.png" alt="mmlu" width="400"/> | <img src="amber-truthfulqa-curve.png" alt="truthfulqa" width="400"/> |
103
+
104
+ # Citation
105
+
106
+ Coming soon...
amber-arc-curve.pdf ADDED
Binary file (18.5 kB). View file
 
amber-arc-curve.png ADDED
amber-hellaswag-curve.pdf ADDED
Binary file (20.3 kB). View file
 
amber-hellaswag-curve.png ADDED
amber-mmlu-curve.pdf ADDED
Binary file (19.1 kB). View file
 
amber-mmlu-curve.png ADDED
amber-truthfulqa-curve.pdf ADDED
Binary file (19.2 kB). View file
 
amber-truthfulqa-curve.png ADDED
amber_logo.png ADDED
arc.png ADDED
hellaswag.png ADDED
loss_curve.png ADDED
mmlu.png ADDED
truthfulqa.png ADDED