victormiller committed: Update README.md

README.md
|
<center><img src="crystalcoder_logo.jpg" alt="crystal coder logo" width="300"/></center>

Crystal is a 7B parameter language model, distinctively trained on the SlimPajama and StarCoder datasets.
This model excels in balancing natural language processing and coding capabilities.
Despite being trained on a smaller dataset of 1.4 trillion tokens (compared to LLaMA 2's 2 trillion), Crystal surpasses LLaMA 2 on some challenging English and coding tasks.
It demonstrates strong performance on benchmarks such as MMLU, HumanEval, and MBPP.
Compared with other similar work, Crystal is well balanced between language and coding tasks.

<center><img src="performance_in_benchmarks.png" alt="performance in benchmarks" /></center>

- As reported in prior work, the choice of temperature strongly affects the programming metrics, so we evaluate all models with the following temperatures:
  - Scores for HumanEval are computed with a temperature of 0.2.
  - Scores for MBPP are computed with a temperature of 0.1.
- For a detailed token breakdown of the Crystal dataset, refer to the [Crystal dataset repository](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
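
As a rough illustration, the temperatures above map onto `transformers` generation settings along these lines (a hypothetical sketch of the sampling setup, not the actual evaluation harness; `max_new_tokens` and the helper function are illustrative):

```python
# Hypothetical sampling settings matching the temperatures above; the actual
# evaluation harness and its other options may differ.
humaneval_generation = dict(do_sample=True, temperature=0.2, max_new_tokens=512)
mbpp_generation = dict(do_sample=True, temperature=0.1, max_new_tokens=512)

def sample_completions(model, tokenizer, prompt, n=20, **gen_kwargs):
    """Draw n stochastic completions of a prompt, e.g. for pass@k-style scoring."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, num_return_sequences=n, **gen_kwargs)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```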

Get access now at the [LLM360 site](https://www.llm360.ai/).

- [Training Code](https://github.com/LLM360/crystalcoder-train)
- [Data Preparation](https://github.com/LLM360/crystalcoder-data-prep)
- [Metrics](https://github.com/LLM360/Analysis360)
- [Fully processed Crystal pretraining data](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets)

# 🟣 Model Architecture

Crystal leverages a GPT-like architecture, akin to LLaMA, but with the addition of maximal update parameterization (**muP**).

Key modifications introduced by muP include:
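
Roughly, muP rescales initialization, per-layer learning rates, and the output logits by the ratio of the model width to a small tuned base width. The sketch below is a hypothetical illustration of that idea only, not Crystal's actual settings; the widths, learning rate, and vocabulary size are made up:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of muP-style width scaling (illustrative only, not
# Crystal's actual configuration). Hyperparameters are tuned at a small base
# width and transferred to the target width via the width multiplier.
base_width, width = 256, 4096              # made-up example values
width_mult = width / base_width
base_lr = 1e-3                             # learning rate tuned at base_width

hidden = nn.Linear(width, width)           # a "matrix-like" hidden projection
readout = nn.Linear(width, 32000, bias=False)       # output / unembedding layer
nn.init.normal_(hidden.weight, std=width ** -0.5)   # init variance ~ 1/width

def lm_logits(x: torch.Tensor) -> torch.Tensor:
    # muP divides the output logits by the width multiplier so their scale
    # stays roughly constant as the model is widened.
    return readout(x) / width_mult

# Matrix-like weights train with a learning rate shrunk by the width
# multiplier; muP also uses 1/d attention scaling instead of 1/sqrt(d).
# The rules for embeddings, biases, and the output layer are omitted here.
optimizer = torch.optim.AdamW(hidden.parameters(), lr=base_lr / width_mult)
```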

Our training has 3 stages:

- Stage 1: Pretraining on half of SlimPajama (50% x 690B = 345B).
- Stage 2: Pretraining on the other half of SlimPajama (50% x 690B = 345B), plus two epochs of StarCoder Data (2 x 291B).
- Stage 3: Pretraining on `100B` additional Python and web-related data (HTML, JavaScript, CSS) sampled from StarCoder Data, and `10B` tokens sampled from SlimPajama.
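
These per-stage figures are consistent with the 1.4 trillion-token total quoted above; a quick back-of-the-envelope check (token counts in billions):

```python
# Rough token budget per stage, in billions, taken from the list above.
stage1 = 0.5 * 690            # 345B: first half of SlimPajama
stage2 = 0.5 * 690 + 2 * 291  # 345B SlimPajama + 582B StarCoder (two epochs)
stage3 = 100 + 10             # 100B Python/web StarCoder sample + 10B SlimPajama

print(stage1 + stage2 + stage3)  # 1382.0 billion, i.e. roughly 1.4 trillion tokens
```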
For details of the training dataset for each stage, please refer to the Dataset section and our Crystal Data Card.
For hyperparameters used in each stage, please refer to the following table:
<center><img src="hyperparameters.png" alt="hyperparameter table" /></center>

For more details of training, please refer to our paper.

# 🟣 Dataset
Our tokenized datasets for all phases are available at [CrystalDatasets](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
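
For example, the data can be pulled from the Hub with `huggingface_hub` (a minimal sketch; the `allow_patterns` filter is a placeholder, so check the actual file layout on the dataset page):

```python
from huggingface_hub import snapshot_download

# Download (a subset of) the tokenized Crystal pretraining data.
# The pattern below is only an example; adjust it to the files actually
# present in the dataset repository.
local_dir = snapshot_download(
    repo_id="LLM360/CrystalCoderDatasets",
    repo_type="dataset",
    allow_patterns=["*stage1*"],
)
print("Downloaded to:", local_dir)
```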
# 🟣 Model Usage
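
A minimal sketch of loading the model with the standard `transformers` API (the repo id `LLM360/CrystalCoder` and the need for `trust_remote_code=True` are assumptions to verify on the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/CrystalCoder"  # assumed repo id; check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding for a quick smoke test; see the evaluation notes above for
# the sampling temperatures used on HumanEval and MBPP.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```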

Selected Metrics are displayed below.

|<img src="cc-mmlu-1.png" alt="mmlu" width="400"/> | <img src="cc-truthful-1.png" alt="truthfulqa" width="400"/> |
# 🟣 Crystal-Instruct

We also have instruction-tuned versions of Crystal, based on the stage 2 and stage 3 final checkpoints. The Instruct version will be released later.
# 🟣 Citation