victormiller committed on
Commit 0ccb0e1 · verified · 1 Parent(s): 3d719c2

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -14,11 +14,11 @@ tags:
  <center><img src="crystalcoder_logo.jpg" alt="crystal coder logo" width="300"/></center>
 
 
- CrystalCoder is a 7B parameter language model, distinctively trained on the SlimPajama and StarCoder datasets.
+ Crystal is a 7B parameter language model, distinctively trained on the SlimPajama and StarCoder datasets.
  This model excels in balancing natural language processing and coding capabilities.
- Despite being trained on a smaller dataset of 1.4 trillion tokens—compared to LLaMA 2's 2 trillion—CrystalCoder surpasses LLaMA 2 in some challenging English and coding tasks.
+ Despite being trained on a smaller dataset of 1.4 trillion tokens—compared to LLaMA 2's 2 trillion—Crystal surpasses LLaMA 2 in some challenging English and coding tasks.
  It demonstrates superior performance in benchmarks like MMLU, HumanEval, and MBPP.
- Compared with other similar work, CrystalCoder is quite balanced across language and coding tasks.
+ Compared with other similar work, Crystal is quite balanced across language and coding tasks.
 
  <center><img src="performance_in_benchmarks.png" alt="performance in benchmarks" /></center>
 
@@ -35,7 +35,7 @@ By comparing CrystalCoder with other similar work, CrystalCoder is quite balance
  - As reported in prior work, the choice of temperature affects the programming metrics a lot, so we evaluate all models with the following temperatures:
    - Scores for HumanEval are computed with a temperature of 0.2
    - Scores for MBPP are computed with a temperature of 0.1
- - For a detailed token breakdown of the CrystalCoder dataset, refer to the [CrystalCoder dataset repository](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
+ - For a detailed token breakdown of the Crystal dataset, refer to the [Crystal dataset repository](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
 
 
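The evaluation note above fixes the decoding temperature per benchmark. As a hedged sketch only, the snippet below shows how those temperatures could be applied when sampling completions for HumanEval or MBPP style scoring; the checkpoint id, the `trust_remote_code` flag, and every generation setting other than temperature are assumptions rather than details taken from this README.

```python
# Minimal sketch of sampling at the temperatures quoted above
# (HumanEval: 0.2, MBPP: 0.1). Model id and other settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

TEMPERATURES = {"humaneval": 0.2, "mbpp": 0.1}  # values stated in the README

MODEL_ID = "LLM360/CrystalCoder"  # assumed repo id
tok = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

def sample_completion(prompt: str, benchmark: str, max_new_tokens: int = 256) -> str:
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,                      # temperature only applies when sampling
        temperature=TEMPERATURES[benchmark],
        max_new_tokens=max_new_tokens,
    )
    # return only the newly generated tokens, not the prompt
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```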
 
@@ -61,11 +61,11 @@ Get access now at [LLM360 site](https://www.llm360.ai/)
  - [Training Code](https://github.com/LLM360/crystalcoder-train)
  - [Data Preparation](https://github.com/LLM360/crystalcoder-data-prep)
  - [Metrics](https://github.com/LLM360/Analysis360)
- - [Fully processed CrystalCoder pretraining data](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets)
+ - [Fully processed Crystal pretraining data](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets)
 
  # 🟣 Model Architecture
 
- CrystalCoder leverages a GPT-like architecture, akin to LLaMA, but with the addition of maximal update parameterization (**muP**).
+ Crystal leverages a GPT-like architecture, akin to LLaMA, but with the addition of maximal update parameterization (**muP**).
 
  Key modifications introduced by muP include:
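The specific muP modifications are enumerated in the list that follows in the full README and are not reproduced in this hunk. Purely as a toy illustration of the kind of scaling muP introduces, and not the model's actual implementation, the snippet below sketches two commonly cited muP adjustments.

```python
# Toy illustration of two muP-style scalings (assumptions, not this model's code):
#   * attention logits scaled by 1/d_head instead of 1/sqrt(d_head)
#   * output logits divided by a width multiplier
import torch

def mup_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    d_head = q.shape[-1]
    return (q @ k.transpose(-2, -1)) / d_head  # 1/d_head rather than 1/sqrt(d_head)

def mup_output_logits(hidden: torch.Tensor, unembed: torch.Tensor, width_mult: float) -> torch.Tensor:
    return (hidden @ unembed.T) / width_mult   # damp logits as the model is widened
```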
 
@@ -106,7 +106,7 @@ Our training has 3 stages:
  - Stage 2: Pretraining on the other half of SlimPajama (50% x 690B = 345B), plus two epochs of StarCoder Data (2 x 291B).
  - Stage 3: Pretraining on `100B` additional Python and web-related data (HTML, JavaScript, CSS) sampled from StarCoder Data, and `10B` tokens sampled from SlimPajama.
 
- For details of the training dataset for each stage, please refer to the Dataset section and our CrystalCoder Data Card.
+ For details of the training dataset for each stage, please refer to the Dataset section and our Crystal Data Card.
 
  For hyperparameters used in each stage, please refer to the following table:
  <center><img src="hyperparameters.png" alt="hyperparameter table" /></center>
@@ -115,7 +115,7 @@ For more details of training, please refer to [our paper](https://arxiv.org/pdf/
 
  # 🟣 Dataset
 
- Our tokenized datasets for all phases are available at [CrystalCoderDatasets](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
+ Our tokenized datasets for all phases are available at [CrystalDatasets](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
 
 
  # 🟣 Model Usage
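A minimal sketch of pulling the tokenized data referenced in the Dataset section with the `datasets` library; whether a configuration name or a particular split is required is an assumption to verify on the dataset page.

```python
# Streaming the tokenized pretraining data; a config name may be needed
# depending on how the dataset repo is organized (an assumption here).
from datasets import load_dataset

ds = load_dataset("LLM360/CrystalCoderDatasets", streaming=True)
first_split = next(iter(ds))        # e.g. "train", if that split exists
print(next(iter(ds[first_split])))  # peek at a single record
```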
@@ -203,9 +203,9 @@ Selected Metrics are displayed below.
  |<img src="cc-mmlu-1.png" alt="mmlu" width="400"/> | <img src="cc-truthful-1.png" alt="truthfulqa" width="400"/> |
 
 
- # 🟣 CrystalCoder-Instruct
+ # 🟣 Crystal-Instruct
 
- We also have instruction-tuned versions of CrystalCoder, based on the stage 2 and stage 3 final checkpoints. The Instruct versions will be released later.
+ We also have instruction-tuned versions of Crystal, based on the stage 2 and stage 3 final checkpoints. The Instruct versions will be released later.
 
  # 🟣 Citation