Update README.md
README.md (CHANGED)

@@ -117,9 +117,6 @@ The models have been pre-trained using a blend of the following datasets.
 ||[The Pile](https://huggingface.co/datasets/EleutherAI/pile)|135B|
 |Codes|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|10B|
 
-The pre-training was continuously conducted using a total of 10 folds of non-overlapping data, each consisting of approximately 27-28B tokens.
-We finalized the pre-training with additional (potentially) high-quality 27B tokens data obtained from the identical source datasets listed above used for the 10-fold data.
-
 ### Instruction tuning (To be updated)
 
 The models have been fine-tuned on the following datasets.
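The removed lines describe partitioning the pre-training corpus into 10 non-overlapping folds of roughly 27-28B tokens each (about 270-280B tokens of continuous pre-training in total, before the final 27B-token stage). As a rough illustration of that kind of non-overlapping, token-balanced split, here is a minimal sketch; the function name, the `(doc_id, token_count)` representation, and the greedy balancing strategy are assumptions for illustration, not the project's actual data pipeline.

```python
import random

def split_into_folds(documents, num_folds=10, seed=42):
    """Partition (doc_id, token_count) pairs into non-overlapping folds
    of roughly equal total token count.

    Illustrative sketch only; the fold construction actually used for
    pre-training is not published in this repository.
    """
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)  # remove ordering bias before assignment
    folds = [[] for _ in range(num_folds)]
    totals = [0] * num_folds
    for doc_id, n_tokens in docs:
        # Greedy balancing: place each document in the currently lightest fold,
        # so every document lands in exactly one fold (non-overlapping).
        i = min(range(num_folds), key=totals.__getitem__)
        folds[i].append(doc_id)
        totals[i] += n_tokens
    return folds, totals

# Example: 1,000 toy documents, ~275M "tokens" in total
docs = [(f"doc{i}", 275_000) for i in range(1000)]
folds, totals = split_into_folds(docs)
```

Greedy assignment to the lightest fold keeps per-fold token totals close to equal without needing a second balancing pass, which is why it is a common baseline for this kind of split.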