Commit c1a49e6, committed by slippylolo
1 parent: 591607b
Fix typo
README.md CHANGED
@@ -122,7 +122,7 @@ for seq in sequences:
 
 ### Training Data
 
-Falcon-
+Falcon-7B was trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated copora were inspired by The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)).
 
 | **Data source** | **Fraction** | **Tokens** | **Sources** |
 |--------------------|--------------|------------|-----------------------------------|