data update
README.md CHANGED
@@ -283,9 +283,9 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
-* Phase 1: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
-* Phase 2: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains such as web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
 
 ## Infrastructure
 We train the Granite Language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.