training data word choice fix
README.md CHANGED
@@ -276,10 +276,10 @@ Granite-3.0-8B-Base is based on a decoder-only dense transformer architecture. C
 | # Training tokens | 12T | **12T** | 10T | 10T |
 
 **Training Data:**
-This model is trained on a mix of open source and proprietary data following a two-
-* Stage 1 data: The data for
-* Stage 2 data: The data for
-
+This model is trained on a mix of open source and proprietary data following a two-stage training strategy.
+* Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+* Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+
 **Infrastructure:**
 We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.