Update README.md
README.md (CHANGED)
@@ -9,7 +9,7 @@ tags:
 # Granite-3.1-8B-Base
 
 **Model Summary:**
-Granite-3.1-8B-Base extends the context length of Granite-3.0-8B-Base from 4K to 128K using a progressive training strategy, increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to the desired length of 128K.
+Granite-3.1-8B-Base extends the context length of Granite-3.0-8B-Base from 4K to 128K using a progressive training strategy, increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to the desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens.
 
 - **Developers:** Granite Team, IBM
 - **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models)
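The progressive extension in the updated summary line amounts to a short schedule of continued pre-training runs at growing lengths, each with a matching RoPE theta. A minimal sketch of such a schedule follows; the increment sizes, the starting theta, and the length-ratio scaling rule are all illustrative assumptions, since neither this README nor the diff states the values IBM used.

```python
# Hypothetical progressive context-extension schedule. The increments,
# starting theta, and scaling rule below are assumptions for
# illustration, not the published Granite-3.1 recipe.
BASE_LENGTH = 4_096          # starting 4K context
BASE_ROPE_THETA = 10_000.0   # assumed starting RoPE theta

# Hypothetical increments ending at the supported 128K context.
for target_length in (16_384, 65_536, 131_072):
    # Common heuristic: grow theta with the length ratio so the rotary
    # position frequencies still resolve the longer sequences.
    rope_theta = BASE_ROPE_THETA * (target_length / BASE_LENGTH)
    print(f"continue pre-training at {target_length} tokens "
          f"with rope_theta={rope_theta:,.0f}")
    # ... train at this length until the model adapts, then move on ...
```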
@@ -82,7 +82,8 @@ Granite-3.1-8B-Base is based on a decoder-only dense transformer architecture. C
 This model is trained on a mix of open source and proprietary data following a three-stage training strategy.
 * Stage 1 data: The data for stage 1 is sourced from diverse domains such as web, code, academic sources, books, and math data.
 * Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
-* Stage 3 data:
+* Stage 3 data: The data for stage 3 consists of the original stage-2 pretraining data with additional synthetic long-context data in the form of QA/summary pairs, where the answer
+contains a recitation of the related paragraph before the answer.
 
 A detailed attribution of datasets can be found in the [Granite 3.0 Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf), [Granite 3.1 Technical Report (coming soon)](https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d), and [Accompanying Author List](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/author-ack.pdf).
 
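To make the new stage-3 description concrete, here is a minimal sketch of what one such synthetic long-context sample could look like. The field names, helper function, and prompt template are hypothetical, not the exact format used to train Granite-3.1.

```python
# Sketch of one stage-3 synthetic long-context sample as described in
# the hunk above: a QA pair whose target recites the supporting
# paragraph before giving the answer. All names here are assumptions.
def build_recitation_sample(document: str, paragraph: str,
                            question: str, answer: str) -> dict:
    """Assemble one QA training sample over a long document."""
    return {
        "context": document,   # long input, up to the 128K window
        "question": question,
        # Reciting the evidence first rewards retrieving the right
        # passage from a long context, not just producing the answer.
        "target": f"Relevant paragraph: {paragraph}\n\nAnswer: {answer}",
    }

sample = build_recitation_sample(
    document="... full long document ...",
    paragraph="Granite-3.1-8B-Base supports a 128K context length.",
    question="What context length does Granite-3.1-8B-Base support?",
    answer="128K.",
)
print(sample["target"])
```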