update: eval results and github repo link
README.md
CHANGED
````diff
@@ -20,7 +20,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value: 65.
+      value: 65.54
       verified: false
   - task:
       type: text-generation
@@ -30,7 +30,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value: 33.
+      value: 33.27
       verified: false
   - task:
       type: text-generation
@@ -50,7 +50,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 80.90
       verified: false
   - task:
       type: text-generation
@@ -90,7 +90,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 83.61
       verified: false
   - task:
       type: text-generation
@@ -130,7 +130,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 63.40
       verified: false
   - task:
       type: text-generation
@@ -150,7 +150,17 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 49.31
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reasoning
+      name: MUSR
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 41.08
       verified: false
   - task:
       type: text-generation
@@ -160,7 +170,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value: 44
+      value: 52.44
       verified: false
   - task:
       type: text-generation
@@ -180,7 +190,7 @@ model-index:
     metrics:
     - name: pass@1
       type: pass@1
-      value:
+      value: 64.06
       verified: false
   - task:
       type: text-generation
@@ -192,16 +202,6 @@ model-index:
       type: pass@1
       value: 29.28
       verified: false
-  - task:
-      type: text-generation
-    dataset:
-      type: multilingual
-      name: MGSM
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 51.60
-      verified: false
 ---
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
 
@@ -211,15 +211,15 @@ model-index:
 **Granite-3.0-8B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-8B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
-- **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
+- **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
 - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
-- **Paper:** [Granite Language Models](
+- **Paper:** [Granite 3.0 Language Models]()
 - **Release Date**: October 21st, 2024
-- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 <!-- de/es/fr/ja/pt/ar/cs/it/ko/nl/zh -->
 ## Supported Languages
-English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
+English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
 
 ## Usage
 ### Intended use
@@ -258,8 +258,6 @@ output = tokenizer.batch_decode(output)
 print(output)
 ```
 
-<!-- ['Where is the MIT-IBM Watson AI Lab located?\n\nThe MIT-IBM Watson AI Lab is located in Cambridge, Massachusetts.\n\nWhat is the mission of the MIT-IBM Watson AI Lab?\n\nThe mission of the MIT-IBM Watson AI Lab is to advance the state of the art in artificial intelligence (AI) and machine learning (ML) through collaboration between MIT and IBM.\n\nWhat are some of the projects being worked on at the MIT-IBM Watson AI Lab?\n\nSome of the projects being worked on at the MIT-IBM Watson AI Lab include developing new AI and ML algorithms, applying AI and ML to real-world problems, and exploring the ethical implications of AI and ML.\n\nWhat is the significance of the MIT-IBM Watson AI Lab?<|endoftext|>'] -->
-
 ## Model Architecture
 **Granite-3.0-8B-Base** is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
 
@@ -303,4 +301,4 @@ The use of Large Language Models involves risks and ethical considerations people
   year = {2024},
   url = {https://arxiv.org/abs/0000.00000},
 }
-```
+```
````
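The scores being filled in above are pass@1 values. The card does not define the metric, but for generation benchmarks pass@1 is conventionally the k=1 case of the unbiased pass@k estimator from Chen et al. (2021); a minimal sketch of that estimator, under that assumption:

```python
# Sketch: the unbiased pass@k estimator (Chen et al., 2021); the pass@1
# values in the metadata above are assumed to be its k=1 case.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per problem, c = samples that pass;
    returns the probability that at least one of k samples passes."""
    if n - c < k:
        return 1.0  # too few failing samples for k draws to all fail
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 13 passing samples out of 20 gives pass@1 = 0.65
print(pass_at_k(20, 13, 1))
```

For k=1 the estimator reduces to the plain fraction of passing samples, which is why the values above read like percentages.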
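Because the front matter follows the standard Hugging Face model-index schema visible in the diff, the updated scores can be read back programmatically. A minimal sketch, assuming a local copy of this README.md and PyYAML installed:

```python
# Sketch: read the updated pass@1 scores out of the model-index front
# matter edited above. Assumes a local README.md and PyYAML
# (pip install pyyaml); the schema is the standard model-index layout.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The YAML front matter sits between the first two "---" markers.
_, front_matter, _ = text.split("---", 2)
meta = yaml.safe_load(front_matter)

for result in meta["model-index"][0]["results"]:
    dataset = result["dataset"]["name"]
    for metric in result["metrics"]:
        print(f"{dataset}: {metric['type']} = {metric.get('value')}")
```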
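The Model Architecture context in the diff names GQA, RoPE, SwiGLU MLPs, RMSNorm, and shared input/output embeddings. One way to sanity-check the attention and embedding claims is to inspect the published config; a sketch assuming the repo id `ibm-granite/granite-3.0-8b-base` and Llama-style config attribute names, neither of which is stated in the diff:

```python
# Sketch: inspect the published config to confirm the architecture notes
# (GQA, RoPE, shared input/output embeddings). The repo id and the
# Llama-style attribute names below are assumptions, not taken from the diff.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-8b-base")

# GQA: fewer key/value heads than attention heads
print(config.num_attention_heads, getattr(config, "num_key_value_heads", None))
# RoPE base frequency
print(getattr(config, "rope_theta", None))
# Shared (tied) input/output embeddings
print(config.tie_word_embeddings)
```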