Update README.md
README.md (CHANGED)

# Model Card for OPT_GaMS-1B-Chat

We proudly present the family of GaMS (Generative Model for Slovene) models. The 1B version is based on [Facebook's OPT model](https://huggingface.co/facebook/opt-1.3b) and is adapted for Slovene. OPT_GaMS models use the original OPT tokenizer. This is the instruction-tuned version of the model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/YnvcP1x7CHH-eTY2oB-69.png)

## Acknowledgment

The model was developed within the [PoVeJMo](https://povejmo.si) research program (Adaptive Natural Language Processing with Large Language Models; Prilagodljiva obdelava naravnega jezika s pomočjo velikih jezikovnih modelov), particularly within the research project titled SloLLaMai -- Open-access computationally efficient models for Slovenian, funded within the Recovery and Resilience Plan (NOO; Načrt za okrevanje in odpornost) by the Slovenian Research and Innovation Agency (ARIS) and NextGenerationEU. The authors also acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P6-0411 -- Language Resources and Technologies for Slovene).

We thank everyone who worked on data collection and preparation, enabling us to train our model. Special thanks go to: Nikola Ljubešić, Tjaša Arčon, Jaka Čibej, Simon Krek, Tomaž Erjavec and Iztok Kosem.

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science, and XLAB.doo. Team members: Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič, Iztok Lebar Bajec, Timotej Petrič and Marko Robnik-Šikonja.
- **Language:** Slovene
- **License:** Apache 2.0
- **Repository:** https://github.com/SloLama/NeMo
- **Paper:** https://www.sdjt.si/wp/wp-content/uploads/2024/09/JT-DH-2024_Vres_Bozic_Potocnik_Martincic_Robnik.pdf

## Intended usage

This version of the model is quite small and lacks safety tuning, so using it as a general-purpose model is **strongly discouraged**. The model may also contain certain biases. We do not recommend using this model in any language other than Slovene.

The model can be efficiently tuned for specific use cases, as suggested by the promising results of the fine-tuned models on the SuperGLUE and SI-NLI benchmarks below.

## How to get started with the model

The inference can be done using the following snippet of code:

```python
# ... (the model and pipeline setup lines are collapsed in this diff view)
response = pipeline(new_message, max_length=500)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```
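
For a runnable starting point, here is a minimal, self-contained sketch of the same kind of call, assuming the standard 🤗 transformers `pipeline` API; the Hub identifier `cjvt/OPT_GaMS-1B-Chat` and the example message are illustrative assumptions, not taken from the snippet above:

```python
from transformers import pipeline as hf_pipeline

# Build a text-generation pipeline around the instruction-tuned model
# (the Hub identifier below is an assumption).
pipeline = hf_pipeline("text-generation", model="cjvt/OPT_GaMS-1B-Chat")

# Chat-style input: a list of role/content messages (illustrative example).
new_message = [
    {"role": "user", "content": "Kateri je najvišji vrh Slovenije?"},
]

# Generate a response; max_length bounds the total length in tokens.
response = pipeline(new_message, max_length=500)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```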

## Training details

### Training data

The model was additionally pretrained on the following Slovene, English, and Croatian-Bosnian-Serbian (CBS) corpora:

| Corpus     | Language | # Tokens | Percentage |
| :--------- | :------- | :------: | :--------: |
| Metafida   | Slovene  | 6.59 B   | 13.89 %    |
| KAS        | Slovene  | 3.61 B   | 7.62 %     |
| Trendi     | Slovene  | 1.4 B    | 2.96 %     |
| mC4        | Slovene  | 5.5 B    | 11.6 %     |
| MaCoCu     | Slovene  | 4.68 B   | 9.86 %     |
| CC100      | Slovene  | 0.54 B   | 1.14 %     |
| Rižnica    | Croatian | 0.21 B   | 0.44 %     |
| Hr News    | Croatian | 4.16 B   | 8.77 %     |
| MaCoCu HBS | CBS      | 15.65 B  | 32.98 %    |
| Wikipedia  | English  | 4.7 B    | 9.9 %      |
| CC-News    | English  | 0.4 B    | 0.83 %     |

The total size of the additional training data is **47.44 B** tokens.

### Training procedure

The model was trained with the NeMo framework on the Slovene HPC Vega, using 64 A100 GPUs at once. Training took approximately 16 hours. The model was trained with a batch size of 1024 (2 million tokens), using the Adam optimizer and a cosine learning rate scheduler with 1000 warmup and constant steps.
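
For reference, the sketch below shows one common shape of such a warmup/cosine schedule. The ordering of the constant phase and all numeric values (peak and minimum learning rates, step counts) are assumptions for illustration, not the settings of the actual run:

```python
import math

def lr_at_step(step, peak_lr, min_lr, warmup_steps, constant_steps, total_steps):
    """Illustrative schedule: linear warmup, cosine decay, then a constant tail."""
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps - constant_steps)
    if step < warmup_steps + decay_steps:
        # Cosine decay from the peak to the minimum learning rate.
        progress = (step - warmup_steps) / decay_steps
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
    # Hold the minimum learning rate for the remaining constant steps.
    return min_lr

# Inspect a few points of the curve (all values are illustrative).
for s in (0, 500, 1000, 10_000, 20_000):
    print(s, lr_at_step(s, peak_lr=3e-4, min_lr=3e-5,
                        warmup_steps=1000, constant_steps=1000, total_steps=20_000))
```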

### Supervised finetuning (SFT)

The model was trained on the [GaMS-Instruct](http://hdl.handle.net/11356/1971) dataset (20,000 examples); the curated version of the dataset (7,000 examples) is publicly available. 19,050 examples were used as the training set and 950 examples as the validation set.

The model was LoRA-tuned for 7 epochs with rank 1024. It was trained with a batch size of 64, using the Adam optimizer and a cosine learning rate scheduler with 300 warmup steps.
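
The LoRA tuning itself was done in the NeMo framework (see the repository above). Purely as an illustration, a comparable adapter configuration with the Hugging Face `peft` library could look like the sketch below; the rank follows the model card, while the Hub identifier, target modules, and the remaining hyperparameters are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base pretrained model (the Hub identifier is an assumption).
base_model = AutoModelForCausalLM.from_pretrained("cjvt/OPT_GaMS-1B")

# LoRA adapter configuration: rank 1024 as stated in the card;
# alpha, dropout, and target modules are illustrative choices.
lora_config = LoraConfig(
    r=1024,
    lora_alpha=2048,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],  # OPT attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```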

## Evaluation

The model was evaluated using the [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the model was evaluated on an improved version of Aleksa Gordić's Slovenian-LLM-eval. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmarks (except for the versions with "finetuned" in the name).
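
To make the few-shot protocol concrete, the sketch below shows one generic way such a prompt can be assembled; the task, wording, and examples are illustrative and are not the exact prompts used on these benchmarks:

```python
# Illustrative few-shot prompt construction (not the exact SloBench prompts).
few_shot_examples = [
    ("Vprašanje: Ali je Triglav najvišji vrh Slovenije?", "da"),
    ("Vprašanje: Ali je Ljubljana glavno mesto Hrvaške?", "ne"),
]

def build_prompt(examples, query):
    """Concatenate labelled examples followed by the unlabelled query."""
    parts = [f"{question}\nOdgovor: {answer}" for question, answer in examples]
    parts.append(f"{query}\nOdgovor:")
    return "\n\n".join(parts)

prompt = build_prompt(few_shot_examples, "Vprašanje: Ali ima leto dvanajst mesecev?")
print(prompt)
```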

### SuperGLUE results

| Model | SuperGLUE Average | BoolQ Accuracy | CB Accuracy | CB F1 Score | CB Average | COPA Accuracy | MultiRC EM | MultiRC F1a Score | MultiRC Average | RTE Accuracy | WSC Accuracy |
| :---- | :---------------: | :------------: | :---------: | :---------: | :--------: | :-----------: | :--------: | :---------------: | :-------------: | :----------: | :----------: |
| OPT_GaMS-1B | 0.4408 | 0.5667 | 0.5040 | 0.3885 | 0.4463 | 0.5020 | 0.0961 | 0.2543 | 0.1752 | 0.4138 | 0.5411 |
| GaMS-1B | 0.4604 | 0.5000 | 0.6200 | 0.4565 | 0.5382 | 0.4920 | 0.1351 | 0.2675 | 0.2013 | 0.4828 | 0.5479 |
| OPT_GaMS-1B-Chat | 0.4165 | 0.7000 | 0.3720 | 0.2961 | 0.3341 | 0.4600 | 0.1111 | 0.3448 | 0.2280 | 0.4138 | 0.3630 |
| GaMS-1B-Chat | 0.4570 | **0.8000** | 0.4880 | 0.3023 | 0.3951 | 0.4840 | 0.1081 | 0.2428 | 0.1755 | 0.5172 | 0.3699 |
| OPT_GaMS-1B-Chat finetuned | 0.5645 | 0.7000 | 0.8040 | 0.5884 | 0.6962 | 0.5860 | 0.1021 | 0.4808 | 0.2914 | 0.5862 | 0.5274 |
| GaMS-1B-Chat finetuned | 0.5806 | 0.7333 | **0.8120** | 0.5592 | 0.6856 | 0.5080 | 0.1381 | 0.4882 | 0.3132 | 0.5862 | **0.6575** |
| SlovenianGPT-Chat* | 0.5078 | 0.7333 | 0.3920 | 0.3829 | 0.3874 | **0.6840** | **0.2432** | 0.4944 | **0.3688** | 0.5172 | 0.3562 |
| CroSloEngual BERT | **0.6078** | 0.7333 | 0.7920 | **0.7437** | **0.7679** | 0.5720 | 0.0931 | **0.5241** | 0.3086 | **0.6552** | 0.6096 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

### SI-NLI results

| Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
| :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :--------: | :---------: | :--------------: | :--------------: | :---------------: |
| OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
| GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
| OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
| GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
| OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
| GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
| SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
| GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
| SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
| CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

### Slovenian-LLM-eval results

| Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
| :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------: | :-----------------: | :-----------: | :-----------------: |
| OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
| GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
| OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
| GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
| YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
| SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
| SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/_2h977RjIu0nI_IJG_9bL.png)