robinsmits committed · Commit f792f3b · Parent(s): b43b618

End of training

README.md CHANGED
@@ -1,161 +1,65 @@
Removed (previous version of README.md):

---
license: cc-by-nc-4.0
library_name: peft
tags:
- generated_from_trainer
- alpaca
- Transformers
- PolyLM
- text-generation-inference
datasets:
- BramVanroy/alpaca-cleaned-dutch
inference: false
base_model: DAMO-NLP-MT/polylm-13b
pipeline_tag: text-generation
model-index:
- name: polylm_13b_ft_alpaca_clean_dutch
  results: []
---

# polylm_13b_ft_alpaca_clean_dutch

## Model description

This model is a fine-tuned version of [DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b) on the [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3355

Finetuning was performed on the Dutch [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset, which contains 52K records of instruction-following data translated from English to Dutch.

See [DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b) for all information about the base model.

## Model usage

A basic example of how to use the finetuned model:

```
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "robinsmits/polylm_13b_ft_alpaca_clean_dutch"

# Load the tokenizer and the 4-bit quantized model with its PEFT adapter
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = False, legacy = False)
model = AutoPeftModelForCausalLM.from_pretrained(model_name, device_map = "auto", load_in_4bit = True, torch_dtype = torch.bfloat16)

# Alpaca-style Dutch prompt: "### Instructie:" for the instruction, "### Antwoord:" for the answer
prompt = "### Instructie:\nWat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?\n\n### Antwoord:\n"

inputs = tokenizer(prompt, return_tensors = "pt")
sample = model.generate(input_ids = inputs.input_ids.cuda(),
                        attention_mask = inputs.attention_mask.cuda(),
                        max_new_tokens = 128,
                        do_sample = True,
                        top_p = 0.85,
                        top_k = 50,
                        temperature = 0.5,
                        repetition_penalty = 1.2,
                        length_penalty = -1.0,
                        num_return_sequences = 1,
                        pad_token_id = tokenizer.eos_token_id,
                        forced_eos_token_id = tokenizer.eos_token_id)

# Decode and print only the newly generated answer
output = tokenizer.decode(sample[0], skip_special_tokens = True)
print(output.split(prompt)[1])
```
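Since this repository contains a PEFT (LoRA) adapter, the adapter can presumably also be attached to the base model explicitly. A minimal sketch, assuming the standard `peft` API and the base model listed in the metadata; this is not taken from the original card:

```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in 4-bit and attach the finetuned LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-MT/polylm-13b", device_map = "auto", load_in_4bit = True, torch_dtype = torch.bfloat16)
model = PeftModel.from_pretrained(base_model, "robinsmits/polylm_13b_ft_alpaca_clean_dutch")
```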

The generated output for the example above looks as follows:

```
### Instructie:
Wat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?

### Antwoord:

De drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling, zijn HTML (HyperText Markup Language), CSS (Cascading Style Sheets) en JavaScript. Deze onderdelen stellen gebruikers in staat om inhoud op een website te creëren of aanpassen met behulp van codering. Bovendien kunnen ze interactieve elementen zoals animatie, video's en audio-opnames toevoegen aan websites. HTML is het meest voorkomende onderdeel omdat deze de basis vormt voor alle andere componenten. Het stelt ontwikkelaars in staat om tekst en afbeeldingen op hun pagina's weer te geven door gebruik te maken van markup tags
```

## Intended uses & limitations

The base PolyLM-13B model was trained on 18 languages, and Dutch was one of those 18 languages. For training the base model a diverse combination of multilingual datasets was used.

The generated output and performance of this model for the Dutch language is very likely not always comparable to the various Open-Llama models that have been finetuned on English Alpaca datasets.

The primary intention of this finetuned model is to explore and research the use of the Dutch language in combination with an Open LLM model.

## Bias, Risks, and Limitations

The information below is copied from the base model's [official model card](https://arxiv.org/pdf/2307.06018.pdf).
It also applies to the finetuned model.

> Our contributions are fully methodological: adding the support of multilingualism to LLM during training and SFT phases. It is unavoidable that PolyLM might exhibit several common deficiencies of language models, e.g. hallucination and toxicity. PolyLM should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

## Training and evaluation data

The [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset is the Dutch translation of the English Alpaca Cleaned instruction dataset.

Based on the dataset license, only non-commercial use is allowed. Commercial use is strictly forbidden.
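A minimal sketch, not from the original card, of loading this dataset and turning a record into the Alpaca-style Dutch prompt shown under Model usage. The column names `instruction` and `output` are assumptions based on the standard Alpaca layout, and records with a non-empty `input` column would need an extra section in the template:

```
from datasets import load_dataset

# Load the Dutch Alpaca Cleaned instruction dataset
dataset = load_dataset("BramVanroy/alpaca-cleaned-dutch", split = "train")

def to_prompt(record):
    # Build the "### Instructie:" / "### Antwoord:" prompt and append the reference answer
    prompt = f"### Instructie:\n{record['instruction']}\n\n### Antwoord:\n"
    return {"text": prompt + record["output"]}

print(to_prompt(dataset[0])["text"])
```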
## Training procedure

This model was finetuned with a QLoRA setup on a Google Colab A100 GPU in about 7.0 hours.

The notebook used for training can be found here: [Training Notebook](https://github.com/RobinSmits/Dutch-LLMs/blob/main/PolyLM_13B_Alpaca_Clean_Dutch_Qlora.ipynb)
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate:
- train_batch_size:
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps:
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 64
- num_epochs: 2

The following bitsandbytes quantization config was used during training (see the sketch after this list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

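The quantization settings above, combined with a LoRA adapter configuration, make up the QLoRA setup mentioned under Training procedure. A minimal sketch, assuming typical `transformers`/`peft`/`bitsandbytes` usage; the LoRA rank, alpha and dropout values are illustrative assumptions, not the values from the training notebook:

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization and bfloat16 compute,
# mirroring the bitsandbytes settings listed above
bnb_config = BitsAndBytesConfig(load_in_4bit = True,
                                bnb_4bit_quant_type = "nf4",
                                bnb_4bit_use_double_quant = True,
                                bnb_4bit_compute_dtype = torch.bfloat16)

base_model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-MT/polylm-13b",
                                                  quantization_config = bnb_config,
                                                  device_map = "auto")
base_model = prepare_model_for_kbit_training(base_model)

# Illustrative LoRA settings: the actual values (and any target_modules required
# for the PolyLM architecture) are defined in the training notebook
lora_config = LoraConfig(r = 16,
                         lora_alpha = 32,
                         lora_dropout = 0.05,
                         task_type = "CAUSAL_LM")

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```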
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.
| 1.
| 1.
| 1.
| 1.
| 1.
| 1.3534 | 1.15 | 896 | 1.3599 |
| 1.3334 | 1.32 | 1024 | 1.3535 |
| 1.3351 | 1.48 | 1152 | 1.3475 |
| 1.3178 | 1.65 | 1280 | 1.3411 |
| 1.3341 | 1.81 | 1408 | 1.3378 |
| 1.2976 | 1.98 | 1536 | 1.3355 |

### Framework versions

- Transformers 4.
- Pytorch 2.0.1+cu118
- Datasets 2.14.
- Tokenizers 0.
- PEFT 0.4.0

Added (new version of README.md):

---
license: apache-2.0
base_model: DAMO-NLP-MT/polylm-13b-fine-grained-shards
tags:
- generated_from_trainer
model-index:
- name: polylm_13b_ft_alpaca_clean_dutch
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# polylm_13b_ft_alpaca_clean_dutch

This model is a fine-tuned version of [DAMO-NLP-MT/polylm-13b-fine-grained-shards](https://huggingface.co/DAMO-NLP-MT/polylm-13b-fine-grained-shards) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3839

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 64
- num_epochs: 1

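A minimal sketch, not part of the auto-generated card, of how the values above map onto `transformers.TrainingArguments`; the output directory and the evaluation/logging settings are illustrative assumptions, and the optimizer line corresponds to the default AdamW settings:

```
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir = "polylm_13b_ft_alpaca_clean_dutch",  # assumed name
                                  learning_rate = 5e-05,
                                  per_device_train_batch_size = 4,
                                  per_device_eval_batch_size = 8,
                                  gradient_accumulation_steps = 16,  # 4 x 16 = total train batch size of 64
                                  num_train_epochs = 1,
                                  lr_scheduler_type = "linear",
                                  warmup_steps = 64,
                                  seed = 42,
                                  evaluation_strategy = "steps",  # assumption: evaluate every 128 steps, as in the results table
                                  eval_steps = 128,
                                  logging_steps = 128)
```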

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.4626 | 0.16 | 128 | 1.4613 |
| 1.4027 | 0.33 | 256 | 1.4235 |
| 1.4002 | 0.49 | 384 | 1.4054 |
| 1.3857 | 0.66 | 512 | 1.3951 |
| 1.3798 | 0.82 | 640 | 1.3870 |
| 1.3629 | 0.99 | 768 | 1.3839 |

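For intuition, the final evaluation loss can be converted to a perplexity, assuming it is the mean per-token cross-entropy reported by the Trainer; this note is not part of the auto-generated card:

```
import math

# exp(mean cross-entropy) = evaluation perplexity
print(round(math.exp(1.3839), 2))  # 3.99
```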
### Framework versions

- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1