robinsmits committed
Commit f792f3b
1 Parent(s): b43b618

End of training

Files changed (1)
  1. README.md +25 -121
README.md CHANGED
@@ -1,161 +1,65 @@
  ---
- language:
- - nl
- license: cc-by-nc-4.0
- library_name: peft
  tags:
  - generated_from_trainer
- - alpaca
- - Transformers
- - PolyLM
- - text-generation-inference
- datasets:
- - BramVanroy/alpaca-cleaned-dutch
- inference: false
- base_model: DAMO-NLP-MT/polylm-13b
- pipeline_tag: text-generation
  model-index:
  - name: polylm_13b_ft_alpaca_clean_dutch
  results: []
  ---

- # polylm_13b_ft_alpaca_clean_dutch

- ## Model description

- This adapter model is a fine-tuned version of [DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b).
  It achieves the following results on the evaluation set:
- - Loss: 1.3355
-
- Finetuning was performed on the Dutch [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset, which contains 52K records of instruction-following data translated from English to Dutch.
-
- See [DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b) for all information about the base model.
-
- ## Model usage
-
- A basic example of how to use the finetuned model:
-
- ```python
- import torch
- from peft import AutoPeftModelForCausalLM
- from transformers import AutoTokenizer
-
- model_name = "robinsmits/polylm_13b_ft_alpaca_clean_dutch"
-
- # Load the tokenizer and the 4-bit quantized base model with the adapter applied
- tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = False, legacy = False)
- model = AutoPeftModelForCausalLM.from_pretrained(model_name, device_map = "auto", load_in_4bit = True, torch_dtype = torch.bfloat16)
-
- # Alpaca-style Dutch prompt: an instruction followed by an empty answer section
- prompt = "### Instructie:\nWat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?\n\n### Antwoord:\n"
-
- inputs = tokenizer(prompt, return_tensors = "pt")
- sample = model.generate(input_ids = inputs.input_ids.cuda(),
-                         attention_mask = inputs.attention_mask.cuda(),
-                         max_new_tokens = 128,
-                         do_sample = True,
-                         top_p = 0.85,
-                         top_k = 50,
-                         temperature = 0.5,
-                         repetition_penalty = 1.2,
-                         length_penalty = -1.0,
-                         num_return_sequences = 1,
-                         pad_token_id = tokenizer.eos_token_id,
-                         forced_eos_token_id = tokenizer.eos_token_id)
-
- # Decode and print only the generated answer (everything after the prompt)
- output = tokenizer.decode(sample[0], skip_special_tokens = True)
- print(output.split(prompt)[1])
- ```

- For the example above, the prompt and generated output look similar to the following:
-
- ```
- ### Instructie:
- Wat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?
-
- ### Antwoord:
-
- De drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling, zijn HTML (HyperText Markup Language), CSS (Cascading Style Sheets) en JavaScript. Deze onderdelen stellen gebruikers in staat om inhoud op een website te creëren of aanpassen met behulp van codering. Bovendien kunnen ze interactieve elementen zoals animatie, video's en audio-opnames toevoegen aan websites. HTML is het meest voorkomende onderdeel omdat deze de basis vormt voor alle andere componenten. Het stelt ontwikkelaars in staat om tekst en afbeeldingen op hun pagina's weer te geven door gebruik te maken van markup tags
- ```
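
The prompt above follows the Alpaca-style `### Instructie:` / `### Antwoord:` template. A prompt for any other Dutch instruction can be built with a small helper along these lines; the `create_prompt` name is ours (not part of the model or the notebooks), only the template itself comes from the usage example:

```python
def create_prompt(instruction: str) -> str:
    """Wrap a Dutch instruction in the Alpaca-style template from the usage example above."""
    return f"### Instructie:\n{instruction}\n\n### Antwoord:\n"

prompt = create_prompt("Wat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?")
```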

- For more extensive usage examples and many generated samples (both good and bad), see the [Inference Notebook](https://github.com/RobinSmits/Dutch-LLMs/blob/main/PolyLM_13B_Alpaca_Clean_Dutch_Inference.ipynb).

  ## Intended uses & limitations

- The PolyLM-13B base model was trained on 18 languages, with the primary goal of creating a multilingual open LLM.
- Dutch was one of those 18 languages, and a diverse combination of multilingual datasets was used for its training.
-
- The generated output and performance of this model for the Dutch language are very likely not always comparable to those of the various Open-Llama models that have been finetuned on English Alpaca datasets.
-
- The primary intention of this finetuned model is to explore and research the use of the Dutch language in combination with an open LLM.
-
- ## Bias, Risks, and Limitations
-
- The information below is copied from the base model's [official model card](https://arxiv.org/pdf/2307.06018.pdf) and also applies to the finetuned model.
-
- > Our contributions are fully methodological: adding the support of multilingualism to LLM during training and SFT phases. It is unavoidable that PolyLM might exhibit several common deficiencies of language models, e.g. hallucination and toxicity. PolyLM should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

  ## Training and evaluation data

- This model was trained on the [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset.
-
- The dataset is the Dutch translation of the English Alpaca Cleaned instruction dataset.
-
- Based on the dataset license, only non-commercial use is allowed; commercial use is strictly forbidden.
-
  ## Training procedure

- This model was finetuned with a QLoRA setup on a Google Colab A100 GPU in about 7 hours.
-
- The notebook used for training can be found here: [Training Notebook](https://github.com/RobinSmits/Dutch-LLMs/blob/main/PolyLM_13B_Alpaca_Clean_Dutch_Qlora.ipynb)
-
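The card itself does not record the LoRA settings, so the sketch below is only an outline of what a QLoRA adapter setup with `peft` typically looks like for this base model: the 4-bit load matches the quantization config listed further below, while the `LoraConfig` values and `target_modules` are placeholders (see the training notebook for the settings actually used).

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (the full quantization config is listed below).
base_model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-MT/polylm-13b",
    load_in_4bit = True,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Placeholder LoRA settings: r/alpha/dropout/target modules are not recorded in this card.
lora_config = LoraConfig(
    task_type = "CAUSAL_LM",
    r = 16,                        # placeholder
    lora_alpha = 32,               # placeholder
    lora_dropout = 0.05,           # placeholder
    target_modules = ["c_attn"],   # placeholder: depends on PolyLM's attention module names
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```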
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
- - gradient_accumulation_steps: 8
  - total_train_batch_size: 64
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 64
- - num_epochs: 2
-
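As a rough illustration, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder and `bf16 = True` is an assumption based on the bfloat16 compute dtype listed below:

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters listed above expressed as TrainingArguments.
# The Adam betas/epsilon shown above are the library defaults.
training_args = TrainingArguments(
    output_dir = "polylm_13b_ft_alpaca_clean_dutch",  # placeholder
    learning_rate = 2e-5,
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 8,
    gradient_accumulation_steps = 8,   # 8 x 8 = total train batch size of 64
    num_train_epochs = 2,
    lr_scheduler_type = "linear",
    warmup_steps = 64,
    seed = 42,
    bf16 = True,                       # assumption: matches the bfloat16 compute dtype
)
```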
- The following bitsandbytes quantization config was used during training:
- - load_in_8bit: False
- - load_in_4bit: True
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: nf4
- - bnb_4bit_use_double_quant: True
- - bnb_4bit_compute_dtype: bfloat16
-
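Expressed as a `BitsAndBytesConfig`, the quantization settings above would look like the minimal sketch below; the config values come from the list, but the loading call itself (in particular `device_map`) is an assumption rather than a copy of the notebook code.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: the quantization settings listed above as a BitsAndBytesConfig.
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_use_double_quant = True,
    bnb_4bit_compute_dtype = torch.bfloat16,
)

# Assumption: loading the base model this way; the actual code is in the training notebook.
base_model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-MT/polylm-13b",
    quantization_config = bnb_config,
    device_map = "auto",
)
```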
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.4311 | 0.16 | 128 | 1.4541 |
- | 1.3936 | 0.33 | 256 | 1.4141 |
- | 1.423 | 0.49 | 384 | 1.3960 |
- | 1.3672 | 0.66 | 512 | 1.3832 |
- | 1.3809 | 0.82 | 640 | 1.3754 |
- | 1.3581 | 0.99 | 768 | 1.3652 |
- | 1.3534 | 1.15 | 896 | 1.3599 |
- | 1.3334 | 1.32 | 1024 | 1.3535 |
- | 1.3351 | 1.48 | 1152 | 1.3475 |
- | 1.3178 | 1.65 | 1280 | 1.3411 |
- | 1.3341 | 1.81 | 1408 | 1.3378 |
- | 1.2976 | 1.98 | 1536 | 1.3355 |


  ### Framework versions

- - Transformers 4.31.0
  - Pytorch 2.0.1+cu118
- - Datasets 2.14.0
- - Tokenizers 0.13.3
- - PEFT 0.4.0

  ---
+ license: apache-2.0
+ base_model: DAMO-NLP-MT/polylm-13b-fine-grained-shards
  tags:
  - generated_from_trainer
  model-index:
  - name: polylm_13b_ft_alpaca_clean_dutch
  results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # polylm_13b_ft_alpaca_clean_dutch

+ This model is a fine-tuned version of [DAMO-NLP-MT/polylm-13b-fine-grained-shards](https://huggingface.co/DAMO-NLP-MT/polylm-13b-fine-grained-shards) on an unknown dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 1.3839

+ ## Model description

+ More information needed

  ## Intended uses & limitations

+ More information needed

  ## Training and evaluation data

+ More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 4
  - eval_batch_size: 8
  - seed: 42
+ - gradient_accumulation_steps: 16
  - total_train_batch_size: 64
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 64
+ - num_epochs: 1
+

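Expressed as `TrainingArguments`, this run's values would look roughly as follows (a sketch only, mirroring the earlier one; `output_dir` is a placeholder and the remaining settings are taken from the list above):

```python
from transformers import TrainingArguments

# Sketch for this run; it differs from the earlier card in learning rate,
# per-device batch size, gradient accumulation steps, and number of epochs.
training_args = TrainingArguments(
    output_dir = "polylm_13b_ft_alpaca_clean_dutch",  # placeholder
    learning_rate = 5e-5,
    per_device_train_batch_size = 4,
    per_device_eval_batch_size = 8,
    gradient_accumulation_steps = 16,  # 4 x 16 = total train batch size of 64
    num_train_epochs = 1,
    lr_scheduler_type = "linear",
    warmup_steps = 64,
    seed = 42,
)
```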
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
+ | 1.4626 | 0.16 | 128 | 1.4613 |
+ | 1.4027 | 0.33 | 256 | 1.4235 |
+ | 1.4002 | 0.49 | 384 | 1.4054 |
+ | 1.3857 | 0.66 | 512 | 1.3951 |
+ | 1.3798 | 0.82 | 640 | 1.3870 |
+ | 1.3629 | 0.99 | 768 | 1.3839 |


  ### Framework versions

+ - Transformers 4.34.0
  - Pytorch 2.0.1+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.14.1