---
library_name: peft
license: cc-by-nc-4.0
datasets:
- BramVanroy/alpaca-cleaned-dutch
language:
- nl
pipeline_tag: text-generation
tags:
- llama
- alpaca
---

# open_llama_7b_alpaca_clean_dutch_qlora

## Model description

This adapter model is a fine-tuned version of [openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) on the [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset. See [openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) for all information about the base model. A hedged sketch of loading this adapter for inference is given under "Example usage (sketch)" at the end of this card.

## Intended uses & limitations

The open_llama_7b base model was trained primarily on English. Part of its pretraining data was a Wikipedia dump containing pages in 20 languages, Dutch among them. Given the size of the total dataset and of the Wikipedia part within it, Dutch very likely made up less than 0.5% of the total data.

The primary intention of this model is to explore the use of the Dutch language in combination with an open LLM.

## Training and evaluation data

This model was trained on the [BramVanroy/alpaca-cleaned-dutch](https://www.huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch) dataset.

Commercial use is forbidden. This model is intended for research only.

## Training procedure

This model was fine-tuned with a QLoRA setup on a Google Colab A100 GPU in about 6.5 hours. The notebook used for training can be found here: [Training Notebook](https://github.com/RobinSmits/Dutch-LLMs/blob/main/Open_Llama_7B_Alpaca_Clean_Dutch_Qlora.ipynb). A hedged sketch of how the settings below map onto code is given under "Training setup (sketch)" at the end of this card.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 64
- training_steps: 1536

The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.1240        | 1.0   | 768  | 1.1227          |
| 1.0177        | 2.0   | 1536 | 1.0645          |

### Framework versions

- Transformers 4.30.2
- PyTorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
- PEFT 0.4.0.dev0
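
## Example usage (sketch)

The snippet below is a minimal sketch of how a PEFT adapter like this one is typically loaded for inference with `transformers`, `bitsandbytes` and `peft`; it is not taken from the training notebook. The 4-bit settings mirror the quantization config listed above. The adapter repository id, the `use_fast=False` tokenizer setting and the Dutch Alpaca-style prompt template are assumptions; check this repository and the dataset card for the exact values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "openlm-research/open_llama_7b"
adapter_id = "open_llama_7b_alpaca_clean_dutch_qlora"  # replace with the full Hub id of this adapter repo

# 4-bit quantization config mirroring the training-time bitsandbytes settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The slow (sentencepiece) tokenizer is used here; the OpenLLaMA card advises against the fast tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Assumed Dutch Alpaca-style prompt; verify the exact template against the dataset card / notebook
prompt = "### Instructie:\nBeschrijf kort de stad Amsterdam.\n\n### Antwoord:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, top_p=0.9, temperature=0.7
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```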
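
## Training setup (sketch)

For reference, the snippet below sketches how the hyperparameters and quantization settings listed above map onto a typical QLoRA setup with `peft` and the `transformers` Trainer. It is a reconstruction, not an excerpt from the training notebook; in particular the LoRA rank, alpha, dropout and target modules are not listed on this card and appear here only as placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "openlm-research/open_llama_7b"

# Quantization config as listed under "Training procedure"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA settings: NOT listed on this card, shown purely as placeholder assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training arguments mirroring the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="open_llama_7b_alpaca_clean_dutch_qlora",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # effective batch size 64
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=64,
    max_steps=1536,
    seed=42,
    bf16=True,
)
```

Dataset tokenization and the actual `Trainer` call are omitted here; see the linked training notebook for the full code.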