---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: apache-2.0
datasets:
- mlabonne/guanaco-llama2-1k
---

# Model Card for Model ID

- This model is a fine-tuned version of the Mistral 7B model (mistralai/Mistral-7B-v0.1).
- It was instruction-tuned on the Guanaco Llama2 1k training dataset (mlabonne/guanaco-llama2-1k).

## Model Details

I used Kaggle's model feature to load the base model and then followed these steps to fine-tune it:

- First, I created a quantization config with `BitsAndBytesConfig` to load the base model in 4-bit precision and reduce its memory footprint, passing it as the quantization config when loading the pretrained model.
- I then loaded the model using `AutoModelForCausalLM.from_pretrained`.
- The tokenizer is loaded from the same pretrained base model with `AutoTokenizer.from_pretrained` and adjusted to match the fp16 setup.
- LoRA config: I used the PEFT technique QLoRA to create a Low-Rank Adaptation config that adds an adapter layer for fine-tuning.
- LoRA adds small low-rank weight matrices whose parameters are updated while the LLM's own parameters stay frozen. After fine-tuning, the weights of these low-rank matrices are merged with the LLM's weights to obtain the new fine-tuned weights. This makes fine-tuning faster and more memory efficient.
- Finally, an SFT (Supervised Fine-Tuning) trainer is run with the LoRA parameters and the training hyperparameters listed under *Training Hyperparameters* to fine-tune the base model; see the code sketch under *Fine-tuning code sketch* below.

- **Developed by:** Avani Sharma
- **Model type:** LLM
- **Finetuned from model:** mistralai/Mistral-7B-v0.1

### Model Sources

- **Repository:** https://github.com/Avani1994/NLP/blob/99dd33484bdf06261fd872f24b939977b55bdceb/Mistral_7B_4bit_QLoRA_Fine_tuning_Explained.ipynb

#### Training Hyperparameters

I used the following LoRA parameters:

```
lora_alpha=16,
lora_dropout=0.1,
r=64,
```

And the following training hyperparameters:

```
num_train_epochs=1,
optim="paged_adamw_32bit",
save_steps=25,
logging_steps=25,
per_device_train_batch_size=4,
gradient_accumulation_steps=1,
learning_rate=2e-4,
weight_decay=0.001,
lr_scheduler_type="constant",
fp16=False,
bf16=False,
max_grad_norm=0.3,
max_steps=-1,
warmup_ratio=0.03,
group_by_length=True,
report_to="wandb"
```

### Compute Infrastructure

Kaggle

#### Hardware

Kaggle GPU T4x2

#### Software

Kaggle Notebook

### Framework versions

- PEFT 0.7.1
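
### Fine-tuning code sketch

The snippets below are a minimal sketch of the steps described under *Model Details*, not a verbatim copy of the linked notebook. They assume `transformers`, `bitsandbytes`, `peft`, `trl`, and `datasets` are installed; any value not listed on this card (e.g. the NF4 quant type, compute dtype, `device_map`, padding setup) is an illustrative assumption.

```python
# Sketch: load mistralai/Mistral-7B-v0.1 in 4-bit precision (QLoRA-style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit loading, as described above
    bnb_4bit_quant_type="nf4",             # assumption: NF4 is the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute on T4 GPUs
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # assumption
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # assumption: common padding setup for Mistral
tokenizer.padding_side = "right"
```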
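
The LoRA adapter config uses the values listed under *Training Hyperparameters*; `bias` and `task_type` are assumptions (typical choices for a causal-LM QLoRA run):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",            # assumption
    task_type="CAUSAL_LM",  # assumption
)
```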
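
Training is then driven by `trl`'s `SFTTrainer` with the hyperparameters listed above. This sketch uses the older (trl 0.7-era) `SFTTrainer` signature; `output_dir`, `dataset_text_field`, `max_seq_length`, `packing`, and the adapter output name are assumptions:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

training_args = TrainingArguments(
    output_dir="./results",        # assumption: any local path works
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.001,
    lr_scheduler_type="constant",
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    report_to="wandb",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",     # assumption: the dataset exposes a "text" column
    max_seq_length=None,           # assumption
    tokenizer=tokenizer,
    args=training_args,
    packing=False,                 # assumption
)
trainer.train()

# Hypothetical output name for the LoRA adapter weights.
trainer.model.save_pretrained("mistral-7b-guanaco-adapter")
```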
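
For inference, the 4-bit base model can be re-loaded as above and the saved adapter attached on top of it. The adapter path below is a placeholder, and the prompt follows the Llama-2-style instruction template used by the training dataset:

```python
from peft import PeftModel

# "mistral-7b-guanaco-adapter" is a placeholder path, not an official repo id from this card.
peft_model = PeftModel.from_pretrained(model, "mistral-7b-guanaco-adapter")

prompt = "[INST] What is QLoRA? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = peft_model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```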