---
library_name: transformers
tags:
- trl
- dpo
---

# Model Card for Model ID

## Model Details

A Llama-3-8B model fine-tuned on the Orca preference dataset with Direct Preference Optimization (DPO).

## Training Details

### Training Data

Trained on the Orca dataset formatted as DPO preference pairs.

### Training Procedure

NEFTune noise is added to the embeddings for robustness, and the model is fine-tuned with the DPO trainer using a 4-bit (NF4) quantized base model and LoRA adapters.

#### Training Hyperparameters

- lora_alpha = 16
- lora_r = 64
- lora_dropout = 0.1
- adam_beta1 = 0.9
- adam_beta2 = 0.999
- weight_decay = 0.001
- max_grad_norm = 0.3
- learning_rate = 2e-4
- bnb_4bit_quant_type = nf4
- optim = "paged_adamw_32bit"
- max_steps = 5000
- gradient_accumulation_steps = 4
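
The setup above can be sketched as follows. This is a minimal, hedged example assuming the TRL `DPOTrainer` API (≤0.8-style, where the tokenizer is passed directly), the `meta-llama/Meta-Llama-3-8B` base checkpoint, and the `Intel/orca_dpo_pairs` dataset id; the exact dataset, NEFTune noise scale, and LoRA target modules are assumptions, not confirmed by this card.

```python
# Sketch of the DPO + LoRA + NEFTune training setup described above.
# Assumptions: TRL <=0.8 DPOTrainer signature, "Intel/orca_dpo_pairs"
# as the Orca preference dataset, and neftune_noise_alpha=5.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed base checkpoint

# 4-bit NF4 quantization (bnb_4bit_quant_type = nf4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA adapter matching the listed hyperparameters
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM"
)

# neftune_noise_alpha turns on NEFTune embedding noise in the trainer;
# the value 5 is an assumption (the card does not state it)
training_args = TrainingArguments(
    output_dir="llama3-8b-orca-dpo",
    max_steps=5000,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    weight_decay=0.001,
    max_grad_norm=0.3,
    optim="paged_adamw_32bit",
    neftune_noise_alpha=5,
)

# DPOTrainer expects "prompt" / "chosen" / "rejected" columns; an
# Orca-style dataset may need a map() step to produce them first
train_dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT config, TRL uses the adapter-free base as reference
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With `ref_model=None` and a `peft_config`, TRL computes the reference log-probabilities by disabling the LoRA adapter, so only one copy of the 8B base model needs to be held in memory.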