---
library_name: transformers
tags:
- trl
- dpo
---
# Model Card for Model ID

## Model Details

Llama-3-8B fine-tuned with DPO on the Orca-DPO dataset.
## Training Details

### Training Data

Trained on the Orca dataset, formatted as preference pairs for DPO.
### Training Procedure

NEFTune (noisy embedding fine-tuning) is enabled for robustness, and the model is fine-tuned with the TRL DPO trainer.
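A minimal sketch of this setup with TRL, assuming the `trl`/`transformers` APIs; the model name, dataset name, output directory, and NEFTune noise value are placeholders not stated in this card:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder identifiers -- substitute the actual base model and dataset.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # placeholder
    # NEFTune: uniform noise is added to embedding outputs during training.
    neftune_noise_alpha=5,  # assumed value; not stated in this card
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```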
#### Training Hyperparameters

- lora_r = 64
- lora_alpha = 16
- lora_dropout = 0.1
- adam_beta1 = 0.9
- adam_beta2 = 0.999
- weight_decay = 0.001
- max_grad_norm = 0.3
- learning_rate = 2e-4
- bnb_4bit_quant_type = "nf4"
- optim = "paged_adamw_32bit"
- max_steps = 5000
- gradient_accumulation_steps = 4
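The hyperparameters above map onto the usual quantization, LoRA, and trainer config objects. A sketch, assuming `peft`, `bitsandbytes`, and `trl`; the output directory and compute dtype are assumptions not given in this card:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import DPOConfig

# 4-bit NF4 quantization (bnb_4bit_quant_type = "nf4")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed; not stated in this card
)

# LoRA adapter settings from the list above
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# Trainer settings from the list above
training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # placeholder
    learning_rate=2e-4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    weight_decay=0.001,
    max_grad_norm=0.3,
    max_steps=5000,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
)
```

The quantization and LoRA configs are passed to the model loader and `DPOTrainer` respectively, keeping the frozen base model in 4-bit while only the LoRA adapters are trained.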