---
library_name: transformers
tags:
  - trl
  - dpo
---

# Model Card for Llama-3-8B (Orca DPO fine-tune)

## Model Details

A fine-tune of the Llama-3-8B model on the Orca DPO preference dataset.

## Training Details

### Training Data

Trained on the Orca preference dataset with DPO (Direct Preference Optimization).
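The card does not name the exact dataset repository. As a hedged illustration, the snippet below loads the commonly used `Intel/orca_dpo_pairs` pairs (an assumption, not confirmed by the card) and maps them into the `prompt`/`chosen`/`rejected` columns that the TRL DPO trainer expects.

```python
from datasets import load_dataset

# Assumed dataset ID; the card only says "Orca (DPO)".
raw = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(example):
    # DPOTrainer expects "prompt", "chosen", and "rejected" text columns;
    # the Orca "system" field is ignored here for brevity.
    return {
        "prompt": example["question"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dpo_dataset = raw.map(to_dpo_format, remove_columns=raw.column_names)
```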

### Training Procedure

NEFTune noise embeddings are added for robustness, and the model is fine-tuned with the TRL DPO trainer.
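The card does not include the training script. In recent transformers/trl stacks, NEFTune can be switched on directly through the training arguments rather than as a separate module; a minimal sketch, assuming that path (the alpha value and output directory are illustrative, not taken from the card):

```python
from trl import DPOConfig

# Hedged sketch: enabling NEFTune via the standard training arguments.
training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # placeholder output path
    neftune_noise_alpha=5,            # injects NEFTune noise into token embeddings during training
)
```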

#### Training Hyperparameters

- lora_alpha = 16
- lora_r = 64
- lora_dropout = 0.1
- adam_beta1 = 0.9
- adam_beta2 = 0.999
- weight_decay = 0.001
- max_grad_norm = 0.3
- learning_rate = 2e-4
- bnb_4bit_quant_type = nf4
- optim = "paged_adamw_32bit"
- optimizer_type = "paged_adamw_32bit"
- max_steps = 5000
- gradient_accumulation_steps = 4
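As a hedged sketch, these values map onto the usual peft / bitsandbytes / trl configuration objects and the DPO trainer roughly as shown below. Everything not in the list above (base-model ID, LoRA bias and task type, compute dtype, per-device batch size, NEFTune alpha, output directory) is an assumption, and this is not the author's actual script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_model_id = "meta-llama/Meta-Llama-3-8B"  # assumed base checkpoint

# 4-bit NF4 weight quantization (bnb_4bit_quant_type = nf4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype not stated on the card
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapter settings (lora_r, lora_alpha, lora_dropout).
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",             # assumption
    task_type="CAUSAL_LM",
)

# Optimizer, schedule, and NEFTune settings from the list above.
training_args = DPOConfig(
    output_dir="llama3-8b-orca-dpo",  # placeholder
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    weight_decay=0.001,
    max_grad_norm=0.3,
    max_steps=5000,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,    # not stated on the card
    neftune_noise_alpha=5,            # NEFTune, as described under Training Procedure
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dpo_dataset,        # prompt/chosen/rejected dataset from the Training Data section
    processing_class=tokenizer,       # older trl releases take tokenizer= instead
    peft_config=peft_config,          # with a peft_config, no separate reference model is needed
)
trainer.train()
```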