metadata

library_name: transformers
tags:
  - llama 3
  - 'orca '
  - 'dpo '
datasets:
  - Intel/orca_dpo_pairs
pipeline_tag: text-generation
license: other
license_name: llama-3
license_link: https://llama.meta.com/llama3/license

Orca-Llama-3-8B-Instruct-DPO

Finetuned Llama 3 8B Instruct on Intel/orca_dpo_pairs using a single 3090 24GB. Data formated using the ChatML template.

GGUF can be found here RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF

ORPOConfig:

    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50