---
library_name: transformers
tags:
- llama 3
- orca
- dpo
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: text-generation
license: other
license_name: llama-3
license_link: https://llama.meta.com/llama3/license
---
# Orca-Llama-3-8B-Instruct-DPO

Fine-tuned [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) using a single RTX 3090 (24 GB). The data was formatted using the ChatML template.
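As a rough illustration, a record from `Intel/orca_dpo_pairs` (which has `system`, `question`, `chosen`, and `rejected` fields) could be rendered in the ChatML format like this. This is a minimal sketch, not the exact preprocessing used for this model; the helper name `to_chatml` and the sample record are ours.

```python
def to_chatml(system: str, user: str, assistant: str) -> str:
    """Render one conversation in the ChatML format (hypothetical helper)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

# Illustrative record in the shape of an Intel/orca_dpo_pairs row.
example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 = 4.",
    "rejected": "5.",
}

# Preference tuning needs both completions rendered with the same prompt.
chosen_text = to_chatml(example["system"], example["question"], example["chosen"])
rejected_text = to_chatml(example["system"], example["question"], example["rejected"])
print(chosen_text)
```

The chosen and rejected completions share the same system and user turns; only the assistant turn differs, which is what the preference objective compares.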

A GGUF version can be found here: [RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF](https://huggingface.co/RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF).

ORPOConfig (from the `trl` library):

```python
from trl import ORPOConfig

orpo_config = ORPOConfig(
    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50,
)
```

<div style="text-align: center;">
  <img src="https://i.imgur.com/vQ4RzSl.png" style="width: 100%; margin: 0 auto; display: inline-block;"/>
  <img src="https://i.imgur.com/9H75ijW.png" style="width: 100%; margin: 0 auto; display: inline-block;"/>
</div>