About ORPO

alvarobartt 's Collections

updated Sep 2

Contains some information and experiments fine-tuning LLMs using 🤗 `trl.ORPOTrainer`

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 62

Note Annotated paper and personal notes coming soon!
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Text Generation • Updated Apr 18 • 482 • 261

Note ORPO full fine-tune of `mistral-community/Mixtral-8x22B-v0.1` with `argilla/distilabel-capybara-dpo-7k-binarized` with ChatML formatting (in collaboration with Hugging Face, Argilla and Kaist AI)
alvarobartt/mistral-orpo-mix

Text Generation • Updated Mar 24 • 9

Note ORPO full fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting
alvarobartt/Mistral-7B-v0.1-ORPO

Text Generation • Updated Mar 23 • 17 • 14

Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (weights are merged into base model)
alvarobartt/Mistral-7B-v0.1-ORPO-PEFT

Text Generation • Updated Mar 23 • 3 • 1

Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (only contains the adapter, for merged weights check `alvarobartt/Mistral-7B-v0.1-ORPO`)
alvarobartt/mistral-orpo-mix-b0.05-l1024-pl512-lr5e-7-cosine

Text Generation • Updated Mar 26 • 6
alvarobartt/mistral-orpo-mix-b0.1-l2048-pl1792-lr5e-6-inverse-sqrt

Text Generation • Updated Mar 26 • 7