ORPO v DPO v SFT + Training Loss Curves (argilla/dpo-mix-7k)

A collection of 11 models trained on the argilla/dpo-mix-7k dataset to compare the differences between each method. Each model card includes a complete description of its hyperparameters, with wandb reports of the training loss curves.
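For context, here is a minimal sketch of how one of these comparison runs could be set up with TRL's `ORPOTrainer` on the same dataset. The base model, `beta`, and the other hyperparameters below are illustrative placeholders, not the collection's actual settings (those live on the individual model cards and wandb reports).

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Placeholder base model; each model in the collection documents its own base.
model_id = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# argilla/dpo-mix-7k is a preference dataset with chosen/rejected pairs;
# recent TRL versions handle conversational preference data directly.
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

config = ORPOConfig(
    output_dir="orpo-dpo-mix-7k",
    beta=0.1,            # illustrative value; weights the odds-ratio loss term
    num_train_epochs=1,  # illustrative value
    report_to="wandb",   # produces the kind of loss-curve reports linked here
)
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Swapping in `DPOConfig`/`DPOTrainer` (which additionally takes a reference model) or `SFTConfig`/`SFTTrainer` yields the other two methods being compared.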