---
library_name: transformers
license: mit
---
|
This repository contains the model release for [CPO](https://arxiv.org/abs/2401.08417); please find more details in [our GitHub repository](https://github.com/fe1ixxu/CPO_SIMPO)! If you find this work useful, please cite:
|
|
|
```
@inproceedings{xu2024contrastive,
  title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},
  author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=51iwkioZpn}
}
```
|
|
|
Here are the released models for CPO and SimPO. The code is based on the SimPO GitHub repository. We focus on highlighting reference-free preference learning and demonstrating the effectiveness of SimPO.

Additionally, we integrate length normalization and the target reward margin into CPO, showing promising results and the potential benefits of combining the two methods.

CPO adds a behavior-cloning (BC) regularizer to prevent the model from deviating too much from the preferred data distribution. A sketch of the combined objective is shown below.
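
To make the combination concrete, here is a minimal sketch of how such an objective could look, assuming sequence-level log-probabilities for the preferred and dispreferred responses are already available. The function name `cpo_simpo_loss` and the hyperparameters `beta`, `gamma`, and `bc_weight` are illustrative placeholders, not values taken from the released training code.

```python
import torch
import torch.nn.functional as F

def cpo_simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
                   beta=2.0, gamma=1.0, bc_weight=1.0):
    # Length-normalized, reference-free implicit rewards (SimPO-style)
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens

    # Preference term with a target reward margin gamma
    preference_loss = -F.logsigmoid(chosen_reward - rejected_reward - gamma)

    # CPO's BC regularizer: per-token NLL on the preferred responses,
    # which keeps the policy close to the preferred data distribution
    bc_loss = -chosen_logps / chosen_lens

    return (preference_loss + bc_weight * bc_loss).mean()
```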
|
|
|
|
|
|
|
| Model | Checkpoint | AlpacaEval 2 LC (%) | AlpacaEval 2 WR (%) |
|------------------------------|------------------------------------------------------------------------------------------------------------|:------:|:------:|
| Llama3 Instruct 8B SimPO (reported) | [princeton-nlp/Llama-3-Instruct-8B-SimPO](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO) | 44.7 | 40.5 |
| Llama3 Instruct 8B SimPO (reproduced) | [haoranxu/Llama-3-Instruct-8B-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-SimPO) | 43.3 | 40.6 |
| Llama3 Instruct 8B CPO | [haoranxu/Llama-3-Instruct-8B-CPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO) | 36.07 | 40.06 |
| Llama3 Instruct 8B CPO-SimPO | [haoranxu/Llama-3-Instruct-8B-CPO-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO-SimPO) | 46.94 | 44.72 |
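
The released checkpoints can be loaded with the standard `transformers` API. The snippet below is a minimal usage sketch, not an excerpt from the release; the prompt and generation settings are illustrative, and the chat-template call assumes the Llama-3-Instruct template that ships with the tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/Llama-3-Instruct-8B-CPO-SimPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "Translate to German: The weather is nice today."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```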
|
|
|
|