Spaces:
Running
Running
<html> | |
<head> | |
<meta charset="utf-8"> | |
<meta name="description" | |
content=""> | |
<meta name="keywords" content="Counterfactual DPO"> | |
<meta name="viewport" content="width=device-width, initial-scale=1"> | |
<title>Aligning Large Language Models with Counterfactual DPO | |
</title> | |
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" | |
rel="stylesheet"> | |
<link rel="stylesheet" href="./static/css/bulma.min.css"> | |
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css"> | |
<link rel="stylesheet" href="./static/css/bulma-slider.min.css"> | |
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css"> | |
<link rel="stylesheet" | |
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> | |
<link rel="stylesheet" href="./static/css/index.css"> | |
<link rel="icon" href="./static/images/favicon.svg"> | |
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script> | |
<script defer src="./static/js/fontawesome.all.min.js"></script> | |
<script src="./static/js/bulma-carousel.min.js"></script> | |
<script src="./static/js/bulma-slider.min.js"></script> | |
<script src="./static/js/index.js"></script> | |
</head> | |
<body> | |
<section class="hero"> | |
<div class="hero-body"> | |
<div class="container is-max-desktop"> | |
<div class="columns is-centered"> | |
<div class="column has-text-centered"> | |
<h1 class="title is-1 publication-title">Aligning Large Language Models with Counterfactual DPO | |
</h1> | |
<div class="is-size-5 publication-authors"> | |
<span class="author-block"> | |
<a href="" target="_blank">Bradley Butcher</a></span> | |
<div class="column has-text-centered"> | |
<div class="publication-links"> | |
<!-- PDF Link. --> | |
<span class="link-block"> | |
<a href="https://arxiv.org/pdf/2401.09566" target="_blank" | |
class="external-link button is-normal is-rounded is-dark"> | |
<span class="icon"> | |
<i class="fas fa-file-pdf"></i> | |
</span> | |
<span>Paper</span> | |
</a> | |
</span> | |
<span class="link-block"> | |
<a href="https://arxiv.org/abs/2401.09566" target="_blank" | |
class="external-link button is-normal is-rounded is-dark"> | |
<span class="icon"> | |
<i class="ai ai-arxiv"></i> | |
</span> | |
<span>arXiv</span> | |
</a> | |
</span> | |
</div> | |
</div> | |
</div> | |
</div> | |
</div> | |
</div> | |
</section> | |
</section> | |
<section class="section"> | |
<div class="container is-max-desktop"> | |
<!-- Abstract. --> | |
<div class="columns is-centered has-text-centered"> | |
<div class="column is-four-fifths"> | |
<h2 class="title is-3">Abstract</h2> | |
<div class="content has-text-justified"> | |
<p> | |
Advancements in large language models (LLMs) have demonstrated remarkable capabilities across a diverse range of applications. | |
These models excel in generating text completions that are contextually coherent and cover an extensive array of subjects. | |
However, the vast datasets required for their training make aligning response styles during the pretraining and instruction tuning phases challenging. | |
Consequently, an additional alignment phase is typically employed, wherein the model is further trained with human preference data to better align its outputs with human expectations. | |
While this process doesn't introduce new capabilities per se, it does accentuate generation styles innate to the model. | |
This paper explores the utilization of counterfactual prompting within the framework of Direct Preference Optimization (DPO) to align the model's style without relying on human intervention. | |
We demonstrate that this method effectively instils desirable behaviour, mitigates undesirable ones, and encourages the model to disregard inappropriate instructions. | |
Our findings suggest that counterfactual prompting with DPO presents a low-resource way to fine-tune LLMs to meet the demands for responsible and ethically aligned AI systems. | |
</p> | |
</div> | |
</div> | |
</div> | |
<!--/ Abstract. --> | |
</body> | |
</html> | |