argilla/ultrafeedback-binarized-preferences
This collection contains curated preference datasets for DPO fine-tuning aimed at intent alignment of LLMs.
Note Binarized version of `OpenBMB/UltraFeedback`, using the average of the preference ratings across four attributes: helpfulness, honesty, truthfulness, and instruction following.
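The binarization described above can be sketched as follows. This is a minimal illustration, not the actual Argilla pipeline: the field names (`completions`, `annotations`, `Rating`, `response`, `instruction`) are assumptions based on the `OpenBMB/UltraFeedback` schema, and the example picks the lowest-rated completion as the rejected response.

```python
# Hedged sketch: binarize an UltraFeedback-style record by averaging the four
# attribute ratings per completion, keeping the top-rated response as "chosen"
# and a lower-rated one as "rejected". Field names are assumptions.

ATTRIBUTES = ["helpfulness", "honesty", "truthfulness", "instruction_following"]

def average_rating(completion):
    """Mean of the four attribute ratings for one completion."""
    return sum(
        float(completion["annotations"][a]["Rating"]) for a in ATTRIBUTES
    ) / len(ATTRIBUTES)

def binarize(record):
    """Turn one multi-completion record into a (chosen, rejected) DPO pair."""
    ranked = sorted(record["completions"], key=average_rating, reverse=True)
    return {
        "prompt": record["instruction"],
        "chosen": ranked[0]["response"],
        "rejected": ranked[-1]["response"],
    }

# Toy record with two completions rated on all four attributes
record = {
    "instruction": "Explain DPO in one sentence.",
    "completions": [
        {"response": "A", "annotations": {a: {"Rating": "5"} for a in ATTRIBUTES}},
        {"response": "B", "annotations": {a: {"Rating": "2"} for a in ATTRIBUTES}},
    ],
}
pair = binarize(record)
```

The augmented variant noted below would instead emit one pair per non-top completion rather than a single `(chosen, rejected)` pair.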
Note Iteration on top of `argilla/ultrafeedback-binarized-preferences` that removes the prompts in the original `OpenBMB/UltraFeedback` dataset contaminated with TruthfulQA.
Note Iteration on top of `argilla/ultrafeedback-binarized-preferences-cleaned` that keeps every rejected response per chosen one, instead of picking a single random one, yielding an augmented dataset for DPO fine-tuning experiments.
Note Ranking dataset used by Starling.