A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.
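To make the DPO use case concrete, below is a minimal sketch, assuming the `datasets` library and the hh-rlhf layout (each row carries full `chosen` and `rejected` transcripts that share the same prompt up to the final Assistant turn), of converting rows into the prompt/chosen/rejected triples that most DPO trainers, such as TRL's `DPOTrainer`, expect. It is an illustration of the data format, not a definitive preprocessing recipe.

```python
# Minimal sketch: turn Anthropic/hh-rlhf rows into (prompt, chosen, rejected)
# triples for DPO-style training. Assumes each transcript contains a final
# "\n\nAssistant:" turn, which is where the two transcripts diverge.
from datasets import load_dataset


def split_prompt_and_response(transcript: str):
    """Split an hh-rlhf transcript into the shared prompt and the final reply."""
    marker = "\n\nAssistant:"
    idx = transcript.rfind(marker)
    prompt = transcript[: idx + len(marker)]
    response = transcript[idx + len(marker):]
    return prompt, response


def to_preference_triple(row):
    prompt, chosen = split_prompt_and_response(row["chosen"])
    _, rejected = split_prompt_and_response(row["rejected"])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    ds = load_dataset("Anthropic/hh-rlhf", split="train")
    triples = ds.map(to_preference_triple, remove_columns=ds.column_names)
    print(triples[0]["prompt"][:200])
```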
Anthropic/hh-rlhf
Note: The OG of open preference data, though the quality is not great.
Note: Non-commercial (NC) license.
Note: A simple idea: generate GPT-4 responses and treat them as the preferred responses.
Note: Some parts are under a non-commercial (NC) license.
Note: Computes the mean preference score instead of relying on the overall score from GPT-4, and removes contamination from TruthfulQA prompts (see the binarization sketch after this list).
Note: The dataset behind NVIDIA's SteerLM alignment method.
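The mean-preference-score note above describes a binarization recipe: average each completion's per-aspect GPT-4 ratings rather than trusting its single overall score, then take the best and worst completions per prompt as chosen and rejected. Here is a minimal sketch of that idea; the field names (`completions`, `annotations`, `Rating`, `response`) are illustrative assumptions, not the dataset's actual schema.

```python
# Hedged sketch of mean-score binarization: rank completions by the average of
# their per-aspect ratings (helpfulness, truthfulness, ...) instead of the
# single overall score, then keep the best/worst pair. Field names are
# assumptions for illustration only.
from statistics import mean


def mean_aspect_score(completion: dict) -> float:
    """Average the per-aspect ratings attached to one completion."""
    return mean(float(a["Rating"]) for a in completion["annotations"].values())


def binarize(example: dict) -> dict:
    """Turn a prompt with several rated completions into one chosen/rejected pair."""
    ranked = sorted(example["completions"], key=mean_aspect_score)
    return {
        "prompt": example["prompt"],
        "chosen": ranked[-1]["response"],   # highest mean aspect score
        "rejected": ranked[0]["response"],  # lowest mean aspect score
    }


if __name__ == "__main__":
    example = {
        "prompt": "What is the capital of France?",
        "completions": [
            {"response": "Paris.",
             "annotations": {"helpfulness": {"Rating": "5"},
                             "truthfulness": {"Rating": "5"}}},
            {"response": "London.",
             "annotations": {"helpfulness": {"Rating": "1"},
                             "truthfulness": {"Rating": "1"}}},
        ],
    }
    print(binarize(example))  # chosen: "Paris.", rejected: "London."
```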