A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.
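To make the DPO use case concrete, below is a minimal sketch, assuming the `datasets` library and the hh-rlhf layout (each row carries full `chosen` and `rejected` transcripts that share the same prompt up to the final Assistant turn), of converting rows into the prompt/chosen/rejected triples that most DPO trainers, such as TRL's `DPOTrainer`, expect. It is an illustration of the data format, not a definitive preprocessing recipe.

```python
# Minimal sketch: turn Anthropic/hh-rlhf rows into (prompt, chosen, rejected)
# triples for DPO-style training. Assumes each transcript contains a final
# "\n\nAssistant:" turn, which is where the two transcripts diverge.
from datasets import load_dataset


def split_prompt_and_response(transcript: str):
    """Split an hh-rlhf transcript into the shared prompt and the final reply."""
    marker = "\n\nAssistant:"
    idx = transcript.rfind(marker)
    prompt = transcript[: idx + len(marker)]
    response = transcript[idx + len(marker):]
    return prompt, response


def to_preference_triple(row):
    prompt, chosen = split_prompt_and_response(row["chosen"])
    _, rejected = split_prompt_and_response(row["rejected"])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    ds = load_dataset("Anthropic/hh-rlhf", split="train")
    triples = ds.map(to_preference_triple, remove_columns=ds.column_names)
    print(triples[0]["prompt"][:200])
```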
Anthropic/hh-rlhf
Note: The OG of open preference data, though the quality is not great.
Note: Non-commercial (NC) license.
Note: A simple idea: generate GPT-4 responses and treat them as the preferred responses.
Note: Some parts are under a non-commercial (NC) license.
Note: Computes the mean preference score instead of relying on the overall score from GPT-4, and removes contamination from TruthfulQA prompts (see the binarization sketch after this list).
Note: The dataset behind NVIDIA's SteerLM alignment method.
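The mean-preference-score note above describes a binarization recipe: average each completion's per-aspect GPT-4 ratings rather than trusting its single overall score, then take the best and worst completions per prompt as chosen and rejected. Here is a minimal sketch of that idea; the field names (`completions`, `annotations`, `Rating`, `response`) are illustrative assumptions, not the dataset's actual schema.

```python
# Hedged sketch of mean-score binarization: rank completions by the average of
# their per-aspect ratings (helpfulness, truthfulness, ...) instead of the
# single overall score, then keep the best/worst pair. Field names are
# assumptions for illustration only.
from statistics import mean


def mean_aspect_score(completion: dict) -> float:
    """Average the per-aspect ratings attached to one completion."""
    return mean(float(a["Rating"]) for a in completion["annotations"].values())


def binarize(example: dict) -> dict:
    """Turn a prompt with several rated completions into one chosen/rejected pair."""
    ranked = sorted(example["completions"], key=mean_aspect_score)
    return {
        "prompt": example["prompt"],
        "chosen": ranked[-1]["response"],   # highest mean aspect score
        "rejected": ranked[0]["response"],  # lowest mean aspect score
    }


if __name__ == "__main__":
    example = {
        "prompt": "What is the capital of France?",
        "completions": [
            {"response": "Paris.",
             "annotations": {"helpfulness": {"Rating": "5"},
                             "truthfulness": {"Rating": "5"}}},
            {"response": "London.",
             "annotations": {"helpfulness": {"Rating": "1"},
                             "truthfulness": {"Rating": "1"}}},
        ],
    }
    print(binarize(example))  # chosen: "Paris.", rejected: "London."
```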