Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data Paper • 2404.03862 • Published Apr 5
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11 • 12
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published Oct 1 • 34
jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat_cot-improved Viewer • Updated Jul 29 • 34.2k • 53
jackzhang/BeaverTails-dedupprompt_model-gpt-4-32k_harmful_cat_clustercat_cot-improved Viewer • Updated Jul 29 • 34.2k • 37
jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat Viewer • Updated Jul 29 • 34.2k • 36 • 2