arxiv:2410.08968
Jack Zhang
jackzhang
AI & ML interests
None yet
Recent Activity
authored a paper about 1 month ago: Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
authored a paper about 1 month ago: Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
upvoted a paper about 1 month ago: Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Papers: 3
Datasets: 12
jackzhang/V5-bt-wg-addr_imp-train • 122k • 35
jackzhang/V4-bt_gpt-4o_wg-train • 133k • 31
jackzhang/bt_7cat_test_400_unseencat • 1.2k • 34
jackzhang/bt_7cat_5spec_testset_400 • 2k • 35
jackzhang/V2-given_sys-ah-train-no_em • 61.1k • 35
jackzhang/bt_multi_4-V1-given_sys_combine-test • 3.45k • 31
jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat_cot-improved • 34.2k • 51
jackzhang/BeaverTails-dedupprompt_model-gpt-4-32k_harmful_cat_clustercat_cot-improved • 34.2k • 36
jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat • 34.2k • 35 • 2
jackzhang/train-llama3_safegen-bt_helpgen-mixed_mode • 30.8k • 42