Jack Zhang

jackzhang

http://jackz.io/

AI & ML interests

None yet

Recent Activity

authored a paper about 1 month ago

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

authored a paper about 1 month ago

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

upvoted a paper about 1 month ago

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Papers 3

arxiv:2410.08968

arxiv:2410.01044

arxiv:2404.03862

Models 1

jackzhang/llama3.1-8b-instruct-SFT-V5-bt_wg-addr_imp-DPO_131676

Updated Sep 27 • 105

Datasets 12

jackzhang/V5-bt-wg-addr_imp-train

Viewer • Updated Sep 16 • 122k • 35

jackzhang/V4-bt_gpt-4o_wg-train

Viewer • Updated Sep 15 • 133k • 31

jackzhang/bt_7cat_test_400_unseencat

Viewer • Updated Sep 5 • 1.2k • 34

jackzhang/bt_7cat_5spec_testset_400

Viewer • Updated Sep 5 • 2k • 35

jackzhang/V2-given_sys-ah-train-no_em

Viewer • Updated Aug 6 • 61.1k • 35

jackzhang/bt_multi_4-V1-given_sys_combine-test

Viewer • Updated Aug 5 • 3.45k • 31

jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat_cot-improved

Viewer • Updated Jul 29 • 34.2k • 51

jackzhang/BeaverTails-dedupprompt_model-gpt-4-32k_harmful_cat_clustercat_cot-improved

Viewer • Updated Jul 29 • 34.2k • 36

jackzhang/BeaverTails-dedupprompt_model-gpt-4o_harmful_cat_judge_clustercat

Viewer • Updated Jul 29 • 34.2k • 35 • 2

jackzhang/train-llama3_safegen-bt_helpgen-mixed_mode

Viewer • Updated Jul 23 • 30.8k • 42