Agent-SafetyBench: Evaluating the Safety of LLM Agents Paper • 2412.14470 • Published Dec 19, 2024 • 8 • 2
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024 • 10
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization Paper • 2311.09096 • Published Nov 15, 2023