AI Governance and Accountability: An Analysis of Anthropic's Claude • Paper • 2407.01557 • Published May 2
FRACTURED-SORRY-Bench: Framework for Revealing Attacks in Conversational Turns Undermining Refusal Efficacy and Defenses over SORRY-Bench • Paper • 2408.16163 • Published 27 days ago