Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published 7 days ago
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024
Towards Scalable Automated Alignment of LLMs: A Survey Paper • 2406.01252 • Published Jun 3, 2024