- Safe RLHF: Safe Reinforcement Learning from Human Feedback — arXiv:2310.12773, published Oct 19, 2023
- The Generative AI Paradox: "What It Can Create, It May Not Understand" — arXiv:2311.00059, published Oct 31, 2023
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B — arXiv:2310.20624, published Oct 31, 2023