Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published 28 days ago
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction Paper • 2411.06424 • Published about 1 month ago
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering Paper • 2408.07888 • Published Aug 15