Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization Paper • 2406.11431 • Published Jun 17