BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018
Transformers Can Achieve Length Generalization But Not Robustly Paper • 2402.09371 • Published Feb 14, 2024
A Thorough Examination of Decoding Methods in the Era of LLMs Paper • 2402.06925 • Published Feb 10, 2024