GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models Paper • 2410.05229 • Published Oct 7 • 17
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 242
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2
A Tale of Tails: Model Collapse as a Change of Scaling Laws Paper • 2402.07043 • Published Feb 10 • 13
Linear attention is (maybe) all you need (to understand transformer optimization) Paper • 2310.01082 • Published Oct 2, 2023 • 1
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 73
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 258