Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment • Paper • 2405.03594 • Published May 6, 2024
Sparse Finetuning for Inference Acceleration of Large Language Models • Paper • 2310.06927 • Published Oct 10, 2023
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot • Paper • 2301.00774 • Published Jan 2, 2023
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models • Paper • 2203.07259 • Published Mar 14, 2022