- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
  Paper • 2405.03594 • Published • 7
- Sparse Finetuning for Inference Acceleration of Large Language Models
  Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
  Paper • 2203.07259 • Published • 3
Collections including paper arxiv:2301.00774
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 62
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 68

- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- LLM-Pruner: On the Structural Pruning of Large Language Models
  Paper • 2305.11627 • Published • 3
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  Paper • 2208.07339 • Published • 4

- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 11
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Paper • 2305.15805 • Published • 1
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
  Paper • 2305.11186 • Published • 1
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Paper • 2110.07560 • Published • 1

- Sparse Finetuning for Inference Acceleration of Large Language Models
  Paper • 2310.06927 • Published • 14
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
  Paper • 2203.07259 • Published • 3
- How Well Do Sparse Imagenet Models Transfer?
  Paper • 2111.13445 • Published • 1