Norm Tweaking: High-performance Low-bit Quantization of Large Language Models Paper • 2309.02784 • Published Sep 6, 2023
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11, 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6, 2024
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models Paper • 2402.14866 • Published Feb 21, 2024
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Paper • 2403.02775 • Published Mar 5, 2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23, 2024
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization Paper • 2403.07134 • Published Mar 11, 2024
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models Paper • 2306.02272 • Published Jun 4, 2023
QuantEase: Optimization-based Quantization for Language Models Paper • 2309.01885 • Published Sep 5, 2023
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26, 2024
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points Paper • 2404.12759 • Published Apr 19, 2024
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs Paper • 2406.01721 • Published Jun 3, 2024
Attention-aware Post-training Quantization without Backpropagation Paper • 2406.13474 • Published Jun 19, 2024
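For context, the post-training quantization methods collected above are typically benchmarked against the naive round-to-nearest (RTN) baseline. Below is a minimal sketch of that baseline only, not of any listed paper's method; the function name `quantize_rtn` and the per-row asymmetric min/max scaling are illustrative assumptions.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round-to-nearest uniform quantization of a weight matrix.

    Illustrative baseline: asymmetric min/max scaling per output
    channel (each row of w), then dequantization back to float.
    """
    qmax = 2**bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale[scale == 0] = 1.0  # guard against constant rows
    zero_point = np.round(-w_min / scale)
    # Quantize to integers in [0, qmax], then dequantize.
    q = np.clip(np.round(w / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

# Example: reconstruction error of a random layer at 4 bits.
w = np.random.randn(256, 1024).astype(np.float32)
w_hat = quantize_rtn(w, bits=4)
print("mean squared error:", np.mean((w - w_hat) ** 2))
```

The papers above improve on this baseline in different ways, e.g. by handling outlier channels separately, quantizing in vector or additive codebooks, or calibrating layer-wise reconstruction without backpropagation.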