"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published 23 days ago • 45 • 3
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models Paper • 2203.07259 • Published Mar 14, 2022 • 3
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6 • 7
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published 23 days ago • 45
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published 23 days ago • 45
Compressed LLMs from the Community Collection LLMs optimized by the community using Neural Magic's LLM Compressor for efficient deployment in vLLM. Contribute and help advance efficient AI! • 3 items • Updated Sep 26 • 2
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17 • 59