"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization • Paper • 2411.02355 • Published 3 days ago • 42
Inference Scaling for Long-Context Retrieval Augmented Generation • Paper • 2410.04343 • Published Oct 6 • 9