- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 143
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 73
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 27
Collections including paper arxiv:2405.08707
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52

- Latxa: An Open Language Model and Evaluation Suite for Basque
  Paper • 2403.20266 • Published • 3
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 116
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 27

- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 48
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 34
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 16
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 10
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 40
- Interfacing Foundation Models' Embeddings
  Paper • 2312.07532 • Published • 10
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- TheBloke/quantum-v0.01-GPTQ
  Text Generation • Updated • 18 • 2