LM Inference - a kaizuberbuehler Collection

kaizuberbuehler 's Collections

Image Generation

Vision Language Models

Foundation Models

Synthetic Data and Self-Improvement

Agents

Video Generation

LM Prompt Engineering

LM Capabilities and Scaling

Music Generation

LM Architectures

Code Generation

Speech Synthesis

EXL2 Quantized Models

LM Inference

updated Oct 27

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 604
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 105
TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 43
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 63
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 104
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8 • 31
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8 • 20
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 112
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 44
SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 74
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 117
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Paper • 2404.18911 • Published Apr 29 • 29
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118
Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published May 20 • 25
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31 • 63
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Paper • 2406.07522 • Published Jun 11 • 37
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published Jun 11 • 24
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31 • 3
ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30 • 30
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B

Paper • 2409.11055 • Published Sep 17 • 16
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Paper • 2409.17422 • Published Sep 25 • 24
Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14 • 17
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published Sep 25 • 27