Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 41
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference Paper • 2403.09636 • Published Mar 14, 2024 • 2
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 49
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study Paper • 2401.17981 • Published Jan 31, 2024 • 1
What Algorithms can Transformers Learn? A Study in Length Generalization Paper • 2310.16028 • Published Oct 24, 2023 • 2
Empower Your Model with Longer and Better Context Comprehension Paper • 2307.13365 • Published Jul 25, 2023 • 1
Transformer Language Models without Positional Encodings Still Learn Positional Information Paper • 2203.16634 • Published Mar 30, 2022 • 5
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level Paper • 2403.04690 • Published Mar 7, 2024 • 1
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Paper • 2310.05737 • Published Oct 9, 2023 • 4
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11, 2024 • 25
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12, 2024 • 75
Unfamiliar Finetuning Examples Control How Language Models Hallucinate Paper • 2403.05612 • Published Mar 8, 2024 • 3
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 112
Zoology: Measuring and Improving Recall in Efficient Language Models Paper • 2312.04927 • Published Dec 8, 2023 • 2
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey Paper • 2311.12351 • Published Nov 21, 2023 • 3
Sequence Parallelism: Long Sequence Training from System Perspective Paper • 2105.13120 • Published May 26, 2021 • 5