- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 60
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

Collections including paper arxiv:2402.17193

- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 23
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 18
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 24
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
  Paper • 2402.07827 • Published • 45

- DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
  Paper • 2402.19481 • Published • 20
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 23
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
  Paper • 2403.05135 • Published • 42

- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- Priority Sampling of Large Language Models for Compilers
  Paper • 2402.18734 • Published • 16

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 23
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 19
- The Power of Scale for Parameter-Efficient Prompt Tuning
  Paper • 2104.08691 • Published • 9

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18