- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 16
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
Collections including paper arxiv:2403.11901

- Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
  Paper • 2403.10301 • Published • 51
- Recurrent Drafter for Fast Speculative Decoding in Large Language Models
  Paper • 2403.09919 • Published • 20
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 66
- Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
  Paper • 2403.09704 • Published • 31

- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 31
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 103
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 58

- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 54
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
  Paper • 2403.12968 • Published • 24
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 66
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 69

- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 63
- SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
  Paper • 2402.09025 • Published • 6
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 13
- Algorithmic progress in language models
  Paper • 2403.05812 • Published • 18

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 11
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 590
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 110

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 110
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 69
- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 31
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 49