- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 85
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
  Paper • 2405.15319 • Published • 25
- Can LLMs Learn by Teaching? A Preliminary Study
  Paper • 2406.14629 • Published • 17

Collections including paper arxiv:2402.08609
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
  Paper • 2407.00617 • Published • 7

- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 62
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 39
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 46
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 29

- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 23
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 48
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 42

- Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning
  Paper • 2402.06102 • Published • 4
- Mixtures of Experts Unlock Parameter Scaling for Deep RL
  Paper • 2402.08609 • Published • 34
- In deep reinforcement learning, a pruned network is a good network
  Paper • 2402.12479 • Published • 17
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 47

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 406k • 2.67k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 50
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 29

- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
  Paper • 2311.09257 • Published • 45
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Paper • 2310.04378 • Published • 19
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118