- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
  Paper • 2206.10789 • Published • 4
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 28
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 6
Collections including paper arxiv:2203.15556
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 6
- Scaling Laws for Autoregressive Generative Modeling
  Paper • 2010.14701 • Published
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 4
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 125
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 1
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 13
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 69
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
  Paper • 2408.15545 • Published • 34
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 62
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 40
- Automated Design of Agentic Systems
  Paper • 2408.08435 • Published • 38
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 24
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 11
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 3
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19
- ImageBind: One Embedding Space To Bind Them All
  Paper • 2305.05665 • Published • 3
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  Paper • 2206.02770 • Published • 3
- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Paper • 1909.08053 • Published • 2
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 8
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Paper • 2304.01373 • Published • 8