Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Abstract
Large language model (LLM) scaling laws are empirical formulas that estimate how model quality changes as parameter count and training data grow. However, these formulas, including the popular DeepMind Chinchilla scaling laws, neglect the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size for training and deploying a model of a given quality and inference demand. We conduct our analysis in terms of both a compute budget and real-world costs, and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal.
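To make the modified objective concrete, below is a minimal sketch of the kind of optimization the abstract describes: choose the parameter count N that minimizes total (training + inference) FLOPs while holding model quality fixed via the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta. The loss constants are the published Chinchilla fits (Hoffmann et al., 2022), and the 6ND training and 2N-per-token inference FLOP approximations are the standard ones; the target loss and inference demand used here are illustrative placeholders, not values from the paper.

```python
# Minimal sketch: pick N to minimize training + inference FLOPs at fixed quality.
# Assumes the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta with
# the published constants (Hoffmann et al., 2022), training cost ~ 6*N*D FLOPs,
# and inference cost ~ 2*N FLOPs per token. TARGET_LOSS and INFERENCE_TOKENS
# are illustrative placeholders, not values taken from the paper.
from scipy.optimize import minimize_scalar

E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def tokens_to_reach(target_loss: float, n_params: float) -> float:
    """Pre-training tokens D such that L(n_params, D) equals target_loss."""
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return float("inf")  # a model this small can never reach the target loss
    return (B / gap) ** (1 / BETA)

def lifetime_flops(n_params: float, target_loss: float, inference_tokens: float) -> float:
    """Training FLOPs (6*N*D) plus lifetime inference FLOPs (2*N per token)."""
    d_train = tokens_to_reach(target_loss, n_params)
    return 6 * n_params * d_train + 2 * n_params * inference_tokens

TARGET_LOSS = 2.0        # desired model quality (pre-training loss)
INFERENCE_TOKENS = 2e12  # ~1B requests x ~2k tokens each, as a rough scale

# Search over log10(N) so the bounded optimizer covers 1e8..1e12 parameters.
result = minimize_scalar(
    lambda log_n: lifetime_flops(10**log_n, TARGET_LOSS, INFERENCE_TOKENS),
    bounds=(8.0, 12.0),
    method="bounded",
)
n_opt = 10**result.x
d_opt = tokens_to_reach(TARGET_LOSS, n_opt)
print(f"N* ~ {n_opt:.3g} params, D* ~ {d_opt:.3g} tokens")
```

With INFERENCE_TOKENS = 0 this recovers the usual Chinchilla trade-off at fixed loss; as the inference term grows it penalizes large N, shifting the minimizer toward a smaller model trained on more data, which is the "smaller and longer" conclusion stated in the abstract.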
Community
Similar papers recommended by the Semantic Scholar API:
- Learning to Skip for Language Modeling (2023)
- Large Language Model Inference with Lexical Shortlisting (2023)
- Non-Vacuous Generalization Bounds for Large Language Models (2023)
- Splitwise: Efficient generative LLM inference using phase splitting (2023)
- How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models? (2023)