Pre-training Distillation for Large Language Models: A Design Space Exploration • Paper • 2410.16215 • Published 17 days ago