MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published 16 days ago
Data Selection via Optimal Control for Language Models Paper • 2410.07064 • Published 29 days ago