Datasets: NeurIPS LLM Challenge 2023

Datasets that were under consideration for use in my submission to the 2023 NeurIPS Large Language Model Efficiency Challenge.

- mosaicml/instruct-v3 (updated Oct 2, 2023; 63k rows)
- databricks/databricks-dolly-15k (updated Jun 30, 2023; 15k rows)
- hendrycks/competition_math (updated Jun 8, 2023)
- kaist-ai/CoT-Collection (updated Oct 14, 2023; 1.84M rows)
Papers

- Detecting Pretraining Data from Large Language Models (arXiv 2310.16789, published Oct 25, 2023)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (arXiv 2310.13671, published Oct 20, 2023)
- AutoMix: Automatically Mixing Language Models (arXiv 2310.12963, published Oct 19, 2023)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models (arXiv 2310.12962, published Oct 19, 2023)