MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17 • 74
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10 • 28
Self-Boosting Large Language Models with Synthetic Preference Data Paper • 2410.06961 • Published Oct 9 • 15