BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games • Paper • 2411.13543 • Published 12 days ago • 17
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • 2411.15124 • Published 10 days ago • 55
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages • Paper • 2411.16508 • Published 7 days ago • 7
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published 7 days ago • 34
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge • Paper • 2411.16594 • Published 7 days ago • 34
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs • Paper • 2411.15296 • Published 10 days ago • 19
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages • Paper • 2411.12240 • Published 14 days ago • 6
RedPajama: an Open Dataset for Training Large Language Models • Paper • 2411.12372 • Published 14 days ago • 47
Loss-to-Loss Prediction: Scaling Laws for All Datasets • Paper • 2411.12925 • Published 13 days ago • 5
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs • Paper • 2411.14199 • Published 11 days ago • 25
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 12 days ago • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • Paper • 2411.14432 • Published 11 days ago • 19
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions • Paper • 2411.14405 • Published 11 days ago • 53
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published 17 days ago • 61
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices • Paper • 2411.10640 • Published 17 days ago • 42
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published 17 days ago • 107
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models • Paper • 2411.07140 • Published 21 days ago • 33