Training Large Language Models to Reason in a Continuous Latent Space • Paper • 2412.06769 • Published 18 days ago • 62
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding • Paper • 2411.04282 • Published Nov 6 • 31
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published 7 days ago • 33
MALT: Improving Reasoning with Multi-Agent LLM Training • Paper • 2412.01928 • Published 25 days ago • 39
Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published 4 days ago • 36
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published Nov 15 • 68
Training Language Models to Self-Correct via Reinforcement Learning • Paper • 2409.12917 • Published Sep 19 • 135
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning • Paper • 2410.02884 • Published Oct 3 • 52
Teaching Large Language Models to Reason with Reinforcement Learning • Paper • 2403.04642 • Published Mar 7 • 46
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners • Paper • 2412.17256 • Published 4 days ago • 37