arxiv:2501.04686

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

Published on Jan 8
· Submitted by Lin1557 on Jan 9

Abstract

Chain-of-thought (CoT) reasoning has been widely applied in the mathematical reasoning of Large Language Models (LLMs). Recently, the introduction of derivative process supervision on CoT trajectories has sparked discussions on enhancing scaling capabilities during test time, thereby boosting the potential of these models. However, in multimodal mathematical reasoning, the scarcity of high-quality CoT training data has hindered existing models from achieving high-precision CoT reasoning and has limited the realization of reasoning potential during test time. In this work, we propose a three-module synthesis strategy that integrates CoT distillation, trajectory-format rewriting, and format unification. It results in a high-quality CoT reasoning instruction fine-tuning dataset in multimodal mathematics, MMathCoT-1M. We comprehensively validate the state-of-the-art (SOTA) performance of the trained URSA-7B model on multiple multimodal mathematical benchmarks. For test-time scaling, we introduce a data synthesis strategy that automatically generates process annotation datasets, known as DualMath-1.1M, focusing on both interpretation and logic. By further training URSA-7B on DualMath-1.1M, we transition from CoT reasoning capabilities to robust supervision abilities. The trained URSA-RM-7B acts as a verifier, effectively enhancing the performance of URSA-7B at test time. URSA-RM-7B also demonstrates excellent out-of-distribution (OOD) verifying capabilities, showcasing its generalization. Model weights, training data and code will be open-sourced.
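The verifier role described in the abstract can be illustrated with a minimal best-of-N sketch: sample several candidate CoT trajectories, score each step with a process reward model, and keep the trajectory with the best aggregate score. This is not the authors' implementation. The names `trajectory_score`, `best_of_n`, and `toy_score_steps` are hypothetical, the scorer is a toy stand-in for a model such as URSA-RM-7B, and aggregating by the minimum step score is an assumption, since the abstract does not say how step scores are combined.

```python
from typing import Callable, List, Sequence

Trajectory = List[str]  # one chain-of-thought, split into reasoning steps


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores into one trajectory score.

    Assumption: a chain is only as reliable as its weakest step, so we
    take the minimum. An empty trajectory scores 0.0.
    """
    if not step_scores:
        return 0.0
    return min(step_scores)


def best_of_n(
    candidates: Sequence[Trajectory],
    score_steps: Callable[[Trajectory], List[float]],
) -> Trajectory:
    """Return the candidate whose aggregated verifier score is highest."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return max(candidates, key=lambda t: trajectory_score(score_steps(t)))


def toy_score_steps(trajectory: Trajectory) -> List[float]:
    """Hypothetical stand-in for a process reward model.

    Steps containing "??" are treated as doubtful and get a low score.
    A real verifier would return a learned correctness probability.
    """
    return [0.1 if "??" in step else 0.9 for step in trajectory]


candidates = [
    ["area = 3 * 4", "area = 12"],
    ["area = 3 + 4 ??", "area = 7"],
]
best = best_of_n(candidates, toy_score_steps)
```

In practice the candidates would be sampled from the policy model (here, URSA-7B) and `score_steps` would call the trained verifier. Only the selection logic is shown.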

Community

Paper submitter

TL;DR: Our work focuses on multimodal mathematical reasoning. We contribute two datasets: MMathCoT-1M, built with a three-module synthesis pipeline for high-quality CoT data, and DualMath-1.1M, built with automated dual-view process labeling. URSA-7B, fine-tuned on MMathCoT-1M, achieves SOTA performance among models of the same size on multiple multimodal mathematics benchmarks. URSA-RM-7B, trained on DualMath-1.1M, is the first small-scale reward model for multimodal mathematics, and our experiments validate its effectiveness for test-time scaling.

