LLM4SR: A Survey on Large Language Models for Scientific Research • Paper • 2501.04306 • Published 3 days ago • 25
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs • Paper • 2406.18629 • Published Jun 26, 2024 • 41