LLM Reasoning - a Giuliano Collection

updated about 9 hours ago

STaR: Bootstrapping Reasoning With Reasoning

Paper • 2203.14465 • Published Mar 28, 2022 • 8
Let's Verify Step by Step

Paper • 2305.20050 • Published May 31, 2023 • 10
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published 18 days ago • 62
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21 • 58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training

Paper • 2309.17179 • Published Sep 29, 2023 • 2
Qwen2.5 Technical Report

Paper • 2412.15115 • Published 8 days ago • 328
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Paper • 2410.13639 • Published Oct 17 • 16
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25 • 40
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Paper • 2410.02884 • Published Oct 3 • 52
Tree of Problems: Improving structured problem solving with compositionality

Paper • 2410.06634 • Published Oct 9 • 8
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published 10 days ago • 89
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31 • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6 • 51
QwQ-32B-Preview

Running • 814
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published 7 days ago • 33
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

Paper • 2411.07279 • Published Nov 11 • 3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

Paper • 2410.18451 • Published Oct 24 • 15
Skywork/Skywork-Reward-Gemma-2-27B-v0.2

Text Classification • Updated Oct 25 • 5.41k • 22
Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27 • 13
Understanding Hidden Computations in Chain-of-Thought Reasoning

Paper • 2412.04537 • Published 22 days ago
Generative Reward Models

Paper • 2410.12832 • Published Oct 2 • 6
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published 4 days ago • 37
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2 • 12
V-STaR: Training Verifiers for Self-Taught Reasoners

Paper • 2402.06457 • Published Feb 9 • 9