MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions Paper • 2410.02743 • Published Oct 3 • 6
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published 8 days ago • 78
NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published 6 days ago • 6
Reliable, Reproducible, and Really Fast Leaderboards with Evalica Paper • 2412.11314 • Published 12 days ago • 2
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25 • 40
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge Paper • 2411.16594 • Published Nov 25 • 36
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22 • 56
Response Tuning: Aligning Large Language Models without Instruction Paper • 2410.02465 • Published Oct 3 • 12
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7 • 13
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published Oct 7 • 12
Collaborative Performance Prediction for Large Language Models Paper • 2407.01300 • Published Jul 1 • 2