Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering • Paper • 2411.11504 • Published 13 days ago • 19
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training • Paper • 2411.13476 • Published 10 days ago • 14
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 10 days ago • 37
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • 2411.15124 • Published 8 days ago • 54
Star Attention: Efficient LLM Inference over Long Sequences • Paper • 2411.17116 • Published 5 days ago • 41
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published 5 days ago • 28
nGPT: Normalized Transformer with Representation Learning on the Hypersphere • Paper • 2410.01131 • Published Oct 1 • 9