Collections

Collections including paper arxiv:2412.15115

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Paper • 2411.11504 • Published Nov 18 • 19
Top-nσ: Not All Logits Are You Need

Paper • 2411.07641 • Published Nov 12 • 18
Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published Nov 14 • 10
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Paper • 2411.13476 • Published Nov 20 • 15

📝 Cool LLM papers

Starting from 2024-11-15

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 8 days ago • 328
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published 11 days ago • 15
Scaling test-time compute

Space • Running • 390
Reverse Thinking Makes LLMs Stronger Reasoners

Paper • 2411.19865 • Published 28 days ago • 19

Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 168
PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 23 days ago • 119
VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published 22 days ago • 104
o1-Coder: an o1 Replication for Coding

Paper • 2412.00154 • Published 28 days ago • 41

Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 6
Scaling Laws for Autoregressive Generative Modeling

Paper • 2010.14701 • Published Oct 28, 2020
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 10
A Survey on Data Selection for Language Models

Paper • 2402.16827 • Published Feb 26 • 4

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17 • 51
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 41
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20 • 52

foundation_models

Apple Intelligence Foundation Language Models

Paper • 2407.21075 • Published Jul 29 • 3
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 110
Nemotron-4 340B Technical Report

Paper • 2406.11704 • Published Jun 17
Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 75

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 5

Papers - Encodings - Rotary - RoPE

The Impact of Positional Encoding on Length Generalization in Transformers

Paper • 2305.19466 • Published May 31, 2023 • 2
Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 160
Round and Round We Go! What makes Rotary Positional Encodings useful?

Paper • 2410.06205 • Published Oct 8 • 1
ThunderKittens: Simple, Fast, and Adorable AI Kernels

Paper • 2410.20399 • Published Oct 27 • 1

To read... eventually

A collection of papers that I have read or plan to read, all in one place. It covers a wide range of topics.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 125
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 50
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6 • 12
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65

Papers - Qwen - Report

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 35
Qwen2.5 Technical Report

Paper • 2412.15115 • Published 8 days ago • 328
