LLM

QwQ-32B-Preview

Running on CPU Upgrade

12.1k

🏆

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 104

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 104

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 158

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 142

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

PaLM: Scaling Language Modeling with Pathways

Paper • 2204.02311 • Published Apr 5, 2022 • 2

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 27

LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 13

GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 5

PaLM 2 Technical Report

Paper • 2305.10403 • Published May 17, 2023 • 6

Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 87

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 126

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 126

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 110

meta-llama/Llama-Guard-3-8B

Text Generation • Updated Oct 11 • 150k • 145

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 14

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 58

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 61

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 136

allenai/Molmo-7B-D-0924

Image-Text-to-Text • Updated Oct 10 • 235k • 480

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 24 days ago • 119

OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated 11 days ago • 4.64k • 158

Running

375

⚡

InternVL

Qwen/Qwen2.5-0.5B-Instruct

Text Generation • Updated Sep 25 • 394k • 160

Running

585

🚀

Qwen2.5

Qwen/Qwen2.5-0.5B

Text Generation • Updated Sep 25 • 293k • 150

google/gemma-1.1-7b-it

Text Generation • Updated Jun 27 • 13.6k • • 267

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • Updated Sep 27 • 1.56M • • 3.72k

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27 • 530k • 5.93k

meta-llama/Meta-Llama-3-70B-Instruct

Text Generation • Updated 14 days ago • 84.5k • 1.45k

meta-llama/Meta-Llama-3-70B

Text Generation • Updated Sep 27 • 26.8k • 841

meta-llama/Meta-Llama-Guard-2-8B

Text Generation • Updated May 13 • 13.7k • 287

meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • Updated Sep 27 • 56.9k • 416

meta-llama/Llama-3.2-1B-Instruct

Text Generation • Updated Oct 24 • 1.62M • • 669

Running

1.22k

🐢

Qwen2.5 Coder Artifacts

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 135

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 104

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 75

meta-llama/Llama-3.1-70B

Text Generation • Updated Sep 25 • 109k • 328

meta-llama/Llama-3.1-8B-Instruct

Text Generation • Updated Sep 25 • 4.49M • • 3.34k

microsoft/Phi-3.5-vision-instruct

Image-Text-to-Text • Updated Sep 26 • 273k • 626

microsoft/Phi-3.5-MoE-instruct

Text Generation • Updated Oct 24 • 41.6k • 544

meta-llama/Llama-3.2-3B

Text Generation • Updated Oct 24 • 1.2M • 423

meta-llama/Llama-3.2-1B

Text Generation • Updated Oct 24 • 2.15M • • 1.35k

microsoft/Phi-3.5-mini-instruct

Text Generation • Updated Sep 18 • 507k • • 723

Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • Updated 23 days ago • 2.36M • • 986

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 74

Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 7

liuhaotian/llava-v1.5-7b

Image-Text-to-Text • Updated May 8 • 1.23M • 393

Qwen/Qwen2-VL-2B-Instruct

Image-Text-to-Text • Updated 23 days ago • 1.2M • 340

openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • Updated Sep 25 • 28.1k • 1.38k

Running

537

🖼💬

Vision Arena (Testing VLMs side-by-side)

microsoft/Florence-2-large

Image-Text-to-Text • Updated 20 days ago • 379k • 1.31k

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 86

nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Oct 18 • 8.84k • 759

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 72

rhymes-ai/Aria

Image-Text-to-Text • Updated 11 days ago • 17.5k • 601

mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated 2 days ago • 553

HuggingFaceM4/idefics2-8b

Image-Text-to-Text • Updated Oct 14 • 19.2k • 599

liuhaotian/llava-v1.5-13b

Image-Text-to-Text • Updated May 9 • 105k • 490

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 104

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 102

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 149

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 87

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 72

ShareGPT4Video/ShareGPT4Video

Viewer • Updated Jul 8 • 40.2k • 3.01k • 185

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 57

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 59

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Paper • 2406.19741 • Published Jun 28 • 59

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 68

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22 • 89

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 138

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11 • 84

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7 • 111

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 113

Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 116

OpenGVLab/InternVL2_5-1B

Image-Text-to-Text • Updated 11 days ago • 7.31k • 37

Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • Updated Nov 20 • 417k • 284

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 55

Running on Zero

235

🔥

Qwen2-VL-7B

llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • Updated Nov 22 • 293k • 244

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37

openbmb/MiniCPM-V-2

Visual Question Answering • Updated Aug 6 • 4.95k • 433

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 16

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Paper • 2308.12038 • Published Aug 23, 2023 • 2

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 79

mistralai/Mixtral-8x7B-Instruct-v0.1

Text Generation • Updated Aug 19 • 3.6M • • 4.24k

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24 • 3.62M • • 3.51k

microsoft/phi-2

Text Generation • Updated Apr 29 • 179k • 3.26k

mistralai/Mistral-7B-Instruct-v0.2

Text Generation • Updated Sep 27 • 4.06M • • 2.6k

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47

nvidia/Mistral-NeMo-Minitron-8B-Base

Text Generation • Updated Aug 22 • 12.1k • 165

Anychat

Qwen2.5 Coder Artifacts

QwQ-32B-Preview

Open LLM Leaderboard

InternVL

Qwen2.5

Qwen2.5 Coder Artifacts

Vision Arena (Testing VLMs side-by-side)

Qwen2-VL-7B