Open LLM Leaderboard 2
Track, rank and evaluate open LLMs and chatbots
Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard)
Note 🏆 The 🤗 Open LLM Leaderboard aims to track, rank and evaluate open LLMs and chatbots. 🤗 Submit a model for automated evaluation on the 🤗 GPU cluster on the "Submit" page!
Note Massive Text Embedding Benchmark (MTEB) Leaderboard.
Note 🏆 This leaderboard is based on the following three benchmarks: Chatbot Arena - a crowdsourced, randomized battle platform, where we use 70K+ user votes to compute Elo ratings; MT-Bench - a set of challenging multi-turn questions, where we use GPT-4 to grade the model responses; MMLU (5-shot) - a test to measure a model's multitask accuracy on 57 tasks.
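For intuition, here is a minimal sketch of how Elo ratings can be derived from pairwise battle votes. The K-factor, initial rating and example votes are assumptions for illustration only, not the leaderboard's actual pipeline or data.

```python
# Illustrative Elo updates from pairwise battle votes (hypothetical data and constants).
from collections import defaultdict

K = 4        # update step size (assumed value)
INIT = 1000  # initial rating (assumed value)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def compute_elo(battles):
    """battles: iterable of (model_a, model_b, winner) with winner in {'a', 'b', 'tie'}."""
    ratings = defaultdict(lambda: float(INIT))
    for model_a, model_b, winner in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = 1.0 if winner == "a" else 0.0 if winner == "b" else 0.5
        ratings[model_a] += K * (s_a - e_a)
        ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))
    return dict(ratings)

# Example with made-up votes:
votes = [("vicuna-13b", "alpaca-13b", "a"), ("vicuna-13b", "koala-13b", "tie")]
print(compute_elo(votes))
```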
Note The 🤗 LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations using Optimum-Benchmark and Optimum flavors. Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking.
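As a rough illustration of what such a benchmark measures, the sketch below times greedy generation and reads peak GPU memory with plain transformers and torch. It is not the Optimum-Benchmark harness the leaderboard uses; the model name, prompt and token budget are placeholders.

```python
# Rough sketch of latency, throughput and peak-memory measurement for generation.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Benchmarking prompt", return_tensors="pt").to(device)
max_new_tokens = 128

if device == "cuda":
    torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
elapsed = time.perf_counter() - start

generated = output.shape[1] - inputs["input_ids"].shape[1]
print(f"latency: {elapsed:.2f} s")
print(f"throughput: {generated / elapsed:.1f} tokens/s")
if device == "cuda":
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```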
Note Compare the performance of base multilingual code generation models on the HumanEval benchmark and MultiPL-E. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models that people can start from as base models for their own training.
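HumanEval-style code benchmarks are usually reported as pass@k. Below is a sketch of the standard unbiased pass@k estimator; the sample counts in the usage example are made up.

```python
# Unbiased pass@k estimator commonly used for HumanEval-style evaluation
# (sample counts below are made up for illustration).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: generated samples per problem, c: samples passing the tests, k: budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 37 passing, reported at k=1 and k=10:
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 10))
```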
Note The 🤗 Open ASR Leaderboard ranks and evaluates speech recognition models on the Hugging Face Hub. We report the Average WER (⬇️ lower is better) and RTF (⬇️ lower is better). Models are ranked based on their Average WER, from lowest to highest.
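As a minimal sketch of these two metrics, the snippet below computes WER with the jiwer package and RTF as transcription time over audio duration. The transcripts and timings are placeholders, not taken from the leaderboard.

```python
# WER via jiwer and RTF as processing time / audio length (placeholder data).
from jiwer import wer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

word_error_rate = wer(reference, hypothesis)   # fraction of word-level errors

audio_duration_s = 10.0        # length of the audio clip (placeholder)
transcription_time_s = 1.3     # wall-clock time the model took (placeholder)
rtf = transcription_time_s / audio_duration_s  # lower is better

print(f"WER: {word_error_rate:.2%}, RTF: {rtf:.2f}")
```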
Note The MT-Bench Browser (see Chatbot Arena).