Leaderboard / src / results / auto-arena-llms-results-20241007.csv

Ruochen Zhao

updated leaderboard

bcdb701 about 1 month ago

Model,Rank,MT-Bench Hard,MT-Bench,LC-AlpacaEval,openLLM,MMLU,From,Open?,Params(B),Cost,Score
[claude-3-5-sonnet-20240620](https://www.anthropic.com/news/claude-3-5-sonnet),1,,,57.5,,87.2,Anthropic,No,-,15,1181.774515
[gpt-4o-2024-05-13](https://openai.com/index/hello-gpt-4o/),2,,,57.5,,87.2,OpenAI,No,-,15,1130.486708
[GPT-4-turbo-0409](https://platform.openai.com/docs/models/gpt-4o),3,82.6,,55,,86.5,OpenAI,No,-,30,1097.746895
[command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus),4,33.1,,,,75.7,Cohere,Yes,104B,15,1042.974783
[meta-llama/Llama-3-70b-chat-hf](https://ai.meta.com/blog/meta-llama-3/),5,41.1,,34.4,36.18,80.06,Meta,Yes,70B,-,1033.389278
[gemini-1.5-flash-exp-0827](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-flash),6,,,,,77.9,Google,No,-,0.3,1028.810421
[claude-3-haiku-20240307](https://www.anthropic.com/api),7,41.5,9.1,,,75.2,Anthropic,No,-,1.25,1021.014418
[qwen2-72B-instruct](https://qwenlm.github.io/blog/qwen2/),8,48.1,9.12,,42.49,84.2,Alibaba,Yes,72B,-,1017.866425
[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it),9,57.51,,,32.31,,Google,Yes,27B,-,1015.595207
[Qwen1.5-72B-chat](https://huggingface.co/Qwen/Qwen1.5-72B),10,36.1,8.61,36.6,,77.2,Alibaba,Yes,72B,-,1011.735171
[zero-one-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat),11,23.1,7,27.2,23.9,74.87,Zero One AI,Yes,34B,-,949.9850023
[Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1),12,23.4,8.3,23.7,24.35,71.4,Mistral AI,Yes,7B,-,939.4805565
[GPT-3.5-Turbo-0125](https://openai.com/index/new-embedding-models-and-api-updates/),13,23.3,7.94,17.7,,70,OpenAI,No,-,1.5,889.8482309
[deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat),14,,,17.8,26.87,71.3,Deepseek AI,Yes,67B,-,846.4850997
[Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf),15,11.6,6.86,14.7,,63.91,Meta,Yes,70B,-,792.8072912