Leaderboard / src /results /auto-arena-llms-results-20240624.csv
Ruochen Zhao
Model,Rank,MT-Bench Hard,MT-Bench,LC-AlpacaEval,openLLM,MMLU,From,Open?,Params(B),Cost,Score
[claude-3-5-sonnet-20240620](https://www.anthropic.com/news/claude-3-5-sonnet),1,,,57.5,,87.2,Anthropic,No,-,15,1282.192081
[gpt-4o-2024-05-13](https://openai.com/index/hello-gpt-4o/),2,,,57.5,,87.2,OpenAI,No,-,15,1194.520424
[GPT-4-turbo-0409](https://platform.openai.com/docs/models/gpt-4o),3,82.6,,55,86.27,86.5,OpenAI,No,-,30,1124.732733
[qwen2-72B-instruct](https://qwenlm.github.io/blog/qwen2/),4,48.1,9.12,,,84.2,Alibaba,Yes,72B,-,1109.810932
[meta-llama/Llama-3-70b-chat-hf](https://ai.meta.com/blog/meta-llama-3/),5,41.1,,34.4,77.88,80.06,Meta,Yes,70B,-,1048.258949
[glm-4](https://open.bigmodel.cn/trialcenter?modelCode=glm-4),6,,,,,81.5,Zhipu AI,No,-,13.8,1038.939252
[minimax-abab6.5-chat](https://platform.minimaxi.com/),7,,,,,78.7,MiniMax,No,-,4.2,1037.480905
[command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus),8,33.1,,,74.62,75.7,Cohere,Yes,104B,15,1023.464325
[claude-3-haiku-20240307](https://www.anthropic.com/api),9,41.5,9.1,,84.8,75.2,Anthropic,No,-,1.25,1009.099768
[Qwen1.5-72B-chat](https://huggingface.co/Qwen/Qwen1.5-72B),10,36.1,8.61,36.6,72.91,77.2,Alibaba,Yes,72B,-,994.660656
[reka-core-20240501](https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model),11,,,,,83.2,Reka AI,No,-,25,994.535244
[SenseChat-5](https://console.sensecore.cn/nova/home),12,,,,,84.7,SenseTime,No,-,13.8,993.937723
[Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1),13,23.4,8.3,23.7,72.71,71.4,Mistral AI,Yes,46.7B,-,935.679463
[wenxin-4](https://yiyan.baidu.com/),14,,,,,,Baidu,No,-,16.6,927.68737
[zero-one-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat),15,23.1,7,27.2,63.17,74.87,Zero One AI,Yes,34B,-,917.300671
[mistral-large-2402](https://mistral.ai/news/mistral-large/),16,37.7,8.63,32.7,,81.2,Mistral AI,No,-,12,900.837414
[GPT-3.5-Turbo-0125](https://openai.com/index/new-embedding-models-and-api-updates/),17,23.3,7.94,17.7,71.02,70,OpenAI,No,-,1.5,863.193661
[deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat),18,,,17.8,,71.3,DeepSeek AI,Yes,67B,-,814.974318
[Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf),19,11.6,6.86,14.7,62.4,63.91,Meta,Yes,70B,-,788.694112