---
license: cc-by-4.0
language:
- en
- zh
metrics:
- accuracy
---
A TOP Finetuned Model by xDAN-AI
🤖 #TOP1 on MT-Bench with a score of 8.45, outperforming GPT-3.5 Turbo and 70B models! 🤖
🤖 #TOP2 on C-Eval with a score of 79.45, outperforming GPT-3.5 Turbo and 70B models! 🤖
Exceptional Performance in Key Areas:
MT-Bench Leaderboard TOP2
C-Eval Leaderboard TOP2
Order | Model | Creator | Submission Date | Avg | Avg(Hard) | STEM | Social Science | Humanities | Others |
---|---|---|---|---|---|---|---|---|---|
1 | Yi-34B | 零一万物 | 2023/11/2 | 81.4 | 58.7 | 73.7 | 89.6 | 84.6 | 84.9 |
2 | xDAN-L2-Chat | xDAN-AI 新旦智能 | 2023/11/10 | 79.27 | 69.07 | 69.07 | 87.64 | 85.99 | 80.21 |
3 | BlueLM-7B | vivo | 2023/11/7 | 73.3 | 48.9 | 64.3 | 83.3 | 76.5 | 77.1 |
4 | Qwen-14B | Alibaba Cloud | 2023/9/22 | 72.1 | 53.7 | 65.7 | 85.4 | 75.3 | 68.4 |
5 | Yi-6B | 零一万物 | 2023/11/2 | 72 | 46.6 | 62.3 | 83.9 | 76.3 | 74.6 |
6 | XuanYuan-70B | 度小满AI-Lab | 2023/9/21 | 71.9 | 53.6 | 67.7 | 83.3 | 73.9 | 67.4 |
7 | ChatGLM3-6B-base | Tsinghua & Zhipu.AI | 2023/10/26 | 69 | 46.8 | 61 | 82.4 | 73.4 | 66.9 |
8 | GPT-4* | OpenAI | 2023/5/15 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
9 | XVERSE-65B | XVERSE Technology | 2023/11/5 | 68.6 | 46.2 | 61.3 | 81.4 | 71 | 67.8 |
10 | Nanbeige-16B-Base | Nanbeige LLM Lab | 2023/11/8 | 63.8 | 43.5 | 57.8 | 77.2 | 66.9 | 59.4 |
11 | LingoWhale-8B | 深言科技(DeepLangAI) | 2023/11/3 | 63.6 | 46.4 | 57 | 73.7 | 68.5 | 61.5 |
12 | Qwen-7B v1.1 | Alibaba Cloud | 2023/9/12 | 63.5 | 46.4 | 57.7 | 78.1 | 66.6 | 57.8 |
13 | ChatGPT* | OpenAI | 2023/5/15 | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
14 | Claude-v1.3* | Anthropic | 2023/5/15 | 54.2 | 39 | 51.9 | 61.7 | 52.1 | 53.7 |
15 | Baichuan-13B | Baichuan | 2023/7/9 | 53.6 | 36.7 | 47 | 66.8 | 57.3 | 49.8 |