To comprehensively assess the performance of the model, we conducted extensive testing across a range of standard datasets, including MMLU, C-Eval, CMMLU, RACE-M, PIQA, GSM8K, MATH, MBPP, and HumanEval; these benchmarks cover the model's capabilities across multiple domains. We also compared it with open-source MoE models of similar parameter scale. The results are as follows:

**Comparison of Open-Weight Base Models - MoE**

|              | XVERSE-MoE-A36B | Grok-1-A85B | DeepSeek-V2-A21B | Skywork-MoE-A22B | Mixtral-8x22B-A39B | DBRX-A36B |
| :----------: | :-------------: | :---------: | :--------------: | :--------------: | :----------------: | :-------: |
| Total Params | 255B            | 314B        | 236B             | 146B             | 141B               | 132B      |
| MMLU         | **80.8**        | 73          | 78.5             | 77.4             | 77.8               | 73.7      |
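A note on the naming convention in the table: the `A…B` suffix in each model name gives the parameters *activated* per token, while the Total Params row gives the full parameter count. A minimal sketch (not from the original README; the activated counts are read from the model names, the totals from the table) that makes the sparsity of each model explicit:

```python
# Activated vs. total parameters for the MoE models compared above.
# (activated_billions, total_billions); activated counts come from the
# "A…B" suffix of each model name, totals from the Total Params row.
models = {
    "XVERSE-MoE-A36B":    (36, 255),
    "Grok-1-A85B":        (85, 314),
    "DeepSeek-V2-A21B":   (21, 236),
    "Skywork-MoE-A22B":   (22, 146),
    "Mixtral-8x22B-A39B": (39, 141),
    "DBRX-A36B":          (36, 132),
}

for name, (active, total) in models.items():
    # Fraction of parameters that participate in each forward pass.
    print(f"{name}: {active}B of {total}B activated ({active / total:.0%})")
```

For instance, XVERSE-MoE-A36B activates only about 14% of its 255B parameters per token, which is why models with very different total sizes (132B to 314B here) can still be comparable in per-token compute.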