# 4x1.8B MoE Qwen Ckpt 18000

This project builds a Mixture-of-Experts (MoE) model on top of the Qwen 1.8B model. We concatenated four original models into a single MoE model and trained it using special training methods.

This model is a checkpoint from the continued pretraining stage.

![Loss plot](loss_plot.png)

# Evaluations

|  Groups   | Metric |Value |   |Stderr|
|-----------|--------|-----:|---|-----:|
|boolq      |acc     |0.6502|±  |0.0083|
|ceval-valid|acc     |0.5171|±  |0.1872|
|           |acc_norm|0.5171|±  |0.1872|
|cmmlu      |acc     |0.5041|±  |0.1222|
|           |acc_norm|0.5041|±  |0.1222|
|mathqa     |acc     |0.2693|±  |0.0081|
|           |acc_norm|0.2693|±  |0.0081|
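To help read the `Stderr` column, here is a minimal sketch that turns each accuracy and standard error from the table into an approximate 95% confidence interval. It assumes a normal approximation (z = 1.96), which is our assumption and not something the evaluation output states. The names `RESULTS` and `confidence_interval` are hypothetical helpers introduced only for illustration.

```python
# Accuracy and standard error per group, copied from the table above.
RESULTS = {
    "boolq": (0.6502, 0.0083),
    "ceval-valid": (0.5171, 0.1872),
    "cmmlu": (0.5041, 0.1222),
    "mathqa": (0.2693, 0.0081),
}

# Assumption: normal approximation for a two-sided 95% interval.
Z_95 = 1.96


def confidence_interval(acc, stderr, z=Z_95):
    """Return (lower, upper) bounds of acc +/- z * stderr, clipped to [0, 1]."""
    lower = max(0.0, acc - z * stderr)
    upper = min(1.0, acc + z * stderr)
    return lower, upper


if __name__ == "__main__":
    for group, (acc, stderr) in RESULTS.items():
        lower, upper = confidence_interval(acc, stderr)
        print(f"{group:12s} acc={acc:.4f}  95% CI=[{lower:.4f}, {upper:.4f}]")
```

Under this approximation, the intervals for `boolq` and `mathqa` are narrow, while those for `ceval-valid` and `cmmlu` are wide, so the point estimates for the latter two groups should be compared with care.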

# Acknowledgements

+ [Qwen](https://github.com/QwenLM/Qwen)
+ [mistral.ai](https://mistral.ai)

# License Agreement

This project is open source under the Tongyi Qianwen Research License Agreement. You can view the complete license agreement here: [Tongyi Qianwen RESEARCH LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).

When using this project, please ensure that your use complies with the terms and conditions of the license agreement.