# 4x1.8B MoE Qwen Ckpt 18000
This project builds a Mixture-of-Experts (MoE) model from the Qwen 1.8B model. We combined four copies of the original model and trained them with special training methods.
This model is a checkpoint from the continued-pretraining stage.
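For readers unfamiliar with the MoE layout, the sketch below shows a generic top-2 routed MoE layer with four experts, in the style popularized by Mixtral. It is an illustrative assumption only: this card does not describe the router, the number of experts active per token, or the training methods used. The names `moe_forward` and `make_scaling_expert`, the gate weights, and the toy experts are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: Sequence[Vector],
                experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Route one token vector x to its top_k experts and mix their outputs.

    gate_weights holds one weight row per expert (a bias-free linear router).
    This is a generic sketch, not the routing used by this project.
    """
    # Router logits: dot product of the token with each expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Keep the top_k most probable experts and renormalize their weights.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        y = experts[i](x)
        out = [o + weight * yj for o, yj in zip(out, y)]
    return out


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical toy expert: multiplies its input by a constant."""
    return lambda v: [scale * e for e in v]


# Four toy experts, mirroring the "4x" in the model name. They stand in
# for the expert sub-networks; they are not the real Qwen weights.
experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]

token = [2.0, 0.0]
print(moe_forward(token, gate, experts))
```

Here the router sends the token to experts 0 and 2 (the two highest gate probabilities) and blends their outputs, so only two of the four experts run for this token. Sparse activation of this kind is what lets an MoE hold more parameters than it uses per token.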
![Loss plot](loss_plot.png)
# Evaluations
| Groups | Metric |Value | |Stderr|
|-----------|--------|-----:|---|-----:|
|boolq |acc |0.6502|± |0.0083|
|ceval-valid|acc |0.5171|± |0.1872|
| |acc_norm|0.5171|± |0.1872|
|cmmlu |acc |0.5041|± |0.1222|
| |acc_norm|0.5041|± |0.1222|
|mathqa |acc |0.2693|± |0.0081|
| |acc_norm|0.2693|± |0.0081|
# Acknowledgements
+ [Qwen](https://github.com/QwenLM/Qwen)
+ [mistral.ai](https://mistral.ai)
# License Agreement
This project is open source under the Tongyi Qianwen Research License Agreement. You can view the complete license agreement here: [Tongyi Qianwen Research License Agreement](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).
When using this project, please ensure that your use complies with the terms and conditions of the license agreement.