# 4x1.8B MoE Qwen Ckpt 50000
This is a mixture-of-experts (MoE) model built on the Qwen 1.8B model. We combined four of the original models into a single MoE model and trained it using special training methods.
This is a checkpoint from the continued-pretraining stage.
![Loss plot](loss_plot.png)
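As a rough illustration of how four experts can be combined in an MoE layer, here is a minimal pure-Python sketch. This card does not describe the routing scheme, so the top-2 softmax routing below is an assumption borrowed from Mixtral-style designs, and the names `route_top_k`, `moe_forward`, and the toy `experts` are hypothetical, not taken from this project's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    Assumption: top-k routing with softmax over the selected logits only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts for one token."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Four toy "experts" (simple scalings) standing in for the four 1.8B models.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]

# Experts 1 and 3 have the highest (equal) router scores, so each gets weight 0.5.
routes = route_top_k([0.1, 2.0, 0.3, 2.0], k=2)
result = moe_forward([1.0, -1.0], experts, router_logits=[0.1, 2.0, 0.3, 2.0])
```

In a real MoE transformer the router logits come from a learned linear layer applied to each token's hidden state, and the experts are feed-forward blocks rather than scalings; only the selection-and-mixing logic is shown here.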
# Evaluations
| Groups |n-shot| Metric |Value | |Stderr|
|------------------|-----:|--------|-----:|---|-----:|
|boolq | 0|acc |0.6508|± |0.0083|
|ceval-valid | 0|acc |0.5290|± |0.1912|
| | 0|acc_norm|0.5290|± |0.1912|
|cmmlu | 0|acc |0.5087|± |0.1237|
| | 0|acc_norm|0.5087|± |0.1237|
|mathqa | 0|acc |0.2647|± |0.0081|
| | 0|acc_norm|0.2693|± |0.0081|
|mmlu | 0|acc |0.4353|± |0.0830|
| - stem | 0|acc |0.3809|± |0.0659|
| - social_sciences| 0|acc |0.4959|± |0.0708|
| - other | 0|acc |0.4844|± |0.0744|
| - humanities | 0|acc |0.3998|± |0.0849|
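To help read the `Stderr` column, the sketch below turns a few rows of the table into approximate 95% intervals using a normal approximation (value ± 1.96 × stderr). The helper name `confidence_interval` is hypothetical, and the table does not say how group-level standard errors (for example for `mmlu`, `cmmlu`, or `ceval-valid`) were aggregated, so intervals for those groups should be treated as rough guides only.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate interval: value +/- z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (accuracy, stderr) pairs copied from the table above.
scores = {
    "boolq": (0.6508, 0.0083),
    "mathqa": (0.2647, 0.0081),
    "mmlu": (0.4353, 0.0830),
}

intervals = {name: confidence_interval(v, se) for name, (v, se) in scores.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {scores[name][0]:.4f}  [{low:.4f}, {high:.4f}]")
```

Wide intervals, as for the grouped benchmarks, mean that small differences between checkpoints on those benchmarks may not be meaningful.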
# Acknowledgements
+ [Qwen](https://github.com/QwenLM/Qwen)
+ [mistral.ai](https://mistral.ai)
# License Agreement
This project is released under the Tongyi Qianwen Research License Agreement. The complete license text is available here: [Tongyi Qianwen RESEARCH LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).
When using this project, please make sure that your use complies with the terms and conditions of the license agreement.