# 4x1.8B MoE Qwen Ckpt 50000

This project builds a Mixture-of-Experts (MoE) model from the Qwen 1.8B model: we concatenated four of the original models and trained the result with special training methods.
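
To make the idea of combining four models into one MoE concrete, here is a minimal, dependency-free sketch of sparse expert routing. It is an illustration only: this card does not describe the routing scheme or the training method, so the top-2 softmax gating (in the style popularized by Mixtral) and every name in the code (`make_expert`, `moe_forward`, `TOP_K`) are assumptions, not the project's implementation.

```python
import math
import random

NUM_EXPERTS = 4  # matches the "4x" in the model name
TOP_K = 2        # assumption: the card does not say how many experts are active


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed, dim):
    """Hypothetical stand-in for one expert: a random linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, router_weights, experts, top_k=TOP_K):
    """Route one token vector to its top_k experts and mix their outputs."""
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts with the highest logits.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen, gates


# Toy usage with a made-up hidden size of 8.
dim = 8
experts = [make_expert(seed, dim) for seed in range(NUM_EXPERTS)]
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
token = [rng.uniform(-1, 1) for _ in range(dim)]
output, chosen, gates = moe_forward(token, router_weights, experts)
print("experts used:", chosen, "gate weights:", [round(g, 3) for g in gates])
```

In this sketch only two of the four experts run for each token, which is why a sparse MoE can hold more parameters than a dense model without a proportional increase in per-token compute.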

This model is a checkpoint from the continued pretraining stage.

![Loss plot](loss_plot.png)

# Evaluations

|      Groups      |n-shot| Metric |Value |   |Stderr|
|------------------|-----:|--------|-----:|---|-----:|
|boolq             |     0|acc     |0.6508|±  |0.0083|
|ceval-valid       |     0|acc     |0.5290|±  |0.1912|
|                  |     0|acc_norm|0.5290|±  |0.1912|
|cmmlu             |     0|acc     |0.5087|±  |0.1237|
|                  |     0|acc_norm|0.5087|±  |0.1237|
|mathqa            |     0|acc     |0.2647|±  |0.0081|
|                  |     0|acc_norm|0.2693|±  |0.0081|
|mmlu              |     0|acc     |0.4353|±  |0.0830|
| - stem           |     0|acc     |0.3809|±  |0.0659|
| - social_sciences|     0|acc     |0.4959|±  |0.0708|
| - other          |     0|acc     |0.4844|±  |0.0744|
| - humanities     |     0|acc     |0.3998|±  |0.0849|
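
As a reading aid for the Value and Stderr columns, the sketch below turns a few rows of the table into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr), which is our assumption rather than something stated in this card. The card also does not say how the Stderr of grouped benchmarks such as mmlu was aggregated, so those intervals should be read loosely. The helper name `confidence_interval` is hypothetical.

```python
# (task, metric, value, stderr) copied from the evaluation table
RESULTS = [
    ("boolq", "acc", 0.6508, 0.0083),
    ("mathqa", "acc", 0.2647, 0.0081),
    ("mmlu", "acc", 0.4353, 0.0830),
]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for value +/- z * stderr, clipped to [0, 1]."""
    half_width = z * stderr
    return max(0.0, value - half_width), min(1.0, value + half_width)


for task, metric, value, stderr in RESULTS:
    low, high = confidence_interval(value, stderr)
    print(f"{task:8s} {metric}: {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Under this approximation the boolq and mathqa intervals are narrow, while the mmlu interval is wide because its reported Stderr is roughly ten times larger.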

# Acknowledgements

+ [Qwen](https://github.com/QwenLM/Qwen)
+ [mistral.ai](https://mistral.ai)

# License Agreement

This project is open source under the Tongyi Qianwen Research License Agreement. The complete license agreement is available at [https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).

When using this project, please ensure that your use complies with the terms and conditions of the license agreement.