chestnutlzj/MoE-Qwen-4x1.8B-pretrain-50000-ckpt

	# 4x1.8B MoE Qwen Ckpt 50000

	This is a Mixture-of-Experts (MoE) model built on the Qwen 1.8B model. We combined four of the original models into a single MoE model and trained it using special training methods.
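
The card does not describe how the four models are combined or routed, so the following is only a minimal sketch of the general MoE idea, not this project's implementation. It assumes Mixtral-style top-2 routing over four experts; `TOP_K`, `make_expert`, and `moe_forward` are hypothetical names, and each expert is reduced to a single linear map for brevity.

```python
import math
import random

NUM_EXPERTS = 4  # from the model card: four Qwen 1.8B models were combined
TOP_K = 2        # assumption: Mixtral-style top-2 routing (not stated in the card)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical stand-in for one expert: a single dim x dim linear map."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, router_weights, experts, top_k=TOP_K):
    """Route one token vector x to its top_k experts and mix their outputs."""
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Indices of the top_k experts with the largest logits.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top, gates


rng = random.Random(0)
dim = 8
experts = [make_expert(dim, rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
token = [rng.uniform(-1, 1) for _ in range(dim)]
output, chosen, gates = moe_forward(token, router_weights, experts)
print("experts used:", chosen, "gate weights:", [round(g, 3) for g in gates])
```

Only the selected experts run for each token, so the compute per token stays close to that of a single smaller model while the total parameter count grows with the number of experts.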

	This model is checkpoint 50000 from the continued pretraining stage.
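
The card gives no loading instructions. As a hedged sketch, a checkpoint like this would typically be loaded through the Hugging Face `transformers` auto classes. Passing `trust_remote_code=True` is an assumption based on the original Qwen checkpoints shipping custom modeling code; whether this repository loads this way has not been verified. `load_checkpoint` and `generate` are hypothetical helper names.

```python
REPO_ID = "chestnutlzj/MoE-Qwen-4x1.8B-pretrain-50000-ckpt"


def load_checkpoint(repo_id=REPO_ID):
    """Load tokenizer and model from the Hub (downloads the weights)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo relies on custom modeling code, as original Qwen does.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model


def generate(prompt, max_new_tokens=32):
    """Continue a prompt; this is a pretraining checkpoint, not a chat model."""
    tokenizer, model = load_checkpoint()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is a pretraining checkpoint, plain text continuation is the safest way to probe it. Only enable `trust_remote_code` after reviewing the code in the repository.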

	![Loss plot](loss_plot.png)

	# Evaluations

	| Groups             | n-shot | Metric   |  Value |   | Stderr |
	|--------------------|-------:|----------|-------:|---|-------:|
	| boolq              |      0 | acc      | 0.6508 | ± | 0.0083 |
	| ceval-valid        |      0 | acc      | 0.5290 | ± | 0.1912 |
	|                    |      0 | acc_norm | 0.5290 | ± | 0.1912 |
	| cmmlu              |      0 | acc      | 0.5087 | ± | 0.1237 |
	|                    |      0 | acc_norm | 0.5087 | ± | 0.1237 |
	| mathqa             |      0 | acc      | 0.2647 | ± | 0.0081 |
	|                    |      0 | acc_norm | 0.2693 | ± | 0.0081 |
	| mmlu               |      0 | acc      | 0.4353 | ± | 0.0830 |
	| - stem             |      0 | acc      | 0.3809 | ± | 0.0659 |
	| - social_sciences  |      0 | acc      | 0.4959 | ± | 0.0708 |
	| - other            |      0 | acc      | 0.4844 | ± | 0.0744 |
	| - humanities       |      0 | acc      | 0.3998 | ± | 0.0849 |
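
To help read the `Value ± Stderr` columns, the sketch below turns a few rows of the table into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr), which the card itself does not state; `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (accuracy, stderr) pairs copied from the evaluation table above.
results = {
    "boolq": (0.6508, 0.0083),
    "mathqa": (0.2647, 0.0081),
    "mmlu": (0.4353, 0.0830),
}

for task, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{task}: {acc:.4f} (95% CI {low:.4f} to {high:.4f})")
# boolq: 0.6508 (95% CI 0.6345 to 0.6671)
```

The grouped benchmarks (ceval-valid, cmmlu, mmlu) report much larger standard errors than the single-task rows, so their intervals under this approximation are correspondingly wide.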

	# Acknowledgements

	+ [Qwen](https://github.com/QwenLM/Qwen)
	+ [mistral.ai](https://mistral.ai)

	# License Agreement

	This project is released under the Tongyi Qianwen Research License Agreement. The complete license text is available here: [Tongyi Qianwen RESEARCH LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).

	When using this project, please ensure that your use complies with the terms and conditions of the license agreement.