
# NanoLM-365M-base

English | 简体中文

## Introduction

Starting from Qwen2-0.5B, the tokenizer was replaced with BilingualTokenizer-8K to reduce the parameter count. Total parameters dropped from 0.5B to 365M.
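
For reference, loading the model and verifying the sizes might look like the minimal sketch below (the hub id `Mxode/NanoLM-365M-Base` is assumed from the repository name, not stated in this card):

```python
# Minimal sketch: load the model and check vocab/parameter sizes.
# The hub id is an assumption based on the repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mxode/NanoLM-365M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

total = sum(p.numel() for p in model.parameters())
print(f"vocab size: {len(tokenizer)}")       # 8K bilingual vocabulary
print(f"total params: {total / 1e6:.0f}M")   # ~365M
```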

## Details

To recover some of the lost performance and make downstream fine-tuning easier, after replacing the tokenizer I froze the backbone parameters and trained only the embedding part, for 40,000 steps on wikipedia-zh and cosmopedia-100k (a rough sketch of this freezing setup follows).
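
This is not the original training script, but in a Qwen2-style causal LM loaded with transformers, freezing everything except the token embedding table (`model.embed_tokens`, as listed in the table below) can look like this:

```python
# Sketch: freeze the backbone, leave only the input embeddings trainable.
# Assumes `model` is the Qwen2-style AutoModelForCausalLM loaded above.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the token embedding table (model.embed_tokens).
for param in model.model.embed_tokens.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable / 1e6:.1f}M")  # < 10M, per the table below
```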

|                             | Value                         |
| --------------------------- | ----------------------------- |
| Total Params                | 365 M                         |
| Trainable Params            | < 10 M                        |
| Trainable Parts             | `model.embed_tokens`          |
| Training Steps              | 40,000                        |
| Training Dataset            | wikipedia-zh, cosmopedia-100k |
| Optimizer                   | adamw_torch                   |
| Learning Rate               | 2e-4                          |
| LR Scheduler                | cosine                        |
| Weight Decay                | 0.1                           |
| Warm-up Ratio               | 0.03                          |
| Batch Size                  | 16                            |
| Gradient Accumulation Steps | 1                             |
| Seq Len                     | 4096                          |
| Dtype                       | bf16                          |
| Peak GPU Memory             | < 48 GB                       |
| Device                      | NVIDIA A100-SXM4-80GB         |
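
The hyperparameters above map directly onto `transformers.TrainingArguments`; a sketch, where `output_dir` is a hypothetical placeholder rather than a value from the original run:

```python
# Sketch: the table's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="nanolm-365m-embed",   # hypothetical path, not from the card
    max_steps=40_000,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    optim="adamw_torch",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.1,
    warmup_ratio=0.03,
    bf16=True,
    # Seq Len (4096) is applied at tokenization/packing time, not here.
)
```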

The detailed training record is shown in the figure below.

*(figure: result — training curve)*