|
--- |
|
datasets: |
|
- HuggingFaceFW/fineweb |
|
- erhwenkuo/c4-chinese-zhtw |
|
- erhwenkuo/wikipedia-zhtw |
|
- p208p2002/wudao |
|
- p208p2002/NDLTD-T10-90-111 |
|
- codeparrot/github-code-clean |
|
language: |
|
- en |
|
- zh |
|
license: llama3 |
|
--- |
|
# Llama 3 zhtw |
|
|
|
An experiment in Chinese continued pretraining (CP) on Llama 3, trained on a total of 800M tokens.
|
|
|
Because the quality of available Chinese pretraining corpora still leaves room for improvement, performance after CP does not surpass the original Llama 3; several community-trained Chinese Llama 3 models we compared show a similar pattern.
|
|
|
On the English side, Llama 3 zhtw uses FineWeb, which keeps its MMLU score above that of the other Chinese CP models and on par with the original Llama 3.
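
As a quick start, the sketch below shows one way to load the model with Hugging Face `transformers` for plain text completion. The repository id matches the entry in the benchmark table; the dtype, device placement, and example prompt are illustrative choices, not part of the original recipe.

```python
# Minimal usage sketch; assumes torch, transformers (and accelerate for device_map) are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p208p2002/llama-3-zhtw-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

# This is a base (non-chat) model, so plain text completion is the natural interface.
prompt = "台灣最高的山是"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```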
|
|
|
## Benchmarks |
|
| Model                         | Size | ↑ TMMLU+ (ACC) | CMMLU (ACC)   | MMLU (ACC)    |
| ----------------------------- | ---- | -------------- | ------------- | ------------- |
|                               |      | TC, Knowledge  | CN, Knowledge | EN, Knowledge |
|                               |      | 5-shot         | 5-shot        | 5-shot        |
| Yi-6B                         | 6B   | 49.63          | 75.53         | 65.35         |
| Qwen-7B                       | 7B   | 42.84          | 73.1          | 61.00         |
| Meta-Llama-3-8B               | 8B   | 41.97          | 50.8          | 65.17         |
| **p208p2002/llama-3-zhtw-8B** | 8B   | 41.84          | 50.6          | 65.31         |
| Breeze-7B-Base-v0_1           | 7B   | 40.35          | 44.05         | 61.63         |
| hfl/llama-3-chinese-8b        | 8B   | 39.64          | 50.9          | 61.1          |
|
|
|
## Recipe |
|
|
|
### Datasets |
|
| Dataset        | Lang  | Weight |
| -------------- | ----- | ------ |
| FineWeb        | en    | 0.35   |
| Wudao          | zh-cn | 0.1    |
| C4Tw           | zh-tw | 0.1    |
| WikiZhTw       | zh-tw | 0.15   |
| NdltdT10       | zh-tw | 0.1    |
| GitHubMarkDown | code  | 0.1    |
| GitHubPython   | code  | 0.1    |
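
The sketch below shows one way such a weighted mixture could be assembled with the Hugging Face `datasets` library. The repository ids come from the metadata above; the split names, streaming mode, GitHub `language` filter field, and seed are illustrative assumptions rather than the exact preprocessing used for this run.

```python
# Sketch of a weighted pretraining mixture with Hugging Face `datasets`.
# The probabilities mirror the Weight column above; split/filter choices are assumptions.
from datasets import load_dataset, interleave_datasets

fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
wudao = load_dataset("p208p2002/wudao", split="train", streaming=True)
c4_tw = load_dataset("erhwenkuo/c4-chinese-zhtw", split="train", streaming=True)
wiki_tw = load_dataset("erhwenkuo/wikipedia-zhtw", split="train", streaming=True)
ndltd = load_dataset("p208p2002/NDLTD-T10-90-111", split="train", streaming=True)
github = load_dataset("codeparrot/github-code-clean", split="train", streaming=True)

# Split the GitHub corpus into Markdown and Python streams
# (the `language` column name is an assumption about that dataset's schema).
github_md = github.filter(lambda x: x["language"] == "Markdown")
github_py = github.filter(lambda x: x["language"] == "Python")

# Sample sources in proportion to the mixture weights (they sum to 1.0).
mixture = interleave_datasets(
    [fineweb, wudao, c4_tw, wiki_tw, ndltd, github_md, github_py],
    probabilities=[0.35, 0.1, 0.1, 0.15, 0.1, 0.1, 0.1],
    seed=42,
    stopping_strategy="all_exhausted",
)
```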
|
|
|
### Hyperparameters
|
|
|
- Learning Rate: 1e-7 |
|
- Global Batch Size: 60 |
|
- Sequence Length: 8192 |
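
For reference, a minimal sketch of how these values might map onto Hugging Face `TrainingArguments`. The split of the global batch size of 60 into per-device batch size, gradient-accumulation steps, and world size, as well as the precision and logging settings, are illustrative assumptions.

```python
# Sketch only: maps the hyperparameters above onto TrainingArguments.
# Assumed split: 6 GPUs x 2 sequences/GPU x 5 accumulation steps = global batch size 60.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-zhtw-8b-cp",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=5,
    bf16=True,
    logging_steps=10,
    save_steps=500,
)

# The sequence length (8192) is handled at tokenization/packing time rather than in
# TrainingArguments, e.g. by chunking the token stream into 8192-token blocks.
```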