bert-large-cantonese
Description
This model was trained from scratch on Cantonese text. It uses the BERT-large architecture (24 layers, hidden size 1024, 16 attention heads, 326M parameters).
Training ran in two stages. The first stage pre-trained the model on sequences of length 128 with a batch size of 512 for 1 epoch. The second stage continued pre-training on sequences of length 512 with a batch size of 512 for one more epoch.
How to use
You can use this model directly with a pipeline for masked language modeling:
from transformers import pipeline

mask_filler = pipeline(
    "fill-mask",
    model="hon9kon9ize/bert-large-cantonese"
)
mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。")
# [{'score': 0.08160534501075745,
#   'token': 943,
#   'token_str': '個',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'},
#  {'score': 0.06182105466723442,
#   'token': 1576,
#   'token_str': '啲',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'},
#  {'score': 0.04600336775183678,
#   'token': 1646,
#   'token_str': '嘅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'},
#  {'score': 0.03743772581219673,
#   'token': 3581,
#   'token_str': '橙',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'},
#  {'score': 0.031560592353343964,
#   'token': 5148,
#   'token_str': '紅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}]
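The pipeline returns a list of dictionaries, and the 'sequence' field comes back with a space between every character. A minimal sketch of a post-processing helper follows; the name `summarize_predictions` is hypothetical (it is not part of transformers), and removing every space is an assumption that only suits text with no meaningful spaces, such as the Cantonese example above. The sample data is copied from the first two predictions shown.

```python
def summarize_predictions(predictions, top_k=3):
    """Return (token, rounded score, text without spaces) for the top_k fills.

    `predictions` is a list of dicts shaped like the fill-mask output above.
    Hypothetical helper for illustration; not part of the transformers API.
    """
    # Sort defensively by score, highest first.
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [
        # Assumption: spaces in 'sequence' are only tokenization residue.
        (p["token_str"], round(p["score"], 4), p["sequence"].replace(" ", ""))
        for p in ranked[:top_k]
    ]


# First two predictions from the example output above.
sample = [
    {
        "score": 0.08160534501075745,
        "token": 943,
        "token_str": "個",
        "sequence": "雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。",
    },
    {
        "score": 0.06182105466723442,
        "token": 1576,
        "token_str": "啲",
        "sequence": "雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。",
    },
]

summary = summarize_predictions(sample, top_k=2)
for token, score, text in summary:
    print(token, score, text)
```

In practice, the list returned by `mask_filler(...)` can be passed in place of `sample`.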
Training hyperparameters
The following hyperparameters were used during the first training stage:
- Batch size: 512
- Learning rate: 1e-4
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1
Loss plot on WandB
The following hyperparameters were used during the second training stage:
- Batch size: 512
- Learning rate: 5e-5
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1
Loss plot on WandB
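To make the "linear decay" scheduler with a warmup ratio of 0.1 concrete, here is a minimal sketch of the learning rate at a given optimizer step. It assumes the common convention of a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the end of training. The function name `linear_warmup_decay_lr` and the total of 1000 steps are hypothetical; the model card does not state the actual number of steps or the exact scheduler implementation.

```python
import math


def linear_warmup_decay_lr(step, total_steps, peak_lr, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Illustrative sketch only; the exact schedule used for this model
    is an assumption.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length; the peak rates are the two stages' learning rates.
TOTAL_STEPS = 1000
for name, peak in [("stage 1", 1e-4), ("stage 2", 5e-5)]:
    points = [linear_warmup_decay_lr(s, TOTAL_STEPS, peak) for s in (0, 50, 100, 550, 1000)]
    print(name, points)
```

With these assumptions, the rate peaks at the listed learning rate after the first 10% of steps and reaches zero at the final step.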