---
library_name: transformers
language:
  - yue
license: cc-by-4.0
tags:
  - generated_from_trainer
pipeline_tag: fill-mask
widget:
  - text: 香港原本[MASK]一個人煙稀少嘅漁港。
model-index:
  - name: bert-large-cantonese
    results: []
---

# bert-large-cantonese

## Description

This model is trained from scratch on Cantonese text. It is a BERT model with the large architecture (24 layers, 1024 hidden dimensions, 16 attention heads, 326M parameters).

The first training stage pre-trains the model on sequences of length 128 with a batch size of 512 for one epoch; the second stage continues pre-training on sequences of length 512 with a batch size of 512 for one more epoch.

## How to use

You can use this model directly with a pipeline for masked language modeling:

```python
from transformers import pipeline

mask_filler = pipeline(
    "fill-mask",
    model="hon9kon9ize/bert-large-cantonese"
)

mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。")

# [{'score': 0.08160534501075745,
#   'token': 943,
#   'token_str': '個',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'},
#  {'score': 0.06182105466723442,
#   'token': 1576,
#   'token_str': '啲',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'},
#  {'score': 0.04600336775183678,
#   'token': 1646,
#   'token_str': '嘅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'},
#  {'score': 0.03743772581219673,
#   'token': 3581,
#   'token_str': '橙',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'},
#  {'score': 0.031560592353343964,
#   'token': 5148,
#   'token_str': '紅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}]
```
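
You can also load the tokenizer and model directly and work with the raw logits. The following is a minimal sketch using the standard `AutoTokenizer`/`AutoModelForMaskedLM` API; only the model name comes from this card, the rest is generic transformers usage:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-large-cantonese")
model = AutoModelForMaskedLM.from_pretrained("hon9kon9ize/bert-large-cantonese")

text = "雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and decode its highest-scoring prediction
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected: 個 (see pipeline output above)
```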

## Training hyperparameters

The following hyperparameters were used during the first training stage:

- Batch size: 512
- Learning rate: 1e-4
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/v3ljlpmp)

The following hyperparameters were used during the second training stage:

- Batch size: 512
- Learning rate: 5e-5
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/vcm3q1ef)
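
The training script is not included in this card; purely as an illustration, the stage-1 hyperparameters above map onto a `transformers.TrainingArguments` configuration roughly like the sketch below. The per-device batch size and gradient accumulation steps are assumptions, since only their product (512) is stated here.

```python
from transformers import TrainingArguments

# Illustrative stage-1 configuration; not the authors' actual script.
training_args = TrainingArguments(
    output_dir="bert-large-cantonese-stage1",
    per_device_train_batch_size=64,   # assumption: 64 x 8 accumulation = 512 effective
    gradient_accumulation_steps=8,    # assumption
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```

The second stage would reuse the same arguments with `learning_rate=5e-5`; the 128- versus 512-token sequence lengths are set during tokenization rather than in `TrainingArguments`.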