Cantonese-Written Chinese Translation Model

This model is a fine-tuned version of fnlp/bart-base-chinese, trained on the Cantonese-Written Chinese Dataset Gen2. It achieves the following results on the evaluation set:

Model description

The model is based on the BART Chinese model and was trained on 1M examples of Cantonese-Written Chinese parallel corpus data.

Its intended use is the accurate translation of Cantonese sentences into Written Chinese.
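The intended use can be sketched as a small inference helper. This is a minimal sketch, not the author's published usage: it assumes the fine-tuned checkpoint keeps the classes documented for the base model fnlp/bart-base-chinese (BertTokenizer with BartForConditionalGeneration). The function names load_model and translate, the default max_length and num_beams values, and the model identifier in the usage comment are all hypothetical.

```python
from typing import List


def load_model(model_id: str):
    """Load the tokenizer and model for a given checkpoint path or hub id."""
    # Assumption: the fine-tuned checkpoint uses the same classes that the
    # base model fnlp/bart-base-chinese documents.
    from transformers import BartForConditionalGeneration, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BartForConditionalGeneration.from_pretrained(model_id)
    model.eval()  # inference only, disable dropout
    return tokenizer, model


def translate(sentences: List[str], tokenizer, model,
              max_length: int = 64, num_beams: int = 4) -> List[str]:
    """Translate a batch of Cantonese sentences into Written Chinese."""
    inputs = tokenizer(sentences, return_tensors="pt", padding=True,
                       truncation=True, max_length=max_length)
    # Pass only the tensors that generate() needs. BertTokenizer also
    # returns token_type_ids, which BART does not accept.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,   # illustrative value, not from the model card
        num_beams=num_beams,     # illustrative value, not from the model card
    )
    decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    # BertTokenizer joins tokens with spaces, so remove them for Chinese text.
    return [text.replace(" ", "") for text in decoded]


# Usage (the model id below is a placeholder, not the real identifier):
# tokenizer, model = load_model("path-or-hub-id-of-this-model")
# print(translate(["你食咗飯未呀?"], tokenizer, model))
```

Removing every space is convenient for pure Chinese output but would also join Latin-script words, so mixed-script text may need gentler post-processing.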

The training and evaluation data come from the Cantonese-Written Chinese Dataset Gen2.

Training was performed with the Hugging Face Seq2SeqTrainer.

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Bleu	Chrf	Gen Len
0.2275	0.05	5000	1.5256	40.6521	42.475	13.2277
0.1752	0.1	10000	1.5413	40.7808	42.5628	13.2556
0.1533	0.15	15000	1.5938	40.7698	42.5348	13.2678
0.1442	0.2	20000	1.6487	40.6062	42.353	13.2602
0.1317	0.24	25000	1.7148	40.569	42.2753	13.2798
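To make the trend in the logged metrics easier to read, the rows above can be loaded into a small script that picks out the strongest checkpoint for each metric. This is only an illustrative sketch: the column keys and variable names are chosen here for convenience, and the numbers are copied from the table.

```python
# Column names are shorthand chosen for this sketch.
COLUMNS = ("train_loss", "epoch", "step", "val_loss", "bleu", "chrf", "gen_len")

# Rows copied from the training log above.
ROWS = [
    (0.2275, 0.05, 5000, 1.5256, 40.6521, 42.475, 13.2277),
    (0.1752, 0.10, 10000, 1.5413, 40.7808, 42.5628, 13.2556),
    (0.1533, 0.15, 15000, 1.5938, 40.7698, 42.5348, 13.2678),
    (0.1442, 0.20, 20000, 1.6487, 40.6062, 42.353, 13.2602),
    (0.1317, 0.24, 25000, 1.7148, 40.569, 42.2753, 13.2798),
]

log = [dict(zip(COLUMNS, row)) for row in ROWS]

# Checkpoint with the highest score for each translation metric.
best_bleu = max(log, key=lambda r: r["bleu"])
best_chrf = max(log, key=lambda r: r["chrf"])
# Checkpoint with the lowest validation loss.
lowest_val = min(log, key=lambda r: r["val_loss"])

# True when validation loss goes up at every logged step.
val_loss_rising = all(a["val_loss"] < b["val_loss"] for a, b in zip(log, log[1:]))
# True when training loss goes down at every logged step.
train_loss_falling = all(a["train_loss"] > b["train_loss"] for a, b in zip(log, log[1:]))

print("best BLEU at step", best_bleu["step"])
print("best chrF at step", best_chrf["step"])
print("lowest validation loss at step", lowest_val["step"])
print("validation loss rising:", val_loss_rising)
print("training loss falling:", train_loss_falling)
```

On these rows, BLEU and chrF both peak at step 10000, while validation loss is lowest at step 5000 and rises at every later step even though training loss keeps falling. That pattern is consistent with the model beginning to overfit early, although five log entries are too few to say so with confidence.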