---
language:
- es
- zh

tags:
- translation

license: apache-2.0
---

This model is a fine-tuned version of [Helsinki-NLP/opus-tatoeba-es-zh](https://huggingface.co/Helsinki-NLP/opus-tatoeba-es-zh) on a legal-domain Spanish-Chinese dataset constructed by the author.
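
Below is a minimal inference sketch using the Transformers library. The model ID is a placeholder, since this card does not state the Hub repository path of the fine-tuned checkpoint; depending on how the base Marian model's target-language token convention was handled during fine-tuning, a prefix such as `>>cmn_Hans<<` may also need to be prepended to the source text.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder ID: replace with the actual repository path of this fine-tuned model.
model_id = "path/to/finetuned-opus-tatoeba-es-zh-legal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example sentence in the legal register the model was fine-tuned on.
src = "El contrato existe desde que una o varias personas consienten en obligarse."
inputs = tokenizer(src, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```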


# Intended uses & limitations

This model is the result of the author's master's thesis for the Tradumatics: Translation Technologies program at the Autonomous University of Barcelona.

Please refer to the GitHub repository created for this thesis for the full text and related open-source materials: https://github.com/guocheng98/MUTTT2020_TFM_ZGC

The thesis aims to explain various theories and certain algorithmic details of neural machine translation, so this fine-tuned model serves only as a hands-on practice example for that objective; it is not intended for production use.


# Training and evaluation data

The dataset is constructed from the Chinese translations of the Spanish Civil Code, the Spanish Constitution, and many other laws and regulations found in the China Law Info database (北大法宝 Beida Fabao), along with their source texts from the Boletín Oficial del Estado and EUR-Lex.

A total of 9,972 sentence pairs were constructed; 1,000 are used for evaluation and the rest for training.
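
As a rough illustration of how such a split can be produced with the Datasets library listed under Framework versions, the snippet below assumes the sentence pairs are stored in a JSON-lines file with `es` and `zh` fields; the file name and field layout are hypothetical, not taken from the thesis materials.

```python
from datasets import load_dataset

# Hypothetical layout: one JSON object per line, e.g. {"es": "...", "zh": "..."}.
raw = load_dataset("json", data_files="legal_es_zh.jsonl", split="train")

# Hold out 1,000 pairs for evaluation; the remaining ~8,972 are used for training.
split = raw.train_test_split(test_size=1000, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(len(train_ds), len(eval_ds))
```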


# Training hyperparameters

The following hyperparameters were used during training; a configuration sketch in code follows the list:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 10
- mixed_precision_training: Native AMP
- weight_decay: 0.01
- early_stopping_patience: 8
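
A sketch of how these values map onto the Transformers `Seq2SeqTrainingArguments`/`Seq2SeqTrainer` API is shown below. It is an assumption-laden reconstruction, not the author's actual training script: the output directory is a placeholder, the evaluation/save interval of 400 steps is inferred from the results table, and `train_ds`/`eval_ds` are the hypothetical datasets from the previous snippet.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base = "Helsinki-NLP/opus-tatoeba-es-zh"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

args = Seq2SeqTrainingArguments(
    output_dir="opus-tatoeba-es-zh-legal",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=10,
    fp16=True,                              # Native AMP mixed precision
    weight_decay=0.01,
    # The Adam betas/epsilon listed above are the Trainer's default AdamW settings.
    evaluation_strategy="steps",
    eval_steps=400,                         # matches the 400-step interval in the results table
    save_strategy="steps",
    save_steps=400,
    load_best_model_at_end=True,            # needed for early stopping on eval loss
    metric_for_best_model="eval_loss",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,                 # hypothetical datasets from the snippet above
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=8)],
)
# trainer.train()
```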


# Training results

The best validation loss was achieved at step 5600. Training then stopped at step 8800, after the validation loss had gone 8 consecutive evaluations without improving (the early-stopping patience).

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.9584        | 0.36  | 400  | 2.6800          |
| 2.6402        | 0.71  | 800  | 2.5017          |
| 2.5038        | 1.07  | 1200 | 2.3907          |
| 2.3279        | 1.43  | 1600 | 2.2999          |
| 2.2258        | 1.78  | 2000 | 2.2343          |
| 2.1061        | 2.14  | 2400 | 2.1961          |
| 1.9279        | 2.5   | 2800 | 2.1569          |
| 1.9059        | 2.85  | 3200 | 2.1245          |
| 1.7491        | 3.21  | 3600 | 2.1227          |
| 1.6301        | 3.57  | 4000 | 2.1169          |
| 1.6871        | 3.92  | 4400 | 2.0979          |
| 1.5203        | 4.28  | 4800 | 2.1074          |
| 1.4646        | 4.63  | 5200 | 2.1024          |
| 1.4739        | 4.99  | 5600 | 2.0905          |
| 1.338         | 5.35  | 6000 | 2.0946          |
| 1.3152        | 5.7   | 6400 | 2.0974          |
| 1.306         | 6.06  | 6800 | 2.0985          |
| 1.1991        | 6.42  | 7200 | 2.0962          |
| 1.2113        | 6.77  | 7600 | 2.1092          |
| 1.1983        | 7.13  | 8000 | 2.1060          |
| 1.1238        | 7.49  | 8400 | 2.1102          |
| 1.1417        | 7.84  | 8800 | 2.1078          |


# Framework versions

- Transformers 4.7.0
- Pytorch 1.8.1+cu101
- Datasets 1.8.0
- Tokenizers 0.10.3