---
language:
  - ko
  - en
base_model: ./reduced_model
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: tst-translation-output
    results: []
---

# tst-translation-output

This model is a fine-tuned version of mbart-large-cc25 on a custom dataset. It achieves the following results on the evaluation set:

- Loss: 3.7663
- Bleu: 19.3382
- Gen Len: 17.8929
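
The sketch below shows one way to load the checkpoint and translate Korean to English with Transformers. The Hub id `yesj1234/mbart_cycle1_ko-en`, the mBART language codes `ko_KR`/`en_XX`, and the sample sentence are assumptions for illustration, not details confirmed by this card.

```python
# Hedged inference sketch: the repo id and language codes below are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "yesj1234/mbart_cycle1_ko-en"  # assumed Hub id; use a local path if needed

tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="ko_KR", tgt_lang="en_XX")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "안녕하세요. 만나서 반갑습니다."  # "Hello. Nice to meet you."
inputs = tokenizer(text, return_tensors="pt")

# mBART starts decoding from the target-language token.
generated = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```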

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 40
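
For reference, the snippet below sketches how these values could map onto `Seq2SeqTrainingArguments`. The output directory, the 500-step evaluation interval, and the 4-GPU `torchrun` launch are assumptions inferred from this card, not the exact script that was run.

```python
# Hedged sketch of training arguments matching the card; not the exact configuration used.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tst-translation-output",  # name taken from the model-index entry
    learning_rate=5e-05,
    per_device_train_batch_size=2,        # 2 per device x 4 GPUs x 16 accumulation = 128 total
    per_device_eval_batch_size=2,         # 2 per device x 4 GPUs = 8 total
    gradient_accumulation_steps=16,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",          # assumed: the results table logs metrics every 500 steps
    eval_steps=500,
    predict_with_generate=True,           # required so BLEU and Gen Len can be computed
)
# Assumed launch: torchrun --nproc_per_node 4 run_translation.py ... with these arguments.
```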

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|
| 2.6161        | 11.09 | 2000 | 3.1762          | 13.5109 | 19.1966 |
| 2.6161        | 13.86 | 2500 | 3.0375          | 16.2868 | 18.7985 |
| 1.4467        | 16.62 | 3000 | 3.1328          | 17.6991 | 18.1949 |
| 1.4467        | 19.39 | 3500 | 3.2690          | 17.9052 | 18.3117 |
| 0.6809        | 22.15 | 4000 | 3.3850          | 18.4075 | 18.2149 |
| 0.6809        | 24.91 | 4500 | 3.4465          | 19.0339 | 18.009  |
| 0.3422        | 27.68 | 5000 | 3.5680          | 18.7281 | 17.5902 |
| 0.3422        | 30.44 | 5500 | 3.6350          | 19.1534 | 18.2177 |
| 0.1941        | 33.2  | 6000 | 3.7153          | 19.2575 | 17.8784 |
| 0.1941        | 35.97 | 6500 | 3.7382          | 19.2475 | 17.9831 |
| 0.1271        | 38.73 | 7000 | 3.7573          | 19.3045 | 17.9889 |

### Framework versions

- Transformers 4.33.1
- Pytorch 2.0.1+cu117
- Datasets 2.14.5
- Tokenizers 0.13.3