
tst-translation-output

This model is a fine-tuned version of mbart-large-cc25 on a custom dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7663
  • Bleu: 19.3382
  • Gen Len: 17.8929
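
The checkpoint can be loaded with the standard MBart classes from transformers. The snippet below is a minimal usage sketch, not part of the original card: the checkpoint path "tst-translation-output" and the en_XX → ro_RO language pair are placeholders, since the custom dataset and its languages are not documented here.

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Placeholder path and language codes: point these at the actual checkpoint
# location and at the source/target languages of the (undocumented) dataset.
model = MBartForConditionalGeneration.from_pretrained("tst-translation-output")
tokenizer = MBartTokenizer.from_pretrained(
    "tst-translation-output", src_lang="en_XX", tgt_lang="ro_RO"
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
generated = model.generate(
    **inputs,
    # mbart-large-cc25 starts decoding from the target language code
    decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"],
    max_length=32,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```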

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 40
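
The sketch below is a rough reconstruction of this configuration using Seq2SeqTrainingArguments; only the values listed above come from the card, while the output directory and the evaluation cadence are assumptions (the latter inferred from the 500-step intervals in the results table). The 4-GPU setup is handled by the launcher (e.g. torchrun), not by these arguments.

```python
from transformers import Seq2SeqTrainingArguments

# Assumed output_dir and eval cadence; all other values are from the list above.
training_args = Seq2SeqTrainingArguments(
    output_dir="tst-translation-output",
    learning_rate=5e-5,
    per_device_train_batch_size=2,   # 2 per device x 4 GPUs x 16 accumulation steps = 128 total
    per_device_eval_batch_size=2,    # 2 per device x 4 GPUs = 8 total
    gradient_accumulation_steps=16,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",
    eval_steps=500,                  # assumed from the evaluation points in the table below
    predict_with_generate=True,      # needed so BLEU and generation length can be computed
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults, so they
    # do not need to be set explicitly.
)
```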

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|
| 2.6161        | 11.09 | 2000 | 3.1762          | 13.5109 | 19.1966 |
| 2.6161        | 13.86 | 2500 | 3.0375          | 16.2868 | 18.7985 |
| 1.4467        | 16.62 | 3000 | 3.1328          | 17.6991 | 18.1949 |
| 1.4467        | 19.39 | 3500 | 3.2690          | 17.9052 | 18.3117 |
| 0.6809        | 22.15 | 4000 | 3.3850          | 18.4075 | 18.2149 |
| 0.6809        | 24.91 | 4500 | 3.4465          | 19.0339 | 18.009  |
| 0.3422        | 27.68 | 5000 | 3.5680          | 18.7281 | 17.5902 |
| 0.3422        | 30.44 | 5500 | 3.6350          | 19.1534 | 18.2177 |
| 0.1941        | 33.2  | 6000 | 3.7153          | 19.2575 | 17.8784 |
| 0.1941        | 35.97 | 6500 | 3.7382          | 19.2475 | 17.9831 |
| 0.1271        | 38.73 | 7000 | 3.7573          | 19.3045 | 17.9889 |

Framework versions

  • Transformers 4.33.1
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.13.3