
ADL_HW2_MT5

This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.5085
  • Rouge1: 13.6234
  • Rouge2: 4.8107
  • Rougel: 13.4828
  • Rougelsum: 13.4694
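
For readers unfamiliar with the metrics above, the following is a minimal sketch of how a ROUGE-N F-measure can be computed from clipped n-gram overlap. It is not the evaluation code used for this model. The function name `rouge_n_f1` and the whitespace tokenization are assumptions made for illustration, and the reported scores appear to be on a 0-100 scale (this sketch returns a value in [0, 1]).

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Illustrative ROUGE-N F1 on whitespace tokens (hypothetical helper)."""

    def ngrams(tokens):
        # Count every contiguous n-gram in the token list.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference.split())
    cand_counts = ngrams(candidate.split())

    # Clipped overlap: each n-gram counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat lay on the mat"
    print(rouge_n_f1(ref, hyp, n=1))  # unigram overlap
    print(rouge_n_f1(ref, hyp, n=2))  # bigram overlap
```

RougeL and RougeLsum are based on the longest common subsequence rather than fixed-length n-grams, so they are not covered by this sketch.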

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5.6e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 8

Training results

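| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum |
|---------------|-------|------|-----------------|---------|--------|---------|-----------|
| 6.652         | 1.0   | 340  | 3.9727          | 10.0329 | 3.8308 | 9.954   | 9.9802    |
| 4.5708        | 2.0   | 680  | 3.7827          | 11.1164 | 4.1328 | 11.0046 | 11.0159   |
| 4.3069        | 3.0   | 1020 | 3.6472          | 12.405  | 4.4789 | 12.2766 | 12.308    |
| 4.1563        | 4.0   | 1360 | 3.5830          | 12.6726 | 4.5588 | 12.5504 | 12.5738   |
| 4.0715        | 5.0   | 1700 | 3.5509          | 12.6934 | 4.7831 | 12.5682 | 12.5705   |
| 4.0094        | 6.0   | 2040 | 3.5241          | 13.3107 | 4.8201 | 13.198  | 13.2002   |
| 3.9728        | 7.0   | 2380 | 3.5153          | 13.3888 | 4.7922 | 13.2839 | 13.2947   |
| 3.9505        | 8.0   | 2720 | 3.5085          | 13.6234 | 4.8107 | 13.4828 | 13.4694   |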
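
The step counts above (340 optimizer steps per epoch) together with `train_batch_size: 64` allow a rough estimate of the training set size, which the card does not state. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. None of these is confirmed by the card, and the helper name `train_set_size_bounds` is hypothetical.

```python
def train_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest example counts that yield `steps_per_epoch` batches.

    Assumes one optimizer step per batch and that a final partial batch is kept.
    """
    # All batches but the last are full; the last holds between 1 and batch_size examples.
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


if __name__ == "__main__":
    low, high = train_set_size_bounds(steps_per_epoch=340, batch_size=64)
    print(f"Estimated training examples: between {low} and {high}")
```

If training used several devices or gradient accumulation, the effective batch size would be larger and these bounds would scale accordingly.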

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.19.1

Model size: 300M params (Safetensors, tensor type F32)

Model tree for b09501048/ADL_HW2_MT5

Base model: google/mt5-small (this model is a fine-tune of it)