
library_name: transformers
license: apache-2.0
base_model: t5-small
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: cnn_news_summary_model_trained_on_reduced_data
    results: []
datasets:
  - abisee/cnn_dailymail

cnn_news_summary_model_trained_on_reduced_data

This model is a fine-tuned version of t5-small on the cnn_dailymail dataset (abisee/cnn_dailymail). It achieves the following results on the evaluation set:

- Loss (validation cross-entropy): 1.6597
- ROUGE-1: 0.2162
- ROUGE-2: 0.0943
- ROUGE-L: 0.1834
- ROUGE-Lsum: 0.1834
- Generated length: 19.0 tokens (average per generated summary)
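The ROUGE scores above are between 0 and 1. ROUGE-1 measures unigram (single-word) overlap between a generated summary and the reference, ROUGE-2 measures bigram overlap, and ROUGE-L is based on the longest common subsequence. The following is a minimal pure-Python sketch of ROUGE-1 F1 for intuition only. It assumes simple lowercase whitespace tokenization, whereas the reported numbers were presumably computed with a library implementation (for example the Hugging Face `evaluate` `rouge` metric, which wraps Google's `rouge_score` package) that tokenizes differently, so this sketch will not reproduce them exactly. `rouge1_f1` is an illustrative name, not a library function.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a predicted and a reference summary (sketch)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of predicted words found in the reference
    recall = overlap / len(ref_tokens)      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat sat", "the cat sat on the mat"))
```

Here every predicted word appears in the reference (precision 1.0) but only half of the reference is covered (recall 0.5), giving an F1 of about 0.667.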

Model description

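Base Model: t5-small, a compact (about 60 million parameter) version of the T5 (Text-to-Text Transfer Transformer) model developed by Google.

Because it was fine-tuned on news articles paired with short highlight summaries, the model generates brief summaries of news-style text. Its small size makes it fast and inexpensive to run, which makes it useful for condensing many articles quickly so their key points are easier to digest.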
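A minimal inference sketch with the `transformers` library follows. `MODEL_PATH` is a hypothetical placeholder because the card does not give a Hub repository id, and the `summarize: ` task prefix is an assumption based on the original T5 convention; the card does not say whether a prefix was used during fine-tuning. `add_prefix` and `summarize` are illustrative helper names, not part of any library.

```python
from typing import List

# Hypothetical location of the checkpoint: replace with the real local path or Hub id.
MODEL_PATH = "path/to/cnn_news_summary_model_trained_on_reduced_data"

# Assumption: the original T5 summarization prefix; not confirmed by this card.
PREFIX = "summarize: "


def add_prefix(articles: List[str]) -> List[str]:
    """Prepend the T5 task prefix to each input article."""
    return [PREFIX + article for article in articles]


def summarize(articles: List[str], model_path: str = MODEL_PATH, max_new_tokens: int = 60) -> List[str]:
    """Generate one summary per article (sketch)."""
    # Imported here so add_prefix can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    # Truncate inputs to 512 tokens, the t5-small tokenizer's default maximum.
    inputs = tokenizer(
        add_prefix(articles),
        max_length=512,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `summarize([article_text])` returns a list containing one summary string. The `max_new_tokens=60` default is an arbitrary illustrative value; the reported evaluation summaries averaged only 19 tokens, so quality at longer output lengths has not been measured.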

Intended uses & limitations

Intended Use
The model is designed for text summarization: condensing long pieces of text into shorter, more digestible summaries. Specific use cases include:
- News Summarization: Quickly summarizing news articles to provide readers with the main points.
- Document Summarization: Condensing lengthy reports or research papers into brief overviews.
- Content Curation: Helping content creators and curators generate summaries for newsletters, blogs, or social media posts.
- Educational Tools: Assisting students and educators by summarizing academic texts and articles.
Limitations
While the model is useful, it has several limitations:
- Accuracy: The summaries generated might not always capture all the key points accurately, especially for complex or nuanced texts.
- Bias: The model can inherit biases present in the training data, which might affect the quality and neutrality of the summaries.
- Context Understanding: The t5-small tokenizer's default maximum input length is 512 tokens, so longer documents are usually truncated or must be split into chunks, which can lead to incomplete or misleading summaries.
- Language and Style: The model’s output might not always match the desired tone or style, requiring further editing.
- Data Dependency: Performance can vary depending on the quality and nature of the input data. It performs best on data similar to its training set (news articles).

Training and evaluation data

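The model was fine-tuned and evaluated on the CNN/DailyMail dataset (abisee/cnn_dailymail), which pairs news articles with human-written highlight summaries. As the model name indicates, only a reduced subset of the data was used; the exact subset sizes are not recorded in this card. The optimizer and other settings are listed under Training hyperparameters.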
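The size of the reduced training subset can be roughly estimated from the training results, which report 288 optimizer steps per epoch at a training batch size of 16. The sketch below assumes a single device, no gradient accumulation (none is listed), and that the last partial batch is kept, which is the Trainer's default. `example_count_range` is an illustrative helper name.

```python
from typing import Tuple


def example_count_range(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Dataset sizes N for which ceil(N / batch_size) == steps_per_epoch.

    Assumes one device, no gradient accumulation, and that the final
    partial batch is not dropped.
    """
    smallest = (steps_per_epoch - 1) * batch_size + 1  # one example spills into the last batch
    largest = steps_per_epoch * batch_size             # every batch is full
    return smallest, largest


# 288 steps per epoch at train_batch_size 16, from the training results table.
print(example_count_range(288, 16))  # (4593, 4608)
```

Under these assumptions the training subset held roughly 4,600 articles, a small fraction of the full dataset.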

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
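With `lr_scheduler_type: linear` and no warmup listed, the learning rate decays linearly from 2e-05 to zero over all 576 optimizer steps (2 epochs × 288 steps, per the training results). The sketch below follows the same formula as the linear-with-warmup schedule in `transformers`; `warmup_steps=0` is an assumption because the card lists no warmup, and `linear_lr` is an illustrative name.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 576, warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under a linear schedule (sketch)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr to 0 over the remaining steps, clamped at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 288, 576):
    print(step, linear_lr(step))
```

Under these assumptions, the learning rate is 2e-05 at the start, half that (1e-05) at the end of epoch 1, and 0 at the final step.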

Training results

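Training Loss	Epoch	Step	Validation Loss	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	Generated Length
No log	1.0	288	1.6727	0.217	0.0949	0.1841	0.1839	19.0
1.9118	2.0	576	1.6597	0.2162	0.0943	0.1834	0.1834	19.0

"No log" in the first row means no training loss had been recorded yet: by default the Trainer logs the training loss every 500 steps, and the first epoch ended at step 288.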
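In the Hugging Face summarization examples that this card's format appears to follow, the Generated Length column is the mean number of non-padding token ids in each generated sequence. Assuming that convention, the sketch below shows the computation. For T5, id 0 serves both as the padding token and as the decoder start token that begins every generated sequence, so it is excluded from the count. If generation used the `transformers` default cap of 20 positions (an assumption; the card does not state the generation length), any summary that never emitted an end-of-sequence token would count as 19 tokens, which is consistent with the constant 19.0 above and suggests the summaries were cut off by the length limit. `mean_generated_length` is an illustrative name.

```python
from typing import Sequence

PAD_TOKEN_ID = 0  # T5 uses id 0 for padding and as the decoder start token


def mean_generated_length(generated: Sequence[Sequence[int]], pad_token_id: int = PAD_TOKEN_ID) -> float:
    """Average count of non-padding token ids per generated sequence (sketch)."""
    lengths = [sum(1 for token in seq if token != pad_token_id) for seq in generated]
    return sum(lengths) / len(lengths)


# Two hypothetical outputs filling all 20 positions: a leading start token (0)
# followed by 19 generated ids, none of which is end-of-sequence.
capped = [[0] + list(range(100, 119)), [0] + list(range(200, 219))]
print(mean_generated_length(capped))  # 19.0
```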

Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1