ashaduzzaman committed on
Commit 4a969cc
1 Parent(s): 31e32c8

Update README.md

Files changed (1)
  1. README.md +90 -36
README.md CHANGED
@@ -9,60 +9,114 @@ metrics:
  model-index:
  - name: t5-small-finetuned-billsum
    results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

  # t5-small-finetuned-billsum

- This model is a fine-tuned version of [google-t5/t5-small](https://huggingface.co/google-t5/t5-small) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.5533
- - Rouge1: 0.1356
- - Rouge2: 0.0495
- - Rougel: 0.1144
- - Rougelsum: 0.1144
- - Gen Len: 19.0

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 16
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 3
- - mixed_precision_training: Native AMP

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
- |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
- | No log | 1.0 | 62 | 2.6711 | 0.1308 | 0.0445 | 0.1107 | 0.1109 | 19.0 |
- | No log | 2.0 | 124 | 2.5761 | 0.1338 | 0.0483 | 0.1137 | 0.1137 | 19.0 |
- | No log | 3.0 | 186 | 2.5533 | 0.1356 | 0.0495 | 0.1144 | 0.1144 | 19.0 |

- ### Framework versions

- - Transformers 4.42.4
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.19.1

  model-index:
  - name: t5-small-finetuned-billsum
    results: []
+ datasets:
+ - billsum
+ language:
+ - en
+ pipeline_tag: summarization
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+
  # t5-small-finetuned-billsum

+ This model is a fine-tuned version of [google-t5/t5-small](https://huggingface.co/google-t5/t5-small) trained on legislative bill texts. It is optimized for generating concise summaries of legislative bills and similar legal documents.
+
+ ## Model Details
+
+ - **Model Name:** t5-small-finetuned-billsum
+ - **Base Model:** [google-t5/t5-small](https://huggingface.co/google-t5/t5-small)
+ - **Model Type:** Transformer-based text-to-text (encoder-decoder) generation model
+ - **Fine-tuned on:** Legislative bill texts
+
+ ### Model Description
+
+ This model uses the T5 (Text-to-Text Transfer Transformer) architecture, which casts every NLP task as a text-to-text problem, allowing a single model to handle a wide range of natural language understanding and generation tasks. T5-small is a compact variant of T5, making it more computationally efficient while still delivering reasonable performance. This fine-tuned checkpoint is trained specifically to summarize legislative bills, capturing the essential details in concise summaries.
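+
+ As a minimal sketch of this text-to-text setup (assuming the standard T5 `summarize:` task prefix was used during fine-tuning; the bill excerpt is a made-up placeholder), the checkpoint can be called directly through the `transformers` API:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ model_id = "ashaduzzaman/t5-small-finetuned-billsum"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+
+ # Placeholder excerpt; T5 checkpoints are conventionally prompted with a task prefix
+ bill_text = "summarize: This Act directs the Secretary to establish a grant program ..."
+ inputs = tokenizer(bill_text, return_tensors="pt", truncation=True, max_length=512)
+ output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
+ print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
+ ```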
+
+ ### Intended Uses & Limitations
+
+ **Intended Uses:**
+ - Summarizing legislative bills and related legal documents.
+ - Extracting key information from long legal texts.
+ - Assisting in the quick review of bill content for policymakers, legal professionals, and researchers.
+
+ **Limitations:**
+ - The model may not capture all nuances of highly complex legal language.
+ - It may omit important details if they are not prevalent in the training data.
+ - It is not designed for tasks outside summarization of legislative content.
+ - The quality of summaries depends on the quality and relevance of the input data.
+
+ ### Training and Evaluation Data
+
+ The model was fine-tuned on legislative bill texts. The exact dataset is not recorded in the training logs; given the model name, it is most likely the publicly available BillSum dataset of US Congressional and California state bills. The ROUGE scores reported below reflect performance on the held-out evaluation split.
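+
+ For illustration only, and assuming the public BillSum dataset with the standard T5 preprocessing recipe (none of this is confirmed by the training logs), data preparation could look like this:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ # Assumed dataset: the public BillSum corpus (small California split used as an example)
+ billsum = load_dataset("billsum", split="ca_test").train_test_split(test_size=0.2)
+ tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
+
+ def preprocess(batch):
+     # Prefix inputs with the T5 summarization task tag, then tokenize bills and summaries
+     inputs = ["summarize: " + doc for doc in batch["text"]]
+     model_inputs = tokenizer(inputs, max_length=512, truncation=True)
+     labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
+     model_inputs["labels"] = labels["input_ids"]
+     return model_inputs
+
+ tokenized = billsum.map(preprocess, batched=True)
+ ```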
+
+ ### Evaluation Results
+
+ The model achieved the following results on the evaluation set:
+
+ - **Loss:** 2.5533
+ - **ROUGE-1:** 0.1356
+ - **ROUGE-2:** 0.0495
+ - **ROUGE-L:** 0.1144
+ - **ROUGE-Lsum:** 0.1144
+ - **Average Generated Length (Gen Len):** 19.0 tokens
+
+ These scores suggest moderate summarization performance, with room for improvement in capturing more comprehensive content.
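+
+ ROUGE scores of this kind are typically computed with the Hugging Face `evaluate` library. A minimal sketch (the prediction and reference strings are placeholders, not actual model outputs):
+
+ ```python
+ import evaluate
+
+ rouge = evaluate.load("rouge")
+ predictions = ["the bill establishes a grant program for rural hospitals"]   # placeholder
+ references = ["this act establishes a new grant program for rural hospitals"]  # placeholder
+ scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
+ print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum
+ ```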

+ ### Training Procedure

+ The model was trained using the following hyperparameters and setup:

+ #### Training Hyperparameters

+ - **Learning Rate:** 2e-05
+ - **Training Batch Size:** 16
+ - **Evaluation Batch Size:** 16
+ - **Random Seed:** 42
+ - **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
+ - **Learning Rate Scheduler:** Linear
+ - **Number of Epochs:** 3
+ - **Mixed Precision Training:** Native AMP (Automatic Mixed Precision)
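+
+ A hedged sketch of how these settings map onto `Seq2SeqTrainingArguments` (the output directory is a placeholder, and `model`, `tokenizer`, and `tokenized` are assumed to come from a preprocessing step such as the sketch above; this is not the exact script used for training):
+
+ ```python
+ from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments
+
+ args = Seq2SeqTrainingArguments(
+     output_dir="t5-small-finetuned-billsum",  # placeholder path
+     learning_rate=2e-5,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     seed=42,
+     lr_scheduler_type="linear",
+     num_train_epochs=3,
+     fp16=True,                    # Native AMP mixed precision
+     eval_strategy="epoch",
+     predict_with_generate=True,   # generate summaries during evaluation (used for ROUGE)
+ )
+ # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default optimizer;
+ # a compute_metrics function for ROUGE is omitted from this sketch.
+ trainer = Seq2SeqTrainer(
+     model=model,
+     args=args,
+     train_dataset=tokenized["train"],
+     eval_dataset=tokenized["test"],
+     tokenizer=tokenizer,
+     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
+ )
+ trainer.train()
+ ```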

+ #### Training Results

+ | Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
+ | No log        | 1.0   | 62   | 2.6711          | 0.1308  | 0.0445  | 0.1107  | 0.1109     | 19.0    |
+ | No log        | 2.0   | 124  | 2.5761          | 0.1338  | 0.0483  | 0.1137  | 0.1137     | 19.0    |
+ | No log        | 3.0   | 186  | 2.5533          | 0.1356  | 0.0495  | 0.1144  | 0.1144     | 19.0    |

+ ### Framework Versions

+ - **Transformers:** 4.42.4
+ - **PyTorch:** 2.3.1+cu121
+ - **Datasets:** 2.21.0
+ - **Tokenizers:** 0.19.1

+ ### Ethical Considerations

+ - **Bias:** The model's summaries might reflect biases present in the training data, potentially affecting the representation of different topics or perspectives.
+ - **Data Privacy:** Ensure that the use of the model complies with data privacy regulations, especially when using it on sensitive or proprietary legislative documents.

+ ### Future Improvements

+ - Training on a larger and more diverse dataset of legislative texts could improve summarization quality.
+ - Further fine-tuning with domain-specific data may help the model capture nuanced legal language.
+ - Incorporating additional evaluation metrics such as BERTScore would give a more comprehensive picture of the model's performance.

+ ### Usage
+
+ You can load this model in a Hugging Face `pipeline` for summarization:
+
+ ```python
+ from transformers import pipeline
+
+ summarizer = pipeline(
+     "summarization",
+     model="ashaduzzaman/t5-small-finetuned-billsum"
+ )
+
+ # Example usage: summarization
+ input_text = "This is a long passage from a legislative bill that needs to be summarized."
+ summary = summarizer(input_text)
+ print(summary[0]["summary_text"])
+ ```
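+
+ Legislative bills are often longer than the 512 tokens the base model was trained with. For long inputs, it can help to truncate explicitly and bound the summary length; the values below are illustrative and `long_bill_text` is a placeholder variable:
+
+ ```python
+ # `long_bill_text` is a placeholder for the full text of a bill
+ summary = summarizer(
+     long_bill_text,
+     truncation=True,   # cut inputs that exceed the model's context window
+     max_length=128,    # upper bound on generated summary tokens
+     min_length=30,     # lower bound on generated summary tokens
+ )
+ print(summary[0]["summary_text"])
+ ```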