BART-Large-CNN-Enhanced
BART-Large-CNN-Enhanced is a fine-tuned version of the facebook/bart-large-cnn model. It was trained for one further epoch on the CNN/DailyMail dataset, improving ROUGE-1 and ROUGE-2 by more than 5% and ROUGE-L by 1.8% relative to the base model.
- Developed by: phanerozoic
- Model type: BartForConditionalGeneration
- Source model: facebook/bart-large-cnn
- License: cc-by-nc-4.0
- Languages: English
Model Details
BART-Large-CNN-Enhanced uses the transformer-based, sequence-to-sequence BART architecture and is tailored specifically to text summarization. The architecture is the same as that of the base model; the difference lies in the additional fine-tuning, which is intended to improve the summaries the model generates.
Configuration
- Max input length: 1024 tokens
- Max target length: 128 tokens
- Learning rate: 1e-5
- Batch size: 32
- Epochs: 1
- Hardware used: NVIDIA RTX 6000 Ada Lovelace
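As a rough illustration of how the two length limits above are typically applied with a Hugging Face tokenizer, the sketch below truncates articles to 1024 tokens and reference summaries to 128 tokens. The helper name `preprocess` is hypothetical, the column names `article` and `highlights` are assumed to match the public CNN/DailyMail dataset, and this is not necessarily the exact preprocessing used for this model.

```python
MAX_INPUT_LENGTH = 1024   # max input length from the configuration above
MAX_TARGET_LENGTH = 128   # max target length from the configuration above


def preprocess(batch, tokenizer):
    """Tokenize a batch of articles and reference summaries.

    Hypothetical helper: `batch` is assumed to be a dict with the columns
    "article" and "highlights", each holding a list of strings.
    """
    # Truncate source articles to the maximum input length.
    model_inputs = tokenizer(
        batch["article"],
        max_length=MAX_INPUT_LENGTH,
        truncation=True,
    )
    # Tokenize the reference summaries as targets, truncated to the target length.
    labels = tokenizer(
        text_target=batch["highlights"],
        max_length=MAX_TARGET_LENGTH,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the `datasets` library, a function like this would usually be applied with `dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})`.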
Training and Evaluation Data
The model was fine-tuned for one epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. The dataset is widely used as a benchmark for text summarization models because of its size and the quality of its annotations.
Training Procedure
The training involved fine-tuning the pre-trained facebook/bart-large-cnn model with the following settings:
- Epochs: 1
- Batch size: 32
- Learning rate: 1e-5
- Training time: 7 hours 19 minutes 24 seconds
- Loss: 0.618
During training, the model was optimized to minimize the training loss, with the aim of producing summaries that are both concise and informative.
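The settings above could be expressed with the Hugging Face `Seq2SeqTrainer` roughly as follows. This is a minimal sketch, not the author's training script: the output directory is a hypothetical path, a single-GPU run is assumed (so the per-device batch size equals the reported 32), and the optimizer, scheduler, and every other unlisted setting are left at the library defaults.

```python
BASE_MODEL = "facebook/bart-large-cnn"

# Hyperparameters reported in this model card.
HYPERPARAMS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,  # assumes a single GPU
    "num_train_epochs": 1,
}


def build_training_args(output_dir="bart-large-cnn-enhanced"):
    """Build training arguments from the reported settings.

    `output_dir` is a hypothetical path chosen for illustration.
    """
    from transformers import Seq2SeqTrainingArguments  # imported lazily

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def build_trainer(train_dataset, output_dir="bart-large-cnn-enhanced"):
    """Assemble a trainer for an already tokenized training dataset."""
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    return Seq2SeqTrainer(
        model=model,
        args=build_training_args(output_dir),
        train_dataset=train_dataset,
        # Pads inputs and labels dynamically per batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

Calling `build_trainer(tokenized_train_split).train()` would then start a one-epoch fine-tuning run under these assumptions.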
Performance
The fine-tuning process improved all three ROUGE scores:
- ROUGE-1: 45.37 (5.62% improvement over the base model score of 42.949)
- ROUGE-2: 22.00 (5.71% improvement over the base model score of 20.815)
- ROUGE-L: 31.17 (1.80% improvement over the base model score of 30.619)
These scores indicate closer overlap with the reference summaries, suggesting that the model captures more of the key elements of the source text while producing coherent summaries.
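The relative improvements can be recomputed from the listed scores with a few lines of arithmetic. Because the enhanced scores are reported to two decimals, the recomputed percentages may differ from the listed ones by a few hundredths of a percentage point; that rounding explanation is an assumption, not something stated by the author.

```python
# Scores as reported above: (enhanced model, base model).
SCORES = {
    "ROUGE-1": (45.37, 42.949),
    "ROUGE-2": (22.00, 20.815),
    "ROUGE-L": (31.17, 30.619),
}


def relative_improvement(new, base):
    """Return the percentage gain of `new` over `base`."""
    return (new - base) / base * 100.0


for name, (enhanced, base) in SCORES.items():
    gain = relative_improvement(enhanced, base)
    print(f"{name}: {enhanced:.2f} vs {base:.3f} ({gain:.2f}% relative gain)")
```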
Comparing Performance to Base Model
To illustrate the improvements made by the BART-Large-CNN-Enhanced model, we used the same article featured on the base model's widget, allowing for a direct summary comparison. The article describes the Eiffel Tower, its dimensions, and its historical significance. Below are the summaries generated by both models:
Given Article
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
Summary by BART-Large-CNN-Enhanced
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.
Summary by Facebook's BART-Large-CNN
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.
Analysis
Coverage:
- Enhanced Model: Includes key details such as the Eiffel Tower being the tallest structure in Paris and its historical significance in surpassing the Washington Monument.
- Base Model: Provides additional details about the base dimensions but omits the detail about the Eiffel Tower being the tallest structure in Paris.
Conciseness:
- Enhanced Model: More concise, focusing on the most critical historical and current facts.
- Base Model: Slightly longer, with extra details about the base dimensions.
Relevance:
- Enhanced Model: Captures the most relevant details, making it more informative for someone looking for key highlights.
- Base Model: Adds context with base dimensions, which might be less critical depending on the summary's intended use.
This comparison highlights the BART-Large-CNN-Enhanced model's improved ability to generate more concise and relevant summaries by focusing on significant details, such as the Eiffel Tower being the tallest structure in Paris, which the base model missed. This makes the enhanced model more effective for generating high-impact summaries for users seeking essential information quickly.
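The coverage and conciseness observations can be checked directly on the two summaries quoted above. The sketch below uses a plain whitespace word count, which is only a crude proxy for conciseness, and a substring test for the "tallest structure in Paris" detail.

```python
ENHANCED_SUMMARY = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world."
)

BASE_SUMMARY = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building. Its base is square, measuring 125 metres (410 ft) "
    "on each side. During its construction, the Eiffel Tower surpassed the "
    "Washington Monument to become the tallest man-made structure in the world."
)


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def mentions(text, phrase):
    """Case-insensitive check for a phrase in the text."""
    return phrase.lower() in text.lower()


for label, summary in (("Enhanced", ENHANCED_SUMMARY), ("Base", BASE_SUMMARY)):
    print(
        f"{label}: {word_count(summary)} words, "
        f"mentions tallest structure in Paris: "
        f"{mentions(summary, 'tallest structure in Paris')}"
    )
```

A single example like this illustrates the difference but does not by itself establish a general trend; the ROUGE scores remain the broader measure.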
Overall Appraisal
The BART-Large-CNN-Enhanced model improves measurably on its base model and is a robust tool for text summarization. The key point of its appraisal:
- Standard Performance: The model performs well at generating summaries for news articles, achieving higher ROUGE scores than the base model. Its ability to distill lengthy articles into concise and coherent summaries while preserving the essential information makes it particularly valuable for applications such as news aggregation and content curation.
Usage
This model is well suited to summarizing English text, particularly text similar to the news articles on which it was trained. It can be used in various applications, including news aggregation, content summarization, and information retrieval.
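A minimal way to try the model is through the `transformers` summarization pipeline, sketched below. The model ID is the repository name of this model; the helper names are hypothetical, `max_length=128` mirrors the target length from the configuration, and `min_length=30` is an assumed value rather than a setting documented for this model.

```python
MODEL_ID = "phanerozoic/BART-Large-CNN-Enhanced"


def load_summarizer(model_id=MODEL_ID):
    """Create a summarization pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily

    return pipeline("summarization", model=model_id)


def summarize(summarizer, text, max_length=128, min_length=30):
    """Summarize one text and return the summary string.

    `min_length=30` is an assumed default for illustration.
    """
    result = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,   # deterministic decoding
        truncation=True,   # cut inputs that exceed the model's input limit
    )
    return result[0]["summary_text"]
```

Typical use would be `summarizer = load_summarizer()` followed by `print(summarize(summarizer, article_text))`, reusing the same pipeline object across calls so the model is loaded only once.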
Limitations
While the model excels in contexts similar to its training data (news articles), its performance might vary on text from other domains or in other languages. Future enhancements could involve expanding the training data to include more diverse text sources, which would improve its generalizability and robustness.
Acknowledgments
Special thanks to the developers of the BART architecture and the Hugging Face team. Their tools and frameworks were instrumental in the development and fine-tuning of this model. The NVIDIA RTX 6000 Ada Lovelace hardware provided the necessary computational power to achieve these results.