---
license: cc-by-nc-4.0
language:
- en
tags:
- bart
- text-summarization
- cnn-dailymail
widget:
- text: |
    The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
  example_title: Generate Summary
metrics:
- rouge
datasets:
- cnn_dailymail
---

# BART-Large-CNN-Enhanced

BART-Large-CNN-Enhanced is a fine-tuned version of the `facebook/bart-large-cnn` model. It was further optimized on the CNN/DailyMail dataset and achieves roughly a 5% overall improvement in ROUGE scores over the base model.

- **Developed by**: phanerozoic
- **Model type**: BartForConditionalGeneration
- **Source model**: `facebook/bart-large-cnn`
- **License**: cc-by-nc-4.0
- **Languages**: English

## Model Details

BART-Large-CNN-Enhanced uses a transformer-based, sequence-to-sequence architecture tailored specifically for text summarization. It builds on the strengths of the original BART architecture, further refining its ability to understand source text and generate fluent, human-like summaries.

### Configuration
- **Max input length**: 1024 tokens
- **Max target length**: 128 tokens
- **Learning rate**: 1e-5
- **Batch size**: 32
- **Epochs**: 1
- **Hardware used**: NVIDIA RTX 6000 Ada Lovelace
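
As a minimal sketch of how the two length limits above are typically applied during preprocessing, assuming the standard `transformers` tokenizer API and the `article`/`highlights` fields of the CNN/DailyMail dataset (the card does not show the actual preprocessing code):

```python
from transformers import AutoTokenizer

# Tokenizer of the source checkpoint; the fine-tuned model shares its vocabulary.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

def preprocess(batch):
    # Clip articles to the 1024-token encoder limit listed above.
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    # Clip reference summaries to the 128-token target limit.
    labels = tokenizer(text_target=batch["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```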

## Training and Evaluation Data

The model was fine-tuned for one epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. The dataset is a standard benchmark for text summarization because of its size and the quality of its reference summaries.
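
The dataset is available on the Hugging Face Hub; a loading sketch follows. The `"3.0.0"` (non-anonymized) configuration is an assumption, since the card does not state which configuration was used:

```python
from datasets import load_dataset

# "3.0.0" is the standard non-anonymized configuration; assumed here.
dataset = load_dataset("cnn_dailymail", "3.0.0")

# Apply the preprocessing function from the Configuration sketch above.
tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)
```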

## Training Procedure

The training involved fine-tuning the pre-trained `facebook/bart-large-cnn` model with the following settings:
- **Epochs**: 1
- **Batch size**: 32
- **Learning rate**: 1e-5
- **Training time**: 7 hours 19 minutes 24 seconds
- **Loss**: 0.618
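
A minimal sketch of an equivalent fine-tuning run with the `transformers` `Seq2SeqTrainer`, under the assumption of a single device with an effective batch size of 32 and no gradient accumulation (the card does not describe the training loop); `output_dir` is a hypothetical name:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-cnn-enhanced",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=1e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],  # from the dataset sketch above
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```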

During training, the model was optimized to reduce the loss function, enhancing its ability to generate summaries that are both concise and informative.

### Performance

The fine-tuning process resulted in significant performance improvements:
- **ROUGE-1**: 45.37 (5.62% improvement over the base model score of 42.949)
- **ROUGE-2**: 22.00 (5.71% improvement over the base model score of 20.815)
- **ROUGE-L**: 31.17 (1.80% improvement over the base model score of 30.619)

These scores reflect the model's enhanced ability to capture the key elements of the source text and produce coherent summaries that are faithful to the original content.
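
To illustrate the mechanics of computing such scores (the card's numbers were presumably measured on the CNN/DailyMail test split), a sketch using the `evaluate` library:

```python
import evaluate

rouge = evaluate.load("rouge")

# Toy example; a real evaluation would pair generated summaries with the
# `highlights` references from the CNN/DailyMail test split.
predictions = ["The Eiffel Tower is the tallest structure in Paris."]
references = ["At 324 metres, the Eiffel Tower is the tallest structure in Paris."]

scores = rouge.compute(predictions=predictions, references=references)
# Scale from [0, 1] to the 0-100 convention used in the table above.
print({name: round(value * 100, 2) for name, value in scores.items()})
```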

### Overall Appraisal

BART-Large-CNN-Enhanced demonstrates consistent improvements over its base model and is a robust tool for text summarization:

- **Standard performance**: The model excels at summarizing news articles, achieving significantly improved ROUGE scores compared to the base model. Its ability to distill lengthy articles into concise, coherent summaries while preserving the essential information makes it particularly valuable for applications such as news aggregation and content curation.

## Usage

The model is effective at summarizing English text, particularly material similar to the news articles it was trained on. It can be used in applications including news aggregation, content summarization, and information retrieval.
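
A minimal inference sketch with the `transformers` pipeline API; the Hub id `phanerozoic/BART-Large-CNN-Enhanced` is inferred from this card's developer and model name:

```python
from transformers import pipeline

# Hub id assumed from this card's developer and model name.
summarizer = pipeline("summarization", model="phanerozoic/BART-Large-CNN-Enhanced")

article = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side."
)
summary = summarizer(article, max_length=128, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```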

## Limitations

While the model excels in contexts similar to its training data (news articles), its performance might vary on text from other domains or in other languages. Future enhancements could involve expanding the training data to include more diverse text sources, which would improve its generalizability and robustness.

## Acknowledgments

Special thanks to the developers of the BART architecture and the Hugging Face team. Their tools and frameworks were instrumental in the development and fine-tuning of this model. The NVIDIA RTX 6000 Ada Lovelace hardware provided the necessary computational power to achieve these results.