---
license: cc-by-nc-4.0
language:
- en
tags:
- bart
- text-summarization
- cnn-dailymail
widget:
- text: |
    The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
  example_title: Generate Summary
metrics:
- rouge
datasets:
- cnn_dailymail
---

# BART-Large-CNN-Enhanced

BART-Large-CNN-Enhanced is a fine-tuned version of the `facebook/bart-large-cnn` model, further optimized on the CNN/DailyMail dataset. It achieves an overall improvement of roughly 5% in ROUGE scores over the base model.

- **Developed by**: phanerozoic
- **Model type**: BartForConditionalGeneration
- **Source model**: `facebook/bart-large-cnn`
- **License**: cc-by-nc-4.0
- **Languages**: English

## Model Details

BART-Large-CNN-Enhanced uses a transformer-based sequence-to-sequence architecture tailored for text summarization. It builds on the strengths of the original BART architecture, further refining the model's ability to understand source text and generate fluent, human-like summaries.

### Configuration
- **Max input length**: 1024 tokens
- **Max target length**: 128 tokens
- **Learning rate**: 1e-5
- **Batch size**: 32
- **Epochs**: 1
- **Hardware used**: NVIDIA RTX 6000 Ada Generation
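
For reference, these hyperparameters can be collected into a single configuration object, e.g. for passing to a training script. The key names below are illustrative (they mirror common Hugging Face `Seq2SeqTrainingArguments` field names) and are not guaranteed to match the exact arguments used.

```python
# Fine-tuning hyperparameters from the Configuration section above.
# Key names are illustrative, not the exact arguments used in training.
training_config = {
    "model_name": "facebook/bart-large-cnn",
    "max_input_length": 1024,          # tokens
    "max_target_length": 128,          # tokens
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,
    "num_train_epochs": 1,
}
```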

## Training and Evaluation Data

The model was fine-tuned for one epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. The dataset is a standard benchmark for text summarization models because of its size and the quality of its reference summaries.

## Training Procedure

Training consisted of fine-tuning the pre-trained `facebook/bart-large-cnn` model with the following settings:
- **Epochs**: 1
- **Batch size**: 32
- **Learning rate**: 1e-5
- **Training time**: 7 hours 19 minutes 24 seconds
- **Final loss**: 0.618

During training, the model was optimized to minimize the loss function, improving its ability to generate summaries that are both concise and informative.

### Performance
Fine-tuning produced clear improvements across all three ROUGE metrics:
- **ROUGE-1**: 45.37 (5.62% improvement over the base model score of 42.949)
- **ROUGE-2**: 22.00 (5.71% improvement over the base model score of 20.815)
- **ROUGE-L**: 31.17 (1.80% improvement over the base model score of 30.619)

These scores reflect the model's enhanced ability to capture the key elements of the source text and produce coherent summaries that remain faithful to the original content.
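
The relative gains can be recomputed directly from the scores above; note that the percentages in this card were derived from unrounded metric values, so recomputing from the rounded scores may shift the last digit slightly.

```python
# Recompute relative ROUGE improvements from the rounded scores above.
base = {"rouge1": 42.949, "rouge2": 20.815, "rougeL": 30.619}
tuned = {"rouge1": 45.37, "rouge2": 22.00, "rougeL": 31.17}

for metric, base_score in base.items():
    gain = 100 * (tuned[metric] - base_score) / base_score
    print(f"{metric}: {gain:.2f}% improvement")
```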

### Overall Appraisal
BART-Large-CNN-Enhanced delivers consistent gains over its base model and stands as a robust tool for text summarization:

- **Standard performance**: The model excels at summarizing news articles, achieving notably higher ROUGE scores than the base model. Its ability to distill lengthy articles into concise, coherent summaries while preserving the essential information makes it particularly valuable for applications such as news aggregation and content curation.

## Usage

The model is most effective for summarizing English text, particularly content similar to the news articles it was trained on. Typical applications include news aggregation, content summarization, and information retrieval.
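
A minimal usage sketch with the Hugging Face `transformers` pipeline API is shown below. The repository id `phanerozoic/BART-Large-CNN-Enhanced` is assumed to match the one hosting this model card, and the generation limits are illustrative, not prescribed.

```python
from transformers import pipeline

# The repository id below is assumed to match this model card.
summarizer = pipeline("summarization", model="phanerozoic/BART-Large-CNN-Enhanced")

article = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world."
)

# max_length / min_length are illustrative token limits for the summary.
summary = summarizer(article, max_length=128, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```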

## Limitations

The model performs best on text resembling its training data (news articles); performance may vary on other domains and is untested on languages other than English. Future enhancements could expand the training data to more diverse text sources, improving generalizability and robustness.

## Acknowledgments

Special thanks to the developers of the BART architecture and the Hugging Face team, whose tools and frameworks were instrumental in developing and fine-tuning this model. The NVIDIA RTX 6000 Ada Generation GPU provided the computational power needed to achieve these results.