Divakar Yadav1, Naman Lalit1, Riya Kaushik1, Yogendra Singh1, Arun Kr Yadav1, Kishor V Bhadane2, Adarsh Kumar3, Baseem Khan4.
Abstract
For the better utilization of the enormous amount of data available to us on the Internet and in different archives, summarization is a valuable method. Manual summarization by experts is an almost impossible and time-consuming activity. People could not access, read, or use such a big pile of information for their needs. Therefore, summary generation is essential and beneficial in the current scenario. This paper presents an efficient qualitative analysis of the different algorithms used for text summarization. We implemented five different algorithms, namely, term frequency-inverse document frequency (TF-IDF), LexRank, TextRank, BertSum, and PEGASUS, for a summary generation. These algorithms are chosen based on various factors. After reviewing the state-of-the-art literature, it generates good summaries results. The performance of these algorithms is compared on two different datasets, i.e., Reddit-TIFU and MultiNews, and their results are measured using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure to perform analysis to decide the best algorithm among these and generate the summary. After performing a qualitative analysis of the above algorithms, we observe that for both the datasets, i.e., Reddit-TIFU and MultiNews, PEGASUS had the best average F-score for abstractive text summarization and TextRank algorithms for extractive text summarization, with a better average F-score.Entities:
Year: 2022 PMID: 35186058 PMCID: PMC8849812 DOI: 10.1155/2022/3411881
Source DB: PubMed Journal: Comput Intell Neurosci
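The ROUGE-1 scores reported in the tables below follow the standard unigram-overlap definition of precision, recall, and F-score. A minimal sketch (whitespace tokenization is a simplifying assumption here; the paper's published results would come from a full ROUGE implementation):

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1: unigram overlap between a candidate summary and a reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each word's overlap count by the reference
    overlap = sum((Counter(cand) & Counter(ref)).values())
    p = overlap / len(cand) if cand else 0.0
    r = overlap / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = rouge1("the cat sat on the mat", "the cat lay on the mat")
# 5 of 6 unigrams overlap, so p = r = f = 5/6
```

ROUGE-2 and ROUGE-L (the other columns below) follow the same precision/recall/F pattern but count bigram overlaps and the longest common subsequence, respectively.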
Figure 1. Classifications of automatic text summarization methods.
Figure 2. Abstractive text summarization techniques.
Figure 3. TF-IDF technique.
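A minimal sketch of the TF-IDF extractive idea behind Figure 3, under simplifying assumptions (whitespace tokenization, no stemming; function names are illustrative, not from the paper): each sentence is scored by the mean TF-IDF weight of its tokens, and the top-scoring sentences form the summary.

```python
import math
from collections import Counter

def summarize_tfidf(sentences, k=1):
    """Score sentences by mean TF-IDF weight; return top k in original order."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(w for d in docs for w in set(d))

    def score(d):
        counts = Counter(d)
        # Sum of count * IDF over distinct words, normalized by sentence length
        return sum(c * math.log(n / df[w]) for w, c in counts.items()) / len(d)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Words shared by every sentence get an IDF of log(1) = 0, so common filler terms contribute nothing; sentences dominated by rare, distinctive terms rank highest.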
Figure 4. Unsupervised graph [18].
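The unsupervised graph ranking of Figure 4, as used by TextRank and LexRank, can be sketched as follows; the word-overlap similarity and parameter values here are illustrative assumptions, not the paper's exact configuration. Sentences are nodes, similarity weights are edges, and a PageRank-style power iteration scores each sentence by how strongly similar sentences "vote" for it.

```python
import math

def textrank_scores(sentences, d=0.85, iters=50):
    """Rank sentences via power iteration over a word-overlap similarity graph."""
    toks = [set(s.lower().split()) for s in sentences]
    n = len(sentences)

    def sim(a, b):
        # Shared-word count, normalized by sentence lengths
        denom = math.log(len(a) + 1) + math.log(len(b) + 1)
        return len(a & b) / denom if denom else 0.0

    w = [[sim(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]

    scores = [1.0] * n
    for _ in range(iters):
        nxt = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(w[j])  # total outgoing weight of node j
                if w[j][i] and out:
                    rank += w[j][i] / out * scores[j]
            nxt.append((1 - d) + d * rank)  # damped PageRank update
        scores = nxt
    return scores
```

An extractive summary then keeps the top-scoring sentences; a sentence with no similarity edges converges to the damping floor of (1 - d).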
Figure 5. Transformer encoder-decoder model [28].
ROUGE metrics for the MultiNews dataset.
| No. | Algorithm | ROUGE-1 F | ROUGE-1 P | ROUGE-1 R | ROUGE-2 F | ROUGE-2 P | ROUGE-2 R | ROUGE-L F | ROUGE-L P | ROUGE-L R |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TF-IDF | 0.2971 | 0.35273 | 0.25663 | 0.0821 | 0.0987 | 0.0703 | 0.2495 | 0.2849 | 0.222 |
| 2 | LexRank | 0.2941 | 0.4203 | 0.22619 | 0.0765 | 0.1077 | 0.0593 | 0.2307 | 0.3306 | 0.1772 |
| 3 | BertSum | 0.2584 | 0.42442 | 0.18581 | 0.0745 | 0.1325 | 0.0519 | 0.2268 | 0.3501 | 0.1678 |
| 4 | TextRank | 0.5948 | 0.60544 | 0.58456 | 0.1112 | 0.0736 | 0.2276 | 0.2828 | 0.2041 | 0.4605 |
| 5 | PEGASUS | 0.438 | 0.49796 | 0.39095 | 0.1998 | 0.2261 | 0.179 | 0.3734 | 0.4296 | 0.3303 |
Figure 6. Comparison of results of summarization algorithms on the MultiNews dataset.
ROUGE metrics for the Reddit-TIFU dataset.
| No. | Algorithm | ROUGE-1 F | ROUGE-1 P | ROUGE-1 R | ROUGE-2 F | ROUGE-2 P | ROUGE-2 R | ROUGE-L F | ROUGE-L P | ROUGE-L R |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TF-IDF | 0.2095 | 0.1819 | 0.4208 | 0.1251 | 0.1578 | 0.1839 | 0.1282 | 0.1835 | 0.3525 |
| 2 | LexRank | 0.2199 | 0.1183 | 0.3312 | 0.1275 | 0.2034 | 0.1806 | 0.1442 | 0.1709 | 0.2713 |
| 3 | BertSum | 0.2261 | 0.1165 | 0.3887 | 0.1209 | 0.1263 | 0.1832 | 0.1356 | 0.1905 | 0.3362 |
| 4 | TextRank | 0.2159 | 0.1098 | 0.5056 | 0.1258 | 0.1555 | 0.2215 | 0.1258 | 0.1784 | 0.4279 |
| 5 | PEGASUS | 0.2376 | 0.2139 | 0.3293 | 0.1845 | 0.2766 | 0.2023 | 0.2175 | 0.1974 | 0.3846 |
Figure 7. Comparison of results of summarization algorithms on the Reddit-TIFU dataset.
Precision, recall, and F-measure of algorithms.
| Algorithm | Recall | Precision | F-measure |
|---|---|---|---|
| TextRank | 0.133 | 0.085 | 0.382 |
| LexRank | 0.19 | 0.148 | 0.331 |