Laura Plaza, Jorge Carrillo-de-Albornoz.
Abstract
BACKGROUND: The position of a sentence in a document has traditionally been considered an indicator of the sentence's relevance, and it is therefore frequently used by automatic summarization systems as an attribute for sentence selection. Sentences close to the beginning of the document are assumed to deal with the main topic and are thus selected for the summary. This criterion has been shown to be very effective when summarizing some types of documents, such as news items. However, this property is unlikely to hold for other types of documents, such as scientific articles, where other positional criteria may be preferred. The purpose of the present work is to study the utility of different positional strategies for biomedical literature summarization.
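As a rough illustration of the lead-position criterion described in the abstract, the following minimal sketch combines a per-sentence content score with a positional score that favors sentences near the beginning of the document. The function names, the linear decay, and the two weights are assumptions made for this example; they are not taken from the paper and do not reproduce the authors' system.

```python
from typing import List


def position_score(index: int, n_sentences: int) -> float:
    """Hypothetical lead-based score: 1.0 for the first sentence,
    decreasing linearly for later sentences."""
    if n_sentences <= 0:
        raise ValueError("n_sentences must be positive")
    return 1.0 - index / n_sentences


def select_sentences(content_scores: List[float],
                     weight_content: float,
                     weight_position: float,
                     k: int) -> List[int]:
    """Return the indices of the k best sentences, in document order.

    Each sentence is ranked by a weighted sum of its content score and
    its positional score (an assumed combination, for illustration only).
    """
    n = len(content_scores)
    combined = [
        (weight_content * score + weight_position * position_score(i, n), i)
        for i, score in enumerate(content_scores)
    ]
    # Sort by descending combined score; break ties by earlier position.
    top = sorted(combined, key=lambda pair: (-pair[0], pair[1]))[:k]
    return sorted(i for _, i in top)


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.1, 0.8]  # made-up content scores
    print(select_sentences(scores, 1.0, 0.0, 2))  # content only
    print(select_sentences(scores, 0.0, 1.0, 2))  # position only
```

With the positional weight set to zero the selection depends on content alone; with the content weight set to zero the sketch reduces to picking the leading sentences, which is the behavior that works well for news items but is questioned here for scientific articles.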
Year: 2013 PMID: 23445074 PMCID: PMC3648362 DOI: 10.1186/1471-2105-14-71
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Architecture of the graph-based summarization system. The figure illustrates the different steps in the summarization process: (i) concept identification, (ii) document representation, (iii) topic recognition and (iv) sentence selection.
Figure 2. Example document graph. Dashed black lines represent hypernymy relations; red lines represent Metathesaurus relations; and blue lines represent Semantic Network relations.
ROUGE scores for the summaries generated using the strategy
| | | | | | |
|---|---|---|---|---|---|
| 1.0 | 0.0 | 0.1660 | 0.1334 | 0.1375 | 0.1096 |
| 0.9 | 0.1 | 0.1502 | 0.1223 | | |
| 0.8 | 0.2 | 0.1668 | 0.1357 | | |
| 0.75 | 0.25 | 0.1592 | 0.1315 | 0.1511 | 0.1205 |
Significance is calculated with respect to the non-positional information baseline and is shown using the following convention: * = p < .05; no star indicates non-significance. The best scores per summarizer are shown in bold.
ROUGE scores for the summaries generated using the strategy
| | | | | | |
|---|---|---|---|---|---|
| 1.0 | 0.0 | 0.1375 | 0.1096 | | |
| 0.9 | 0.1 | 0.1610 | 0.1305 | | |
| 0.8 | 0.2 | 0.1572 | 0.1298 | 0.1498 | 0.1220 |
| 0.75 | 0.25 | 0.1546 | 0.1259 | 0.1436 | 0.1182 |
Significance is calculated with respect to the non-positional information baseline and is shown using the following convention: * = p < .05; no star indicates non-significance. The best scores per summarizer are shown in bold.
ROUGE scores for the summaries generated using the strategy
| | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 0.0 | - | - | - | - | - | 0.1660 | 0.1334 | 0.1375 | 0.1096 |
| | | 0.2 | 0.1 | 1.0 | 0.4 | 0.2 | 0.1635 | 0.1341 | 0.1395 | 0.1134 |
| 0.9 | 0.1 | 0.2 | 0.0 | 0.8 | 0.6 | 0.1 | 0.1752 | 0.1483 | 0.1402 | 0.1159 |
| | | 0.2 | 0.0 | 1.0 | 1.0 | 0.1 | | | | |
| | | 0.2 | 0.0 | 1.0 | 0.8 | 0.0 | 0.1758 | 0.1490 | 0.1423 | 0.1178 |
| | | 0.2 | 0.1 | 1.0 | 0.4 | 0.2 | 0.1726 | 0.1489 | 0.1546 | 0.13254 |
| 0.8 | 0.2 | 0.2 | 0.0 | 0.8 | 0.6 | 0.1 | 0.1758 | 0.1514 | 0.1589 | 0.1332 |
| | | 0.2 | 0.0 | 1.0 | 1.0 | 0.1 | | | | |
| | | 0.2 | 0.0 | 1.0 | 0.8 | 0.0 | 0.1846* | 0.1526* | 0.1610* | 0.1314* |
| | | 0.2 | 0.1 | 1.0 | 0.4 | 0.2 | 0.1613 | 0.1333 | 0.1583* | 0.1298* |
| 0.75 | 0.25 | 0.2 | 0.0 | 0.8 | 0.6 | 0.1 | 0.1634 | 0.1348 | 0.1598* | 0.1302* |
| | | 0.2 | 0.0 | 1.0 | 1.0 | 0.1 | | | | |
| | | 0.2 | 0.0 | 1.0 | 0.8 | 0.0 | 0.1688 | 0.1353 | 0.1604 | 0.1306 |
Significance is calculated with respect to the non-positional information baseline and is shown using the following convention: * = p < .05; no star indicates non-significance. The best scores per summarizer are shown in bold.
Comparison of summarization approaches
| | | |
|---|---|---|
| Graph-based | 0.1660 | 0.1334 |
| Graph-based + | 0.1744 | 0.1492 |
| Graph-based + | 0.1610 | 0.1305 |
| Graph-based + | | |
| Frequency-based | 0.1375 | 0.1096 |
| Frequency-based + | 0.1574 | 0.1290 |
| Frequency-based + | 0.1503 | 0.1223 |
| Frequency-based + | 0.1352 | |
| LexRank | | |
ROUGE results for different summarization approaches. The best scores per summarizer are shown in bold.
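The tables above report ROUGE scores, which measure n-gram overlap between a generated summary and a reference summary. As a simplified sketch of the idea (not the official ROUGE toolkit, and not necessarily the variants or settings used in this study), ROUGE-N recall can be computed as the fraction of reference n-grams that also occur in the candidate. Whitespace tokenization and lowercasing are assumptions made here for brevity.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
    """Simplified ROUGE-N recall: clipped n-gram matches divided by
    the number of n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match count by how often the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat on a mat"
    print(rouge_n_recall(candidate, reference, n=2))
```

In the example, three of the five reference bigrams also appear in the candidate, so the recall is 3/5.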