Laura Plaza, Antonio J Jimeno-Yepes, Alberto Díaz, Alan R Aronson.
Abstract
BACKGROUND: Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods and the summarization performance has never been studied.
Year: 2011 PMID: 21871110 PMCID: PMC3176269 DOI: 10.1186/1471-2105-12-355
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example query for the term repair used in the AEC method. This example shows the two queries generated, one for each candidate sense of the term repair. The first sense, with CUI C0374711, is related to surgical repair, while the second sense, with CUI C0043240, is related to wound healing.
NLM WSD results: method comparison
| WSD Method | Set | Subset |
|---|---|---|
| MRD | 0.6389 | 0.6526 |
| AEC | 0.6836 | 0.6932 |
| JDI | | |
MRD stands for Machine Readable Dictionary, AEC stands for Automatic Extracted Corpus and JDI stands for Journal Descriptor Indexing. "Set" covers all the ambiguous words in the NLM WSD test set, while "Subset" covers only the words usable by the JDI method, that is, the words whose candidate senses have different semantic types.
MSH WSD results: method comparison
| Dataset | AEC | JDI | MRD |
|---|---|---|---|
| Abbreviation Set | 0.8759 | ||
| Abbreviation Subset | 0.6725 | 0.8838 | |
| Term Set | 0.7148 | ||
| Term Subset | 0.6209 | 0.7132 | |
| Term/Abbreviation Set | 0.8801 | ||
| Term/Abbreviation Subset | 0.6899 | 0.8715 | |
| Overall Set | 0.8070 | ||
| Overall Subset | 0.6551 | 0.8118 | |
AEC stands for Automatic Extracted Corpus, MRD stands for Machine Readable Dictionary, and JDI stands for Journal Descriptor Indexing. "Set" covers all the ambiguous words in the category, while "Subset" covers only the words usable by the JDI method.
ROUGE scores for the summaries generated using different WSD strategies
| Summarizer | ROUGE-2 | ROUGE-SU4 |
|---|---|---|
| AEC | ||
| MRD | 0.3611 | 0.3341 |
| JDI | 0.3538 | 0.3267 |
| First mapping | 0.3283 | 0.3117 |
MRD stands for Machine Readable Dictionary, AEC stands for Automatic Extracted Corpus and JDI stands for Journal Descriptor Indexing. Systems are sorted by decreasing ROUGE-2 score.
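For readers unfamiliar with the metric, ROUGE-2 measures bigram overlap between a generated summary and a reference summary. The following is a minimal sketch of a recall-only variant, assuming lowercased whitespace tokenization and a single reference; the function names and example sentences are illustrative and not taken from the paper, and the official ROUGE toolkit offers further options (stemming, multiple references) that are omitted here.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2_recall(candidate, reference):
    """Clipped bigram overlap divided by the number of reference bigrams.

    Assumption: simple lowercased whitespace tokenization, one reference.
    """
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference bigram is matched at most as often as it occurs
    # in the candidate (clipping).
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


# Made-up sentences, used only to exercise the function.
reference = "the wound healing process repairs damaged tissue"
candidate = "the wound healing process is slow"
score = rouge_2_recall(candidate, reference)
print(f"ROUGE-2 recall: {score:.4f}")
```

Here three of the six reference bigrams also occur in the candidate, so the sketch reports a recall of 0.5.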
p values for statistical significance (Wilcoxon Signed Ranks Test)
| Summarizer | ROUGE-2 | ROUGE-SU4 |
|---|---|---|
| MRD-AEC | 0.187 | 0.341 |
| JDI-AEC | 0.013 | 0.058 |
| JDI-MRD | 0.057 | 0.084 |
MRD stands for Machine Readable Dictionary, AEC stands for Automatic Extracted Corpus and JDI stands for Journal Descriptor Indexing.
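The p values above come from a Wilcoxon signed-rank test on paired scores. As a rough illustration of how such a test works, the sketch below computes the signed-rank statistic and a two-sided p value using the normal approximation, without continuity or tie correction. It is not the procedure used by the authors, whose exact settings are not stated here, and the per-document scores are invented for the example; with so few pairs an exact test would normally be preferred.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for paired samples x and y.

    W is the smaller of the positive and negative rank sums; p is a
    two-sided value from the normal approximation (an assumption of this
    sketch: no continuity correction, no tie correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p


# Hypothetical per-document ROUGE-2 scores for two summarizers.
scores_a = [0.40, 0.35, 0.38, 0.42, 0.31]
scores_b = [0.39, 0.33, 0.35, 0.38, 0.26]
w, p = wilcoxon_signed_rank(scores_a, scores_b)
print(f"W = {w}, p = {p:.4f}")
```

A pair of systems would be called significantly different at the conventional 0.05 level when p falls below that threshold, as is the case for JDI-AEC on ROUGE-2 in the table above.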
Figure 2. Comparison between the disambiguation performed by different WSD algorithms and the ROUGE-2 scores obtained by the summaries generated using each WSD algorithm. Disambiguation performance is represented as the proportion of common mappings between each pair of WSD algorithms, while summarization performance is represented as the difference in the ROUGE-2 scores achieved by the generated summaries. Each data point in these graphs represents a different document from the evaluation corpus. MRD stands for Machine Readable Dictionary, AEC stands for Automatic Extracted Corpus and JDI stands for Journal Descriptor Indexing.
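The proportion of common mappings plotted in Figure 2 can be pictured with a small sketch. It assumes, hypothetically, that each algorithm's output for one document is a dictionary from an ambiguous term occurrence (term, token position) to the chosen CUI, and that agreement is measured over occurrences both algorithms disambiguated; the data layout and function name are assumptions for illustration, and the exact definition used for the figure may differ. The two CUIs are the candidate senses of repair given in Figure 1; the positions and choices are invented.

```python
def common_mapping_proportion(mappings_a, mappings_b):
    """Fraction of shared term occurrences mapped to the same CUI.

    Assumption: only occurrences disambiguated by both algorithms count.
    """
    shared = mappings_a.keys() & mappings_b.keys()
    if not shared:
        return 0.0
    agree = sum(1 for key in shared if mappings_a[key] == mappings_b[key])
    return agree / len(shared)


# Invented disambiguation output for one document: (term, position) -> CUI.
mrd = {
    ("repair", 12): "C0374711",
    ("repair", 47): "C0043240",
    ("repair", 80): "C0043240",
    ("repair", 95): "C0374711",
}
aec = {
    ("repair", 12): "C0374711",
    ("repair", 47): "C0374711",
    ("repair", 80): "C0043240",
    ("repair", 95): "C0374711",
}

proportion = common_mapping_proportion(mrd, aec)
print(f"Common mappings (MRD vs AEC): {proportion:.2f}")
```

In this toy document the two algorithms agree on three of four occurrences, giving 0.75; pairing that value with the document's ROUGE-2 difference would yield one data point of the kind shown in the figure.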