Weisi Duan, Min Song, Alexander Yates.
Abstract
BACKGROUND: We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort.
Year: 2009 PMID: 19344480 PMCID: PMC2665052 DOI: 10.1186/1471-2105-10-S3-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. SENSATIONAL's clustering algorithm is a fast, approximate technique for finding maximum-margin clusters in document collections. It uses minimum spanning trees to find high-weight edges connecting two components of the graph.
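The idea of using a minimum spanning tree to locate a high-weight edge between two components can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes edge weights are distances between documents, that the graph is connected, and that a single cut at the heaviest tree edge is wanted. The function names `kruskal_mst` and `split_on_heaviest_edge` are hypothetical.

```python
def kruskal_mst(n, edges):
    """Return the MST edges of a connected undirected graph.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of (weight, u, v) tuples; weight is assumed to be a distance
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components
            parent[root_u] = root_v
            mst.append((weight, u, v))
    return mst


def split_on_heaviest_edge(n, edges):
    """Cut the heaviest MST edge and return (cut_edge, component_a, component_b)."""
    mst = kruskal_mst(n, edges)
    heaviest = max(mst)
    kept = [edge for edge in mst if edge != heaviest]

    adjacency = {node: [] for node in range(n)}
    for _, u, v in kept:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Collect every node still reachable from one endpoint of the cut edge.
    seen = {heaviest[1]}
    stack = [heaviest[1]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)

    component_a = sorted(seen)
    component_b = sorted(set(range(n)) - seen)
    return heaviest, component_a, component_b


# Toy graph (made-up distances): nodes 0-2 are close together, as are nodes 3-4.
toy_edges = [
    (0.10, 0, 1), (0.20, 1, 2), (0.15, 0, 2),
    (0.90, 2, 3), (0.10, 3, 4), (0.95, 1, 4),
]
cut_edge, group_a, group_b = split_on_heaviest_edge(5, toy_edges)
print(cut_edge, group_a, group_b)
```

Cutting the heaviest tree edge separates the two groups that are farthest apart along the tree, which is the intuition behind a wide margin between clusters.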
Figure 2. A minimum spanning tree with slot annotations. Our technique for handling outliers performs a breadth-first traversal of the minimum spanning tree in order to count the number of nodes in each subtree.
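Counting the nodes in each subtree with a breadth-first traversal can be sketched as follows. This is an illustration under assumptions, not the authors' code: the tree is given as an adjacency dictionary, the root is chosen arbitrarily, and the slot annotations from the figure are not modelled. How the counts are then used to handle outliers is not described in this excerpt, so the sketch stops at the counts. The name `subtree_sizes` is hypothetical.

```python
from collections import deque


def subtree_sizes(adjacency, root):
    """Return {node: number of nodes in the subtree rooted at node}.

    adjacency -- dict mapping each node to a list of its neighbours in the tree
    root      -- node at which the (otherwise unrooted) tree is rooted
    """
    parent = {root: None}
    order = []  # nodes in breadth-first visiting order
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)

    # Every node counts itself; walking the BFS order backwards guarantees
    # that a child's total is final before it is added to its parent.
    sizes = {node: 1 for node in order}
    for node in reversed(order):
        if parent[node] is not None:
            sizes[parent[node]] += sizes[node]
    return sizes


# Toy tree: 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
toy_tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(subtree_sizes(toy_tree, root=0))
```

Each node is visited once and each tree edge is examined twice, so the counts are obtained in time linear in the size of the tree.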
Comparison with a supervised WSD system.
| Keyword | All-in-1 | L&R | SENSATIONAL |
| adjustment | 0.62 | 0.57 | 0.56 |
| blood pressure | 0.54 | 0.46 | 0.49 |
| degree | 0.63 | 0.68 | 0.72 |
| evaluation | 0.50 | 0.57 | 0.57 |
| growth | 0.63 | 0.62 | 0.71 |
| immunosuppression | 0.59 | 0.63 | 0.59 |
| man | 0.58 | 0.80 | 0.51 |
| mosaic | 0.52 | 0.66 | 0.71 |
| nutrition | 0.45 | 0.48 | 0.42 |
| radiation | 0.61 | 0.72 | 0.65 |
| repair | 0.52 | 0.81 | 0.80 |
| scale | 0.65 | 0.84 | 0.68 |
| sensitivity | 0.49 | 0.70 | 0.74 |
| weight | 0.47 | 0.68 | 0.53 |
| white | 0.49 | 0.62 | 0.52 |
| average | 0.55 | 0.66 | 0.61 |
SENSATIONAL outperforms the baseline All-in-1 system by 6 percentage points on average, and comes within 5 percentage points of a supervised system for word sense disambiguation by Leroy & Rindflesch (L&R) on a standard NLM dataset. The differences in average performance across all words are statistically significant using the Chi-square test with Yates' correction (two-tailed test with 1 degree of freedom; between All-in-1 and SENSATIONAL, Chi-square = 10.839, p = 0.0010; between SENSATIONAL and L&R, Chi-square = 7.875, p = 0.0050).
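For readers who want to see how such a test is computed, here is a minimal sketch of the Chi-square test with Yates' continuity correction on a 2x2 table of correct and incorrect counts for two systems. The per-system counts are not given in this excerpt, so the function takes them as arguments and the example counts below are made up for illustration. The function names are hypothetical; the p-value uses the standard identity for 1 degree of freedom, p = erfc(sqrt(chi2 / 2)).

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square statistic with Yates' correction for the 2x2 table

        system 1: a correct, b incorrect
        system 2: c correct, d incorrect
    """
    n = a + b + c + d
    # The continuity correction shrinks |ad - bc| by n/2, but never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Made-up counts, for illustration only (not taken from the paper).
statistic = yates_chi_square(10, 20, 20, 10)
print(statistic, chi_square_p_value_1df(statistic))

# Converting the statistics reported above into p-values.
print(chi_square_p_value_1df(10.839))
print(chi_square_p_value_1df(7.875))
```

Feeding the reported statistics into `chi_square_p_value_1df` gives values that agree with the reported p-values to within rounding.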
Comparison with two unsupervised WSD systems.
| Keyword | All-in-1 | SenseClusters | SENSATIONAL |
| ANA | 0.63 | 0.99 | 1.0 |
| BPD | 0.40 | 0.65 | 0.53 |
| BSA | 0.50 | 0.99 | 0.95 |
| CML | 0.55 | 0.99 | 0.90 |
| cold | 0.37 | 0.63 | 0.67 |
| culture | 0.52 | 0.55 | 0.82 |
| discharge | 0.66 | 0.90 | 0.95 |
| fat | 0.51 | 0.55 | 0.53 |
| fluid | 0.64 | 0.88 | 0.99 |
| glucose | 0.51 | 0.69 | 0.51 |
| inflammation | 0.35 | 0.47 | 0.50 |
| inhibition | 0.50 | 0.55 | 0.54 |
| MAS | 0.50 | 1.0 | 1.0 |
| mole | 0.78 | 0.77 | 0.96 |
| nutrition | 0.39 | 0.5 | 0.55 |
| pressure | 0.52 | 0.89 | 0.86 |
| single | 0.50 | 0.87 | 0.99 |
| transport | 0.51 | 0.52 | 0.57 |
| VCR | 0.79 | 0.65 | 0.64 |
| average | 0.53 | 0.74 | 0.76 |
SENSATIONAL outperforms the All-in-1 baseline by 23 percentage points and the SenseClusters system by 2 percentage points on a task of disambiguating terms in PubMed abstracts. The differences in performance across all words are statistically significant using the Chi-square test with Yates' correction (two-tailed test with 1 degree of freedom; between All-in-1 and SenseClusters, Chi-square = 488.671, p < 0.0001; between SenseClusters and SENSATIONAL, Chi-square = 5.388, p = 0.0203).