Hisham Al-Mubaid, Sandeep Gungu.
Abstract
Word sense ambiguity is widespread in the biomedical domain, yet the bioinformatics research effort devoted to it is not commensurate with the problem and leaves room for further development. This paper presents and evaluates a learning-based approach to sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, helped by the many knowledge sources available in the biomedical domain such as ontologies and the text literature, will lessen this limitation. The proposed method uses the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method was evaluated on the benchmark dataset NLM-WSD under various settings and on biomedical entity species disambiguation. The evaluation results show that the approach is very competitive and outperforms recently reported results of other published techniques.
Year: 2012 PMID: 22666174 PMCID: PMC3361294 DOI: 10.1100/2012/949247
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
An example of a training corpus of 10 instances of an ambiguous word w: 5 instances of the first sense are listed under class label C1 and 5 instances of the second sense under class label C2. The context word w1 has 4 and 1 occurrences in classes C1 and C2, respectively, while the context word w2 has 3 and 2 occurrences in C1 and C2, respectively.
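As a rough illustration of the counts in this example, the association between a context word and a sense can be scored with pointwise mutual information (PMI). This is a minimal sketch, not the paper's exact formula, which is not reproduced in this record. It assumes that each occurrence count is the number of training instances containing the word, and it calls the two context words `w1` and `w2`. The function name `pmi` is illustrative.

```python
import math

def pmi(n_word_and_class, n_word, n_class, n_total):
    """Pointwise mutual information log2(P(w, c) / (P(w) * P(c))) from instance counts."""
    p_joint = n_word_and_class / n_total
    p_word = n_word / n_total
    p_class = n_class / n_total
    return math.log2(p_joint / (p_word * p_class))

# Counts from the example: 10 instances, 5 per class.
# Assumption: each "occurrence" is one training instance containing the word.
N, CLASS_SIZE = 10, 5
counts = {"w1": {"C1": 4, "C2": 1}, "w2": {"C1": 3, "C2": 2}}

scores = {
    (word, cls): pmi(n, sum(per_class.values()), CLASS_SIZE, N)
    for word, per_class in counts.items()
    for cls, n in per_class.items()
}

for (word, cls), score in scores.items():
    print(f"PMI({word}, {cls}) = {score:+.3f}")
```

Under these assumptions `w1` separates the two senses more sharply than `w2`, because its counts are more skewed toward C1.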
Context words with the top MI values for the ambiguous word “cold”.
| Context words |
|---|
| Import |
| Understand |
| Ischemia |
| Reperfus |
| Respons |
| Stor |
| Arteri |
| Attempt |
| Repress |
| Quantit |
Accuracy results of the first evaluation, EV1, where each sense has to have at least two instances tagged with it.
| Fold | Accuracy |
|---|---|
| Fold 1 | 0.912 |
| Fold 2 | 0.931 |
| Fold 3 | 0.917 |
| Fold 4 | 0.897 |
| Fold 5 | 0.862 |
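The per-fold figures can be summarized with a mean and a spread. The snippet below is only arithmetic on the five values listed in the table; the summary statistics are not quoted from the paper.

```python
from statistics import mean, stdev

# Fold accuracies copied from the EV1 table above.
fold_accuracy = [0.912, 0.931, 0.917, 0.897, 0.862]

avg = mean(fold_accuracy)
spread = stdev(fold_accuracy)  # sample standard deviation across folds

print(f"mean accuracy = {avg:.3f}, std = {spread:.3f}")
```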
Detailed accuracy results of three evaluations EV1, EV2, and EV3.
| Word | Baseline (mfs) | EV1 | EV2 | EV3 |
|---|---|---|---|---|
| Adjustment | 0.67 | 0.99 | 0.96 | 0.93 |
| Blood_Pressure | 0.54 | 0.98 | 0.80 | 0.83 |
| Cold | 0.91 | 0.94 | 0.92 | 0.95 |
| Condition | 0.98 | 0.95 | 0.95 | 0.95 |
| Culture | 0.89 | 0.87 | 0.96 | 0.94 |
| Degree | 0.97 | 0.93 | 0.93 | 0.93 |
| Evaluation | 0.50 | 0.98 | 0.82 | 0.85 |
| Extraction | 0.94 | 0.94 | 0.93 | 0.94 |
| Failure | 0.86 | 0.83 | 0.83 | 0.83 |
| Fat | 0.97 | 0.93 | 0.93 | 0.93 |
| Ganglion | 0.93 | 0.93 | 0.91 | 0.93 |
| Glucose | 0.91 | 0.90 | 0.90 | 0.93 |
| Growth | 0.63 | 0.92 | 1.00 | 0.96 |
| Immune Suppression | 0.59 | 0.98 | 0.88 | 0.87 |
| Implantation | 0.83 | 0.91 | 0.96 | 0.87 |
| Japanese | 0.92 | 0.92 | 0.97 | 0.92 |
| Lead | 0.93 | 0.84 | 0.84 | 0.84 |
| Man | 0.63 | 0.98 | 0.90 | 0.92 |
| Mosaic | 0.54 | 0.99 | 0.77 | 0.87 |
| Nutrition | 0.51 | 0.94 | 0.70 | 0.88 |
| Pathology | 0.86 | 0.79 | 0.96 | 0.92 |
| Radiation | 0.62 | 0.83 | 0.93 | 0.89 |
| Reduction | 0.82 | 0.63 | 0.63 | 0.63 |
| Repair | 0.76 | 0.92 | 0.91 | 0.96 |
| Sex | 0.80 | 0.94 | 0.97 | 0.88 |
| Support | 0.80 | 0.67 | 0.67 | 0.67 |
| Surgery | 0.98 | 0.95 | 0.95 | 0.95 |
| Ultrasound | 0.84 | 0.93 | 0.93 | 0.91 |
| Variation | 0.80 | 0.86 | 0.94 | 0.89 |
| Weight | 0.55 | 0.83 | 0.57 | 0.85 |
| White | 0.54 | 1.00 | 0.69 | 0.77 |
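The Baseline (mfs) column can be read as the accuracy of always predicting the most frequent sense of a word, assuming "mfs" has its usual meaning in word sense disambiguation. A minimal sketch of that computation follows; the labels are hypothetical and are not taken from NLM-WSD, and `mfs_baseline_accuracy` is an illustrative name.

```python
from collections import Counter

def mfs_baseline_accuracy(gold_senses):
    """Accuracy obtained by always predicting the most frequent sense."""
    counts = Counter(gold_senses)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(gold_senses)

# Hypothetical gold labels for one ambiguous word (not NLM-WSD data).
gold = ["sense_1"] * 6 + ["sense_2"] * 3 + ["sense_3"] * 1
print(mfs_baseline_accuracy(gold))
```

A word whose instances are dominated by one sense has a high baseline, which is why a learned model has little headroom on words such as Condition or Surgery.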
Comparison of our results with the best results reported for recently published techniques.
| Word | Baseline (mfs) | Stevenson et al.: Joshi-2005 | Stevenson et al.: McInnes 2007 | Stevenson et al.: Stevenson-2008 | Agirre et al.: Single | Agirre et al.: Subset | Agirre et al.: Full | Jimeno-Yepes and Aronson: NB | Jimeno-Yepes and Aronson: CombSW | Jimeno-Yepes and Aronson: CombV | Our method (EV1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Adjustment | 67 | 71 | 70 | 74 | 33.3 | 35.5 | 76.3 | 69 | 53.9 | 99 | |
| Blood pressure | 54 | 53 | 46 | 46 | 53.0 | 50 | 48 | 57.0 | 38 | 44 | 98 |
| Cold | 91 | 90 | 89 | 88 | 32.6 | 26.3 | 28.4 | 92.6 | 39 | 79 | 94 |
| Condition | 98 | — | 89 | 89 | 95.7 | 39.1 | 48.9 | 97.8 | 78 | 69 | 95 |
| Culture | 89 | — | 94 | 95 | 33 | 77 | 93.0 | 100 | 54 | 87 | |
| Degree | 97 | 89 | 79 | 95 | 95.4 | 93.8 | 96.9 | 88 | 82 | 93 | |
| Evaluation | 50 | 69 | 73 | 81 | 59 | 54 | 50 | 78.0 | 52 | 50 | 98 |
| Extraction | 94 | 84 | 86 | 85 | 23 | 27.6 | 94.3 | 98 | 86 | 94 | |
| Failure | 86 | — | 73 | 67 | 27.6 | 72.4 | 86.2 | 86 | 100 | 83 | |
| Fat | 97 | 84 | 77 | 84 | 56.2 | 63 | 95.9 | 97.3 | 91 | 84 | 93 |
| Ganglion | 93 | — | 94 | 96 | 66 | 77 | 64 | 95.0 | 88 | 86 | 93 |
| Glucose | 91 | — | 90 | 91 | 91 | 91 | 90 | 91.0 | 78 | 39 | 90 |
| Growth | 63 | 71 | 69 | 68 | 37 | 37 | 37 | 73.0 | 55 | 66 | 92 |
| Immune suppression | 59 | 80 | 75 | 80 | 64 | 59 | 62 | 79.0 | 60 | 65 | 98 |
| Implantation | 83 | 94 | 92 | 93 | 75 | 84.7 | 84.7 | 98.0 | 94 | 97 | 91 |
| Japanese | 92 | 77 | 76 | 75 | 70.9 | 70.9 | 64.6 | 92.4 | 63 | 94 | 92 |
| Lead | 93 | 89 | 90 | 94 | 93.1 | 93.1 | 93.1 | 93.1 | 83 | 86 | 84 |
| Man | 63 | 89 | 80 | 90 | 61.5 | 34.8 | 44.6 | 87.0 | 65 | 42 | 98 |
| Mosaic | 54 | 87 | 75 | 87 | 60.8 | 66 | 82.5 | 84 | 72 | 99 | |
| Nutrition | 51 | 52 | 49 | 54 | 33.7 | 32.6 | 55.1 | 45 | 43 | 94 | |
| Pathology | 86 | 85 | 84 | 85 | 34.3 | 28.3 | 85.9 | 76 | 83 | 79 | |
| Radiation | 62 | 82 | 81 | 84 | 58.2 | 53.1 | 53.1 | 83.7 | 76 | 76 | 82 |
| Reduction | 82 | 91 | 92 | 89 | 36.4 | 54.5 | 54.5 | 81.8 | 100 | 82 | 63 |
| Repair | 76 | 87 | 93 | 88 | 63.2 | 72.1 | 76.5 | 95.6 | 87 | 88 | 92 |
| Sex | 80 | 88 | 87 | 87 | 84 | 85 | 85 | 84.0 | 60 | 53 | 94 |
| Support | 80 | — | 91 | 89 | 80 | 80 | 80 | 80.0 | 100 | 90 | 67 |
| Surgery | 98 | — | 94 | 97 | 95.9 | 97 | 97 | 98.0 | 43 | 96 | 95 |
| Ultrasound | 84 | 92 | 85 | 90 | 84 | 84 | 83 | 85.0 | 81 | 83 | 93 |
| Variation | 80 | — | 91 | 95 | 85 | 80 | 75 | 91.0 | 65 | 86 | 86 |
| Weight | 55 | 83 | 79 | 81 | 56.6 | 56.6 | 56.6 | 84.9 | 66 | 68 | 83 |
| White | 54 | 79 | 74 | 76 | 68.9 | 67.8 | 63.3 | 81.1 | 57 | 58 | 100 |
A sample text from species disambiguation.
| Homo sapiens (human) | Mus musculus (mouse) |
|---|---|
| (No. 1) Significantly, Diva lacks critical residues in the conserved BH3 region that mediate the interaction between BH3-containing proapoptotic Bcl-2 homologues and their prosurvival binding partners. Consistent with this, Diva did not bind to cellular Bcl-2 family members including Bcl-2, Bcl-XL, Bcl-w, Mcl-1, and A1/Bfl-1 | (No. 2) The BCL-2 family has various pairs of antagonist and agonist proteins that regulate apoptosis. Whether their function is interdependent is uncertain. Using a genetic approach to address this question, we utilized gain- and loss-of-function models of Bcl-2 and Bax and found that apoptosis and thymic hypoplasia characteristic of Bcl-2-deficient mice are largely absent in mice also deficient in Bax |
The averaged evaluation results from Wang et al. [9].
| Method | Micro-avg. Precision | Micro-avg. Recall | Micro-avg. F1 | Macro-avg. Precision | Macro-avg. Recall | Macro-avg. F1 |
|---|---|---|---|---|---|---|
| RULE-MAJORITY | 72.2 | 62.39 | 66.94 | 27.77 | 46.67 | 29.32 |
| RULE-SP | 74.09 | 64.03 | 68.69 | 29.77 | 53.81 | 32.2 |
| RULE-SPSENT | 72.94 | 63.03 | 67.63 | 30.22 | 54.76 | 32.93 |
| C&C | 73.82 | 63.79 | 68.44 | 30.51 | 53.59 | 33.43 |
| ENJU | 72.98 | 63.06 | 67.66 | 31.35 | 55 | 34.61 |
| ENJU-Genia | 73 | 63.08 | 67.68 | 30.11 | 53.42 | 32.97 |
| Minipar | 73.02 | 63.1 | 67.69 | 30.19 | 53.56 | 33.1 |
| Stanford | 73.67 | 63.66 | 68.3 | 31.17 | 56.35 | 34.35 |
| Stanford-Genia | 73.48 | 63.5 | 68.13 | 30.61 | 55.61 | 33.78 |
| ML | 82.69 | 82.69 | 82.69 | 27.01 | 27.84 | 27.37 |
| RELATION | 75.24 | 63.99 | 69.16 | 31.97 | 55.61 | 34.8 |
| HYBRID | 83.8 | 83.8 | 83.8 | 57.56 | 49.72 | 49.9 |
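The gap between the micro- and macro-averaged columns is easier to read with the two averaging schemes side by side. The sketch below uses hypothetical per-species counts (one frequent class and two rare ones). It assumes that macro-averaging means an unweighted mean of per-class scores, which this record does not spell out for Wang et al.; the function names are illustrative.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def micro_average(per_class):
    """Pool the counts of all classes, then compute the metrics once."""
    tp = sum(c["tp"] for c in per_class.values())
    fp = sum(c["fp"] for c in per_class.values())
    fn = sum(c["fn"] for c in per_class.values())
    return prf(tp, fp, fn)

def macro_average(per_class):
    """Compute the metrics per class, then take the unweighted mean of each."""
    rows = [prf(c["tp"], c["fp"], c["fn"]) for c in per_class.values()]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

# Hypothetical per-species counts, invented for illustration only.
counts = {
    "human": {"tp": 90, "fp": 10, "fn": 5},
    "mouse": {"tp": 2, "fp": 3, "fn": 8},
    "yeast": {"tp": 1, "fp": 4, "fn": 4},
}

micro = micro_average(counts)
macro = macro_average(counts)
print("micro P/R/F1:", [round(100 * v, 1) for v in micro])
print("macro P/R/F1:", [round(100 * v, 1) for v in macro])
```

Micro-averaging is dominated by the frequent class, while macro-averaging gives rare classes equal weight, so poor performance on rare species pulls the macro scores down. Averaging per-class F1 also means that a macro F1 need not equal the harmonic mean of the macro precision and recall. The table is consistent with this: for RULE-MAJORITY the reported macro F1 is 29.32, whereas the harmonic mean of 27.77 and 46.67 is about 34.8.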
Precision, recall, and F1 results of our method on the five folds in the species disambiguation experiments.
| Fold | Micro-avg. Precision | Micro-avg. Recall | Micro-avg. F1 |
|---|---|---|---|
| Fold 1 | 81.86 | 92.78 | 87.0 |
| Fold 2 | 82.08 | 94.77 | 88.0 |
| Fold 3 | 82.95 | 97.31 | 89.6 |
| Fold 4 | 84.12 | 98.70 | 90.8 |
| Fold 5 | 81.25 | 85.83 | 83.5 |
| Average | 82.45 | 93.88 | 87.8 |
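The F1 column can be checked against the precision and recall columns, since F1 is the harmonic mean of the two. The snippet below recomputes it for every row of the table; it is a consistency check on the listed values only, and the helper name `f1` is illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above.
rows = {
    "Fold 1": (81.86, 92.78, 87.0),
    "Fold 2": (82.08, 94.77, 88.0),
    "Fold 3": (82.95, 97.31, 89.6),
    "Fold 4": (84.12, 98.70, 90.8),
    "Fold 5": (81.25, 85.83, 83.5),
    "Average": (82.45, 93.88, 87.8),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1(p, r):.2f}, reported = {reported}")
```

Each recomputed value agrees with the reported one to within rounding to one decimal place.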