Changqin Quan, Meng Wang, Fuji Ren.
Abstract
The wealth of interaction information in biomedical articles has motivated text mining approaches that automatically extract biomedical relations. This paper presents an unsupervised method for biomedical relation extraction based on pattern clustering and sentence parsing. The pattern clustering algorithm is based on the polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Building on the semi-supervised KNN algorithm, we extend the unsupervised approach to a semi-supervised approach that combines pattern clustering with dependency parsing and phrase structure parsing rules. We evaluated the approaches on two tasks: (1) protein-protein interaction extraction and (2) gene-suicide association extraction. On task (1), evaluation on the benchmark AImed corpus showed that the proposed unsupervised approach outperformed three supervised methods, which are rule based, SVM based, and kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. On gene-suicide association extraction, evaluation on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence based method.
Year: 2014 PMID: 25036529 PMCID: PMC4103846 DOI: 10.1371/journal.pone.0102039
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Different dependencies between and .
Figure 2. The dependency tree of the sentence ‘Recombinant neuregulin-2beta induces the tyrosine phosphorylation of ErbB2, ErbB3 and ErbB4 in cell line express all of these erbb family receptor.’, where words are annotated with word positions (numbers appended to words) and edges with dependency types (italic). Words marked in bold indicate gene/protein names.
Words in rectangles indicate interaction words.
The extracted paths and relations from the dependency tree (Fig. 2) by Rule RD1.
| No. | Path | Relation |
| 1 | neuregulin-2beta— | (neuregulin-2beta, ErbB2) |
| 2 | neuregulin-2beta— | (neuregulin-2beta, ErbB3) |
| 3 | neuregulin-2beta— | (neuregulin-2beta, ErbB4) |
| 4 | ErbB3—ErbB2—of —phosphorylation— | (ErbB3, ErbB2) |
| 5 | ErbB4—ErbB2—of —phosphorylation— | (ErbB4, ErbB2) |
| 6 | ErbB3—ErbB2—of —phosphorylation— | (ErbB3, ErbB4) |
*Words marked in bold indicate the focused entities. Italic words indicate interaction words.
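The paths in the table above run along edges of the dependency tree. The following is a minimal sketch of how a path between two entities can be read off such a tree and checked for interaction words. It does not reproduce Rule RD1, whose definition is not part of this excerpt. The edge list `EDGES`, the set `INTERACTION_WORDS`, and both function names are assumptions made for illustration only; the edges are a guess at a fragment of the tree in Fig. 2.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of a dependency tree."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two words are not connected

def interaction_words_on_path(path, interaction_words):
    """Return the words of the path (endpoints excluded) that are interaction words."""
    return [w for w in path[1:-1] if w in interaction_words]

# Assumed (head, dependent) edges for part of the Fig. 2 sentence.
EDGES = [
    ("induces", "neuregulin-2beta"),
    ("induces", "phosphorylation"),
    ("phosphorylation", "of"),
    ("of", "ErbB2"),
    ("ErbB2", "ErbB3"),
    ("ErbB2", "ErbB4"),
]
# Assumed interaction words; the real ones come from pattern clustering.
INTERACTION_WORDS = {"induces"}

path = shortest_path(EDGES, "neuregulin-2beta", "ErbB2")
print(" - ".join(path))
print(interaction_words_on_path(path, INTERACTION_WORDS))
```

Because a dependency tree has exactly one path between any two words, breadth-first search is enough here; no edge weights are needed.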
Figure 3. NP+VP phrase structure between and .
Results comparison of protein-protein interactions extraction by different methods.
| Method | Descriptions | P | R | F |
| Yakushiji et al., 2005 | Rule based; phrase structure parse; supervised | 33.70 | 33.10 | 33.40 |
| Mitsumori et al., 2006 | SVM based; supervised | 54.20 | 42.60 | 47.70 |
| Bunescu et al., 2006 | Kernel based; SVM; supervised | 65.00 | 46.40 | 54.20 |
| Giuliano et al., 2006 | Combination of kernels; global and local context; supervised | 60.90 | 57.20 | 59.00 |
| Miwa et al., 2009 | Combination of kernels; syntactic parsing; SVM; supervised | 58.70 | 66.10 | 61.90 |
| Erkan et al., 2007 | Dependency parse; edit distance similarity; transductive SVM; semi-supervised | 59.59 | 60.68 | 59.96 |
| Our proposed I (Unsupervised) | Clustering based, dependency and phrase structure parse; unsupervised | 44.80 | 71.40 | 55.10 |
| Our proposed II (Semi-supervised) | Semi-supervised KNN, edit distance similarity, clustering based, dependency and phrase structure parse; semi-supervised | 56.60 | 66.80 | 60.70 |
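The P, R and F columns above are precision, recall and F-score. As a quick check on how the three relate, the sketch below assumes F is the usual balanced F-score, the harmonic mean of precision and recall; the excerpt does not state the formula, and the function name `f_score` is chosen here for illustration.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (same scale for both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Row "Our proposed I (Unsupervised)": P = 44.80, R = 71.40
print(round(f_score(44.80, 71.40), 1))  # 55.1

# Row "Yakushiji et al., 2005": P = 33.70, R = 33.10
print(round(f_score(33.70, 33.10), 1))  # 33.4
```

Some rows in the table differ slightly from this calculation. That may reflect rounding or averaging over several evaluation runs, but the excerpt does not say which.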
Results comparison of gene-suicide association extraction by different methods.
| Suicide related gene list | Method | Dataset I P | Dataset I R | Dataset I F | Dataset II P | Dataset II R | Dataset II F |
| GAD gene list | Co-occurrence | 54.40 | 100.00 | 70.50 | 14.90 | 95.10 | 25.70 |
| GAD gene list | Our proposed (Unsupervised) | 71.10 | 86.50 | 78.00 | 25.60 | 56.10 | 35.10 |
| GAD gene list | Our proposed (Semi-supervised) | 75.35 | 90.06 | 82.05 | 29.55 | 67.12 | 41.03 |
| GeneCards | Co-occurrence | 54.29 | 96.50 | 69.49 | 24.67 | 82.03 | 37.93 |
| GeneCards | Our proposed (Unsupervised) | 63.46 | 91.67 | 75.00 | 28.37 | 76.62 | 42.18 |
| GeneCards | Our proposed (Semi-supervised) | 67.80 | 94.66 | 79.01 | 33.28 | 81.12 | 47.20 |
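The co-occurrence rows above combine very high recall with low precision. A minimal sketch of a sentence-level co-occurrence baseline shows why: every sentence that mentions both a listed gene and the target term yields a relation, whatever the sentence actually says. The exact baseline used in the paper is not described in this excerpt, so the function `cooccurrence_pairs`, the tokenizer, and the placeholder gene names and sentences below are all assumptions.

```python
import re

def cooccurrence_pairs(sentences, genes, target="suicide"):
    """Return (gene, target) for every gene that shares a sentence with the target word."""
    pairs = set()
    for sentence in sentences:
        # Simple lowercase tokenization; a real system would use a proper tokenizer.
        tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
        if target.lower() not in tokens:
            continue
        for gene in genes:
            if gene.lower() in tokens:
                pairs.add((gene, target))
    return pairs

# Placeholder gene names and made-up sentences, for illustration only.
GENES = ["GENE1", "GENE2", "GENE3"]
SENTENCES = [
    "GENE1 was associated with suicide attempts.",
    "No association between GENE2 and suicide was found.",
    "GENE3 is expressed in the liver.",
]

print(sorted(cooccurrence_pairs(SENTENCES, GENES)))
```

The second sentence negates the association, yet the baseline still reports the pair `("GENE2", "suicide")`. Errors of this kind lower precision without affecting recall.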
The extracted interaction words of protein-protein interaction from the AImed corpus.
| Interaction words (from AImed corpus) |
| Bind, induce, activate, associate, mediate, block, interact, contain, phosphorylate |
Figure 4. The F-score curves on the AImed corpus with varying sizes of training data and different K values.
Figure 5. The F-score curves of the semi-supervised KNN algorithm and the proposed semi-supervised approach on the AImed corpus with varying sizes of training data.
Results comparison with different linguistic parsing and rules.
| Parsing type | Linguistic rules | P | R | F |
| Dependency parsing | RD1 | 34.28 | 75.70 | 47.19 |
| Dependency parsing | RD1+RD2 | 38.32 | 72.74 | 50.20 |
| Phrase structure parsing | RP1+RP2 | 49.90 | 39.57 | 44.14 |
| Dependency parsing + phrase structure parsing | RD1+RD2+RP1+RP2 | 44.80 | 71.40 | 55.10 |
The extracted interaction verbs of gene-suicide relation from Dataset I and Dataset II.
| Dataset | Interaction words |
| Dataset I | Associate, complete, control, increase, consider, homozygote, attempt, suggest, bear, repeat, carry, compare, classify, play, modify, find, indicate, influence, depress, affect, implicate, act, monitor |
| Dataset II | Attempt, associate, commit, program, increase, compare, complete, message, consider, influence, know, show, use, involve, link, carry, measure, obtain, confer, play, homozygote, implicate, assess, find, function, express, result, combine, contain, act, distinguish, decrease, admit, adopt, report, depress, take, identify, occur, load, risk, indicate, include, reduce, construct, live, confirm, announce, investigate, control, set, prevent, cingulate, liberate, suggest, monitor, hospitalize |