Min Song, Hwanjo Yu, Wook-Shin Han.
Abstract
BACKGROUND: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both active learning (AL) and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI extraction task.
Year: 2011 PMID: 22168401 PMCID: PMC3247085 DOI: 10.1186/1471-2105-12-S12-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. System architecture of PPISpotter
Figure 2. Combination of active learning with semi-supervised learning
Figure 3. BTDA-SVM algorithm
Features extracted from example sentence A
| Feature | Feature Value |
|---|---|
| Is negated sentence | True |
| No. of protein occurrences | 3 |
| Interactor name | response |
| Interactor POS | NN |
| Interactor position | 88 |
| No. of words in between proteins | 24 |
| No. of left words | -1 |
| No. of right words | 12 |
| Link path status | Yes |
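
As an illustration of how the feature values listed for example sentence A could be held in code, here is a minimal sketch. The class name `SentenceFeatures`, its field names, and the helper `numeric_view` are hypothetical and are not taken from PPISpotter; only the values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass
class SentenceFeatures:
    """Hypothetical container with one field per row of the feature table."""
    is_negated: bool
    protein_occurrences: int
    interactor_name: str
    interactor_pos: str
    interactor_position: int
    words_between_proteins: int
    left_words: int
    right_words: int
    link_path: bool


# Values copied from the table for example sentence A.
sentence_a = SentenceFeatures(
    is_negated=True,
    protein_occurrences=3,
    interactor_name="response",
    interactor_pos="NN",
    interactor_position=88,
    words_between_proteins=24,
    left_words=-1,
    right_words=12,
    link_path=True,
)


def numeric_view(features: SentenceFeatures) -> dict:
    """Keep only boolean and integer fields, with booleans mapped to 0/1.

    String-valued fields (interactor name and POS tag) are left out here;
    how they are encoded for the classifier is not stated in this record.
    """
    return {
        name: int(value)
        for name, value in asdict(features).items()
        if isinstance(value, (bool, int))
    }


print(numeric_view(sentence_a))
```

The split between numeric and string-valued fields is a design choice of this sketch only; it keeps the categorical features visible instead of guessing an encoding for them.
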
Data sets used for experiments
| Data Set | Total Sentences | Positive Sentences | Negative Sentences |
|---|---|---|---|
| | 4026 | 951 | 3075 |
| | 4056 | 2202 | 1854 |
| | 1100 | 573 | 527 |
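
The counts in the data set table can be sanity-checked with a few lines of arithmetic. The sketch below assumes each row lists total, positive, and negative sentences in that order; the data set names are not given in the rows, so the rows are identified by their totals only. The helper `positive_share` is a name introduced here for illustration.

```python
# (total, positive, negative) sentence counts, one tuple per table row.
rows = [
    (4026, 951, 3075),
    (4056, 2202, 1854),
    (1100, 573, 527),
]


def positive_share(total: int, positive: int) -> float:
    """Fraction of sentences labeled positive."""
    return positive / total


for total, positive, negative in rows:
    # Each total should be the sum of its positive and negative counts.
    assert positive + negative == total
    print(f"total={total}: {positive_share(total, positive):.1%} positive")
```

Under that reading, every total equals the sum of its two parts, and the first row is noticeably imbalanced (about 24% positive) while the other two are close to balanced.
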
Experimental results – AIMED data set
| Method | Precision | Recall | F-score |
|---|---|---|---|
| SVM | 55.15% | 42.47% | 48.14% |
| RS-SVM | 56.98% | 41.71% | 48.92% |
| C-SVM | 64.53% | 40.42% | 50.67% |
| BT-SVM | 65.23% | 42.51% | 53.64% |
| BTDA-SVM | 74.34% | 50.75% | 61.91% |
| Yakushiji et al. [ | 33.70% | 33.10% | 33.40% |
| Mitsumori et al. [ | 54.20% | 42.60% | 47.70% |
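
To make the relationship between the three numeric columns concrete, here is a small sketch that assumes they are precision, recall, and F-score in that order, with the F-score taken as the harmonic mean of precision and recall (F1). The function name `f_score` is introduced here for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) for the two previously published baselines.
baselines = {
    "Yakushiji et al.": (33.70, 33.10),
    "Mitsumori et al.": (54.20, 42.60),
}

for name, (precision, recall) in baselines.items():
    print(f"{name}: F-score = {f_score(precision, recall):.2f}%")
```

For these two baseline rows the formula gives 33.40% and 47.70%, matching the third column of the table.
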
Figure 4. The F-score on the AIMED dataset with varying sizes of training data
Figure 5. The F-score on the BioCreative II PPI dataset with varying sizes of training data
Experimental results – BioCreative2 PPI data set
| Method | Precision | Recall | F-score |
|---|---|---|---|
| SVM | 70.23% | 51.21% | 58.33% |
| RS-SVM | 71.7% | 56.54% | 62.5% |
| C-SVM | 78.23% | 88.68% | 83.65% |
| BT-SVM | 81.75% | 93.5% | 85.96% |
| BTDA-SVM | 85.92% | 95.32% | 86.85% |
| TSVM-edit [ | 85.62% | 84.89% | 85.22% |
Comparison results – BioInfer data set
| SVM | 65.89% | 54.6% | 0.843 |
| RS-SVM | 64.5% | 55.2% | 0.847 |
| C-SVM | 70.24% | 60.2% | 0.86 |
| BT-SVM | 79.29% | 63.1% | 0.918 |
| BTDA-SVM | 82.52% | 65.2% | 0.93 |
| Graph Kernel [ | 47.7% | 59.9% | 0.849 |