Yanjun Qi, Oznur Tastan, Jaime G Carbonell, Judith Klein-Seetharaman, Jason Weston.
Abstract
MOTIVATION: Protein-protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers have proposed using supervised learning to classify pairs of proteins as interacting or not. However, the performance of this approach is largely restricted by the availability of truly interacting protein pairs (labeled). Meanwhile, there exist a considerable number of protein pairs for which an association between the two partners has been reported, but without enough experimental evidence to support it as a direct interaction (partially labeled).
Year: 2010 PMID: 20823334 PMCID: PMC2935441 DOI: 10.1093/bioinformatics/btq394
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Target problem: predicting protein interactions between HIV-1 (orange square) and human (gray circle). There exist weakly labeled interaction pairs from NIAID (dashed blue edges) and labeled interaction pairs from experts' annotation (solid blue edges). We aim to predict whether a given unknown human to HIV-1 protein pair (dashed green) interacts or not.
Basic statistics of feature and ‘gold standard’ set
| Features | Positive PPIs (experts) | Partial positive | Remaining pairs | HIV-1 protein | Human protein |
|---|---|---|---|---|---|
| 18 | | | 352 338 | 17 | 20 873 |
*This also excludes 226 pairs that experts labeled as ‘unsure’. Bold values relate to PPIs.
Fig. 2. To perform multi-task learning alongside the supervised PPI classification, three semi-supervised tasks are proposed, each extending the network structure of the multi-layer perceptron: (a) training another classifier to distinguish partial positive from negative examples; (b) training a ranker to sort partial positive and negative data; (c) training an embedding on the output of the supervised classifier.
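As a rough illustration of option (a), the sketch below pairs the supervised PPI classifier with an auxiliary classifier for partial positive versus negative pairs and sums the two losses. It is not the authors' implementation: sharing a single hidden layer between the two heads, the hidden size, the `aux_weight` value and all function names are assumptions made for illustration. Only the input size of 18 features comes from the feature table.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def init_params(n_features=18, n_hidden=4, seed=0):
    """Small random weights; the hidden size is an arbitrary choice."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        "W": [rand(n_features) for _ in range(n_hidden)],  # shared layer (assumed)
        "b": [0.0] * n_hidden,
        "w_main": rand(n_hidden), "b_main": 0.0,  # supervised PPI head
        "w_aux": rand(n_hidden), "b_aux": 0.0,    # partial-positive head
    }


def hidden(params, x):
    """Hidden representation used by both tasks: h = tanh(Wx + b)."""
    return [math.tanh(dot(row, x) + bi)
            for row, bi in zip(params["W"], params["b"])]


def predict(params, x, task):
    """Logistic output of the head named by task ('main' or 'aux')."""
    h = hidden(params, x)
    return sigmoid(dot(params["w_" + task], h) + params["b_" + task])


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for a single example."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_loss(params, labeled, partial, aux_weight=0.5):
    """Mean supervised loss plus a weighted mean auxiliary loss."""
    main = sum(bce(predict(params, x, "main"), y) for x, y in labeled) / len(labeled)
    aux = sum(bce(predict(params, x, "aux"), y) for x, y in partial) / len(partial)
    return main + aux_weight * aux
```

Here `labeled` would hold expert-annotated pairs and `partial` would hold partial positive pairs (label 1) together with negatives (label 0). Minimizing `joint_loss` by gradient descent lets the partially labeled data influence the shared layer; the training loop is omitted.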
Fig. 3. Two ways to train baseline classifiers for performance comparison: (a) train with positive + negative; (b) train with positive + partial positive (treated as positive) + negative.
Performance comparison (with multiple metric scores)
| Method | R50 | MAP | PRB | AUC |
|---|---|---|---|---|
| SMLC | 0.277 | 0.263 | 0.312 | 0.905 |
| SMLR | 0.268 | 0.311 | ||
| SMLE | 0.309 | 0.908 | ||
| RF | 0.199 | 0.135 | 0.180 | 0.893 |
| RF-P | 0.230 | 0.213 | 0.281 | 0.896 |
| MLP | 0.204 | 0.197 | 0.257 | 0.859 |
| MLP-P | 0.229 | 0.210 | 0.282 | 0.893 |
SMLC, SML with classification task; SMLR, SML with ranking task; SMLE, SML with embedding on output; RF, Random Forest; MLP, Multi-Layer Perceptron Net; RF-P, RF adding partial positive; MLP-P, MLP adding partial positive. Bold values give the best performance in the column.
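For readers unfamiliar with the ranking metrics in the table, the sketch below computes two of them from a list of classifier scores and binary labels using their standard textbook definitions: AUC, and the average precision of a single ranking (the quantity that MAP averages). This is not necessarily how the authors computed their numbers, and the function names are illustrative. R50 and PRB are left out because this page does not define them.

```python
def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))            # 0.75
print(average_precision(scores, labels))  # about 0.833
```

In the toy example, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75, and the positives sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2.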
Statistics of overlaps between the top predicted human partners and those found in (i) the siRNA screen list of Brass et al. (2008), (ii) the virion screen list of Ott (2008), and (iii) four combined siRNA screens (Brass et al., 2008; König et al., 2008; Yeung et al., 2009; Zhou et al., 2008)
| Score cutoff | No. of predicted interactions | Confirmed by NIAID | Novel interactions | No. of human proteins in predicted interactions | Overlap with siRNA | Overlap with virion | Overlap with combined four siRNA |
|---|---|---|---|---|---|---|---|
| −1.8 | 3428 | 259 | 3123 | 1027 | 24 | 72 | 96 |
| −1.5 | 2434 | 223 | 2172 | 721 | 21 | 61 | 72 |
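
The kind of bookkeeping behind such a table can be sketched as follows, under stated assumptions: a pair counts as predicted when its score is at or above the cutoff (the direction of the threshold is assumed), pairs are keyed as hypothetical `(hiv_protein, human_protein)` tuples, and overlaps are counted over distinct human proteins. The "Novel" column is not reproduced because this page does not state how it is derived.

```python
def summarize_predictions(scores, cutoff, known_pairs, screen_hits):
    """Count predicted pairs, known pairs among them, and screen overlap.

    scores:      dict mapping (hiv_protein, human_protein) -> classifier score
    known_pairs: set of pairs already reported in a reference database
    screen_hits: set of human proteins reported by an external screen
    """
    predicted = {pair for pair, s in scores.items() if s >= cutoff}
    human = {h for _, h in predicted}
    return {
        "num_predicted": len(predicted),
        "confirmed": len(predicted & known_pairs),
        "num_human": len(human),
        "overlap": len(human & screen_hits),
    }


# Toy data with made-up protein names and scores.
toy_scores = {
    ("tat", "A"): -1.0,
    ("tat", "B"): -1.6,
    ("env", "A"): -1.4,
    ("env", "C"): -2.0,
}
toy_known = {("tat", "A")}
toy_screen = {"A", "C"}

print(summarize_predictions(toy_scores, -1.5, toy_known, toy_screen))
print(summarize_predictions(toy_scores, -1.8, toy_known, toy_screen))
```

Lowering the cutoff from -1.5 to -1.8 admits one more pair in the toy data, mirroring how the looser cutoff in the table yields more predicted interactions.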