Qingyao Wu, Zhenyu Wang, Chunshan Li, Yunming Ye, Yueping Li, Ning Sun.
Abstract
BACKGROUND: Predicting the functional properties of proteins in protein-protein interaction (PPI) networks is a challenging problem with important implications for computational biology. Collective classification (CC), which uses both attribute features and relational information to jointly classify related proteins in PPI networks, has been shown to be a powerful computational method for this problem setting. Enabling CC usually increases accuracy when given a fully-labeled PPI network with a large amount of labeled data. However, such labels can be difficult to obtain in many real-world PPI networks, which usually contain only a limited number of labeled proteins and a large number of unlabeled proteins. In this case, most of the unlabeled proteins may not be connected to the labeled ones, so supervision knowledge cannot be obtained effectively from local network connections. As a consequence, learning a CC model in sparsely-labeled PPI networks can lead to poor performance.
Year: 2015 PMID: 25708164 PMCID: PMC4331684 DOI: 10.1186/1752-0509-9-S1-S9
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Label deficiency problem in constructing relational features. (a) The unknown protein is linked to 3 labeled proteins, of which 2 are positive and 1 is negative, so x = <2, 1>. (b) The labels of the neighboring nodes are initially unknown, which makes computing the relational features very challenging. (c) By adding latent linkages, the unknown protein is connected to the most relevant labeled nodes.
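The count-based relational feature described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relational_feature`, the protein identifiers `p1` to `p5`, and the +1/-1 label encoding are hypothetical and do not come from the paper.

```python
from collections import Counter

def relational_feature(neighbors, labels):
    """Count positive and negative labels among a node's labeled neighbors.

    `labels` maps a protein ID to +1 or -1 (hypothetical encoding).
    Neighbors that are missing from `labels` are unlabeled and are skipped.
    """
    counts = Counter(labels[n] for n in neighbors if n in labels)
    return (counts[+1], counts[-1])

# Hypothetical labeled proteins: two positive, one negative.
labels = {"p1": +1, "p2": +1, "p3": -1}

# Panel (a): the unknown protein is linked to three labeled proteins.
x_a = relational_feature(["p1", "p2", "p3"], labels)

# Panel (b): none of the neighbors is labeled, so the feature carries no signal.
x_b = relational_feature(["p4", "p5"], labels)

print(x_a, x_b)
```

In case (b) the feature is all zeros, which is the label deficiency the caption refers to. Panel (c) would correspond to calling the same function with a neighbor list extended by latent links to labeled proteins.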
Accuracy (mean ± standard deviation) of the compared algorithms against different label ratios on problem (1) of KDD Cup 2001.
| label ratio | RNMF | SVM | wvRN+RL | ICA | semiICA |
|---|---|---|---|---|---|
| 2% | 0.700 ± 0.044 | 0.633 ± 0.012 | 0.700 ± 0.058 | 0.725 ± 0.052 | |
| 3% | 0.736 ± 0.004 | 0.624 ± 0.013 | 0.731 ± 0.063 | 0.755 ± 0.004 | |
| 4% | 0.774 ± 0.005 | 0.650 ± 0.004 | 0.760 ± 0.052 | 0.774 ± 0.055 | |
| 5% | 0.770 ± 0.003 | 0.675 ± 0.023 | 0.771 ± 0.058 | 0.792 ± 0.001 | |
| Avg. | 0.745 ± 0.014 | 0.645 ± 0.013 | 0.740 ± 0.057 | 0.762 ± 0.028 | |
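As a sanity check on the table above, the short Python sketch below recomputes the Avg. row as a plain arithmetic mean over the four label ratios, applied separately to the means and to the standard deviations. This aggregation rule is an assumption; the source does not state how the Avg. row was obtained. Columns are listed in the order printed, without relying on the header names.

```python
# (mean, std) per label ratio (2%, 3%, 4%, 5%), copied in printed column order.
columns = [
    [(0.700, 0.044), (0.736, 0.004), (0.774, 0.005), (0.770, 0.003)],
    [(0.633, 0.012), (0.624, 0.013), (0.650, 0.004), (0.675, 0.023)],
    [(0.700, 0.058), (0.731, 0.063), (0.760, 0.052), (0.771, 0.058)],
    [(0.725, 0.052), (0.755, 0.004), (0.774, 0.055), (0.792, 0.001)],
]

# The Avg. row as printed in the table.
printed_avg = [(0.745, 0.014), (0.645, 0.013), (0.740, 0.057), (0.762, 0.028)]

def column_average(cells):
    """Arithmetic mean of the means and of the standard deviations."""
    n = len(cells)
    return (sum(m for m, _ in cells) / n, sum(s for _, s in cells) / n)

computed = [column_average(cells) for cells in columns]

# Allow a difference of 0.001 for rounding in the third decimal place.
matches = [
    abs(cm - pm) <= 1e-3 and abs(cs - ps) <= 1e-3
    for (cm, cs), (pm, ps) in zip(computed, printed_avg)
]
print(matches)
```

Under this assumption every printed average agrees with the recomputed value to within 0.001.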
Figure 2. ROC curves of RNMF and baselines (SVM and wvRN+RL).
Figure 3. Convergence curve of RNMF for problem (1) of the KDD Cup 2001 dataset.
Figure 4. Classification accuracy of RNMF with respect to different values of α (the parameter β is fixed at 5).
Figure 5. Classification accuracy of RNMF with respect to different values of β (the parameter α is fixed at 10).
Selected interrelated genes and their similarity computed by the proposed method.
| Gene 1 ID | Gene 2 ID | Similarity |
|---|---|---|
| G238510 | G239467 | 0.9984 |
| G238510 | G239178 | 0.9987 |
| G238510 | G235250 | 0.9983 |
| G234935 | G234445 | 0.9094 |
| G234935 | G239966 | 0.9388 |
| G234935 | G235763 | 0.9589 |
| G234935 | G235329 | 0.9700 |
| G235158 | G234735 | 0.9776 |
| G235158 | G234074 | 0.9808 |
| G235158 | G234177 | 0.9837 |
| G235158 | G235216 | 0.9554 |
| G237021 | G234486 | 0.8831 |
| G237021 | G234065 | 0.9222 |
| G237021 | G239804 | 0.9285 |
| G237021 | G239266 | 0.8751 |
| G234980 | G235439 | 0.9865 |
| G234980 | G235231 | 0.9843 |
| G234980 | G234914 | 0.9939 |
| G234980 | G235780 | 0.9305 |
Figure 6. Coverage of different algorithms with varying percentages of labeled data on problem (2) of KDD Cup 2001.
Figure 7. Ranking loss of different algorithms with varying percentages of labeled data on problem (2) of KDD Cup 2001.