Qingyao Wu, Yunming Ye, Michael K Ng, Shen-Shyang Ho, Ruichao Shi.
Abstract
BACKGROUND: Automated assignment of functions to unknown proteins is one of the most important tasks in computational biology. The development of experimental methods for genome-scale analysis of molecular interaction networks offers new ways to infer protein function from protein-protein interaction (PPI) network data. Existing collective classification (CC) techniques usually improve accuracy on network data, in which instances are interlinked, but they rely on a large amount of labeled data for training. Labeled data, however, are time-consuming and expensive to obtain, whereas large amounts of unlabeled data are easy to collect. Thus, more sophisticated methods are needed that exploit unlabeled data to improve the accuracy of protein function prediction.
Year: 2014 PMID: 24564855 PMCID: PMC4015526 DOI: 10.1186/1471-2105-15-S2-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An example of the ICA algorithm learning with limited labeled data. (a) Initial state: train classifier M and classify V; (b) compute relational features X and train classifier M; (c) re-predict V (using M); (d) re-compute relational features X. ICA repeats steps (c) and (d) for a fixed number of iterations.
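The four steps in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the nearest-centroid local classifier, the neighbor-label-fraction relational features, and all names (`ica`, `centroid_fit`, `relational_features`, `attrs`, `neighbors`, `known`) are assumptions introduced here, since the record does not specify them.

```python
from collections import Counter

def centroid_fit(X, y):
    """Train a nearest-centroid classifier: one mean vector per class.
    (Assumed stand-in for the local classifier M.)"""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(sorted(model),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def relational_features(node, neighbors, labels, classes):
    """Assumed relational features: fraction of neighbors carrying each class."""
    counts = Counter(labels[n] for n in neighbors[node] if n in labels)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in classes]

def ica(attrs, neighbors, known, n_iter=5):
    """Iterative classification over a graph with few labeled nodes."""
    classes = sorted(set(known.values()))
    train = sorted(known)
    unlabeled = sorted(n for n in attrs if n not in known)

    # (a) Bootstrap: an attribute-only classifier labels the unlabeled nodes.
    m_attr = centroid_fit([attrs[n] for n in train], [known[n] for n in train])
    labels = dict(known)
    for n in unlabeled:
        labels[n] = centroid_predict(m_attr, attrs[n])

    # (b) Compute relational features and train on attributes + relations.
    def full(n):
        return attrs[n] + relational_features(n, neighbors, labels, classes)
    m_full = centroid_fit([full(n) for n in train], [known[n] for n in train])

    for _ in range(n_iter):
        # (c) Re-predict every unlabeled node from the current label state.
        new = {n: centroid_predict(m_full, full(n)) for n in unlabeled}
        # (d) Storing the new labels changes what full() returns next round,
        #     which is the re-computation of the relational features.
        labels.update(new)
    return {n: labels[n] for n in unlabeled}
```

In this sketch, `attrs` maps each node to a list of attribute values, `neighbors` maps each node to its adjacent nodes, and `known` holds the few available labels. The loop uses a synchronous update (all nodes are re-predicted from the previous round's labels), which is one of several possible design choices.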
Performance (mean ± standard deviation) of the compared algorithms on the Yeast protein dataset.
| Methods | Coverage | Ranking Loss | One-error | 1-Average Precision |
|---|---|---|---|---|
| ICA | 4.217 ± 0.273 | 0.140 ± 0.013 | 0.042 ± 0.005 | 0.155 ± 0.005 |
| Gibbs | 4.319 ± 0.195 | 0.148 ± 0.008 | 0.043 ± 0.005 | 0.154 ± 0.006 |
| ICML | 4.409 ± 0.091 | 0.153 ± 0.006 | 0.043 ± 0.007 | 0.162 ± 0.006 |
| ICAM |
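For readers unfamiliar with the four column headings, the sketch below computes them for a small set of label scores using the definitions common in the multi-label ranking literature. The record itself does not define the metrics, so these definitions are an assumption: in particular, coverage is taken here as the rank of the lowest-ranked relevant label minus one, and the function name `ranking_metrics` and its arguments are hypothetical. Every instance is assumed to have at least one relevant and one irrelevant label. Lower is better for all four values.

```python
def rank_of(scores):
    """Map each label index to its rank (1 = highest score); ties broken by index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return {j: r for r, j in enumerate(order, start=1)}

def ranking_metrics(score_rows, relevant_sets):
    """Average the four ranking metrics over all instances.

    score_rows: one list of per-label scores per instance.
    relevant_sets: one set of relevant label indices per instance.
    """
    n = len(score_rows)
    coverage = rank_loss = one_error = avg_prec = 0.0
    for scores, rel in zip(score_rows, relevant_sets):
        ranks = rank_of(scores)
        irrelevant = [j for j in range(len(scores)) if j not in rel]

        # Coverage: how far down the ranking one must go to cover all relevant labels.
        coverage += max(ranks[j] for j in rel) - 1

        # Ranking loss: fraction of (relevant, irrelevant) pairs ordered wrongly.
        bad = sum(1 for j in rel for k in irrelevant if scores[j] <= scores[k])
        rank_loss += bad / (len(rel) * len(irrelevant))

        # One-error: 1 if the top-ranked label is not relevant.
        top = min(ranks, key=ranks.get)
        one_error += 0.0 if top in rel else 1.0

        # Average precision: for each relevant label, the share of labels
        # ranked at or above it that are also relevant.
        avg_prec += sum(
            sum(1 for k in rel if ranks[k] <= ranks[j]) / ranks[j] for j in rel
        ) / len(rel)

    return {
        "coverage": coverage / n,
        "ranking_loss": rank_loss / n,
        "one_error": one_error / n,
        "one_minus_avg_precision": 1.0 - avg_prec / n,
    }
```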
Figure 2. The performance of different algorithms on the Yeast protein dataset with varying number of labeled instances.
Figure 3. ROC curve of baseline SVM and our ICAM method.
Description of the datasets used in the experiments on collaboration networks.
| Datasets | Number of Instances | Number of Attributes | Number of Links | Number of Classes |
|---|---|---|---|---|
| DBLP-A | 23,806 | 12,588 | 150,042 | 6 |
| DBLP-B | 16,020 | 8,595 | 95,108 | 6 |
Figure 4. The coverage performance of different algorithms with varying number of labeled instances: (a) DBLP-A dataset; (b) DBLP-B dataset.