Jianzhuang Yao, Hong Guo, Xiaohan Yang.
Abstract
Determining protein-protein interactions (PPI) in biological systems is of considerable importance, and PPI prediction has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems able to predict PPI with high confidence. We postulated that combining individual classifiers could improve the accuracy of PPI prediction. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), which combines the output of two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. The performance of PPCM was measured by area under the curve (AUC) on an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved PPI prediction accuracy over the corresponding individual classifiers. We found that incorporating additional classifiers into PPCM could further improve PPI prediction accuracy. Furthermore, cross-species PPCM could achieve prediction accuracy competitive with, and in some cases better than, single-species PPCM. This study established a robust pipeline for PPI prediction that integrates multiple classifiers using the Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.
Year: 2015 PMID: 26539460 PMCID: PMC4619929 DOI: 10.1155/2015/608042
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.326
Figure 1. The PPCM pipeline for protein-protein interaction prediction. Given a pair of query proteins QA and QB, the likelihood of their interaction was first predicted by each of the 194 classifiers from GO2PPI and Phyloprof. The classification scores were then merged using the Random Forests algorithm to generate the final PPI prediction score. Nine PPI classification scores were provided by PPCM. “SC” represents PPI networks in Saccharomyces cerevisiae. “Cross” represents all PPI networks except SC. “All” represents all PPI networks in both SC and cross species.
Figure 2. Comparison of PPI prediction accuracy in the GO2PPI category. (a) PPI prediction based on classifiers related to SC. (b) PPI prediction based on classifiers related to cross species. (c) PPI prediction based on classifiers related to all species. “Average” represents the mean AUC of all the classifiers in each category. “Highest” represents the classifier with highest AUC among all the classifiers in each category. Error bars show standard deviation. “∗” indicates that AUC of PPCM was significantly (P value < 0.05; t-test) higher than that of the most accurate classifier in each category.
Figure 3. Comparison of PPI prediction accuracy in the Phyloprof category. (a) PPI prediction based on classifiers related to SC. (b) PPI prediction based on classifiers related to cross species. (c) PPI prediction based on classifiers related to all species. “Average” represents the mean AUC of all the classifiers in each category. “Highest” represents the classifier with highest AUC among all the classifiers in each category. Error bars show standard deviation. “∗” indicates that AUC of PPCM was significantly (P value < 0.05; t-test) higher than that of the most accurate classifier in each category.
Figure 4. Comparison of PPI prediction accuracy in the GO2PPI + Phyloprof category. Error bars show standard deviation.
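The AUC values compared in the figures can be illustrated with a short Python sketch. It is not the authors' code: it assumes the AUC is the area under the ROC curve, the function name `auc` is hypothetical, and the scores and labels are made-up toy values, not data from the Gold Standard database. It computes the AUC through the Mann-Whitney rank statistic, that is, the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: 1 for an interacting (positive) pair, 0 for a non-interacting
    (negative) pair. Tied scores receive their average rank.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    # Rank sum of the positives, shifted and scaled into [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: three positive and three negative protein pairs.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    print(f"AUC = {auc(toy_scores, toy_labels):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive from negative pairs, which is why it serves as a threshold-free way to compare a merged classifier against its individual inputs.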