Nan Lin, Baolin Wu, Ronald Jansen, Mark Gerstein, Hongyu Zhao.
Abstract
BACKGROUND: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by different sources of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insights into the relationships between protein-protein interactions and other genomic information.
Year: 2004 PMID: 15491499 PMCID: PMC529436 DOI: 10.1186/1471-2105-5-154
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Order of variables that enter the final model by stepwise selection in logistic regression
| Variables | Order |
| Gavin | 1 |
| MIPS | 2 |
| Rosetta | 3 |
| GO | 4 |
| cellcycle | 5 |
| essentiality | 6 |
| Rosetta*cellcycle | 7 |
| cellcycle*essentiality | 8 |
| Ho | 9 |
| GO*essentiality | 10 |
| Uetz | 11 |
| GO*cellcycle | 12 |
| GO*cellcycle*essentiality | 13 |
| MIPS*essentiality | 14 |
| MIPS*Rosetta | 15 |
Deviance of each reduced model relative to the final model after removing the corresponding variable
| Variable | Deviance |
| GO | 1376.437 |
| MIPS | 1333.97 |
| essentiality | 579.988 |
| Rosetta | 778.493 |
| cellcycle | 1271.461 |
| Ho | 68.718 |
| Uetz | 20.513 |
| Gavin | 1839.181 |
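The table above compares features by how much the deviance changes when each one is dropped from the final logistic regression model. As a minimal sketch, assuming the standard binomial deviance (minus twice the log-likelihood), the code below computes such a difference on hypothetical toy probabilities and then sorts the values reported in the table. The function name and the toy data are illustrative assumptions, not the authors' code.

```python
import math

def binomial_deviance(y, p):
    """Return -2 * log-likelihood for binary outcomes y and fitted probabilities p."""
    eps = 1e-12  # guard against log(0)
    log_lik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        log_lik += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * log_lik

# Hypothetical toy example: a fuller model fits the labels better than a reduced one.
y = [1, 0, 1, 0]
p_full = [0.9, 0.1, 0.8, 0.2]
p_reduced = [0.6, 0.4, 0.6, 0.4]
deviance_difference = binomial_deviance(y, p_reduced) - binomial_deviance(y, p_full)

# Values copied from the table above.
table_deviance = {
    "GO": 1376.437, "MIPS": 1333.97, "essentiality": 579.988,
    "Rosetta": 778.493, "cellcycle": 1271.461, "Ho": 68.718,
    "Uetz": 20.513, "Gavin": 1839.181,
}
ranking = sorted(table_deviance, key=table_deviance.get, reverse=True)
print(ranking)
```

Sorting this way places Gavin first and Uetz last, with GO, MIPS, and cellcycle close together behind Gavin.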
Figure 1. Importance measure of genomic features from the random forest algorithm. The horizontal axis shows the importance measure and the vertical axis lists the genomic features.
Figure 2. ROC curves of random forest, logistic regression and Bayesian networks using 7-fold cross-validation.
Figure 3. Histograms of MIPS and Gene Ontology function data for gold standard positives and negatives.
Figure 4. Zoomed-in histograms of MIPS and Gene Ontology function data for gold standard positives and negatives at the lower end.
Optimal classification errors when using different genomic features
| Variables | Optimal Classification Error |
| MIPS | 1.69% |
| GO | 2.15% |
| MIPS+GO | 0.28% |
| MIPS (grouped) | 7.31% |
| GO (grouped) | 13.35% |
| MIPS+GO (grouped) | 6.34% |
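This excerpt does not define how the optimal classification error is computed. One common reading, assumed here, is the smallest error any rule can achieve when it sees only the discrete feature value: within each distinct value, predict the majority class and count the minority as errors. The sketch below uses hypothetical toy data and hypothetical helper names to show why grouping values into coarser bins cannot lower this error and usually raises it, which matches the direction of the grouped rows in the table.

```python
from collections import Counter

def optimal_error(pos_values, neg_values):
    """Minimum error of a classifier that only sees the discrete feature value."""
    pos = Counter(pos_values)
    neg = Counter(neg_values)
    total = len(pos_values) + len(neg_values)
    # For each value, the best rule predicts the majority class,
    # so the minority count is the unavoidable number of errors.
    errors = sum(min(pos[v], neg[v]) for v in set(pos) | set(neg))
    return errors / total

def group(values, width):
    """Coarsen integer feature values into bins of the given width."""
    return [v // width for v in values]

# Hypothetical toy feature values for positives and negatives.
pos = [0, 0, 1, 1, 4]
neg = [2, 3, 3, 5, 5, 6, 7]

fine_error = optimal_error(pos, neg)
coarse_error = optimal_error(group(pos, 4), group(neg, 4))
print(fine_error, coarse_error)
```

In the toy data the ungrouped values separate the classes perfectly, while bins of width 4 mix positives and negatives and leave 4 of 12 examples misclassified.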
Classification errors of the random forest algorithm when using different genomic features
| Variables | Error on gold standard positives | Error on gold standard negatives | Average error |
| MIPS+GO | 114/2104 = 5.42% | 180/172409 = 0.1% | 2.76% |
| ALL | 165/2104 = 7.80% | 89/172409 = 0.05% | 3.95% |
| ELSE | 1056/2104 = 78.09% | 313/172409 = 0.20% | 25.20% |
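As a quick arithmetic check on the MIPS+GO row of the table above, the last column agrees with the unweighted mean of the two ratios before it. Reading the two ratios as per-class error rates and the last column as their mean is an assumption inferred from the numbers, and the helper name is hypothetical.

```python
def per_class_and_mean_error(errors_a, total_a, errors_b, total_b):
    """Return both per-class error rates and their unweighted mean."""
    rate_a = errors_a / total_a
    rate_b = errors_b / total_b
    return rate_a, rate_b, (rate_a + rate_b) / 2

# Counts copied from the MIPS+GO row.
rate_a, rate_b, mean_rate = per_class_and_mean_error(114, 2104, 180, 172409)
print(f"{rate_a:.2%} {rate_b:.2%} {mean_rate:.2%}")  # prints "5.42% 0.10% 2.76%"

# For contrast, pooling all examples lets the much larger class dominate.
pooled_rate = (114 + 180) / (2104 + 172409)
print(f"{pooled_rate:.2%}")  # prints "0.17%"
```

Averaging the two rates weights both classes equally, which matters when one class is roughly eighty times larger than the other.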
Figure 5. ROC curves of random forest using different genomic feature sets. 'All' – all genomic information; 'MIPS+GO' – only MIPS and Gene Ontology function data; 'ELSE' – genomic features other than MIPS and Gene Ontology function data.
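The ROC curves in Figures 2 and 5 compare classifiers by how the true positive rate trades off against the false positive rate as the score threshold moves. The following is a minimal sketch of that construction on hypothetical toy scores; the function names and data are assumptions for illustration and do not reproduce the authors' cross-validation procedure.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical classifier scores and gold standard labels (1 = interacting pair).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points[-1], round(auc, 4))
```

On this toy input, 8 of the 9 positive-negative pairs are ranked correctly, so the area under the curve is 8/9.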