Assessing the limits of genomic data integration for predicting protein networks.

Long J Lu, Yu Xia, Alberto Paccanaro, Haiyuan Yu, Mark Gerstein.

Abstract

Genomic data integration, the process of statistically combining diverse sources of information from functional genomics experiments to make large-scale predictions, is becoming increasingly prevalent. One might expect that this process should become progressively more powerful with the integration of more evidence. Here, we explore the limits of genomic data integration, assessing the degree to which predictive power increases with the addition of more features. We focus on a predictive context that has been extensively investigated and benchmarked in the past: the prediction of protein-protein interactions in yeast. We start by using a simple Naive Bayes classifier for integrating diverse sources of genomic evidence, ranging from coexpression relationships to similar phylogenetic profiles. We expand the number of features considered for prediction to 16, significantly more than previous studies. Overall, we observe a small but measurable improvement in prediction performance over previous benchmarks, which were based on four strong features. This allows us to identify new yeast interactions with high confidence. It also allows us to quantitatively assess the inter-relations amongst different genomic features. It is known that subtle correlations and dependencies between features can confound the strength of interaction predictions. We investigate this issue in detail by calculating mutual information. To our surprise, we find no appreciable statistical dependence between the many possible pairs of features. We further explore feature dependencies by comparing the performance of our simple Naive Bayes classifier with a boosted version of the same classifier, which is fairly resistant to feature dependence. We find that boosting does not improve performance, indicating that, at least for prediction purposes, our genomic features are essentially independent.
In summary, by integrating a few (i.e., four) good features, we approach the maximal predictive power of current genomic data integration; moreover, this limitation does not reflect (potentially removable) inter-relationships between the features.
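
The two computations the abstract relies on, Naive Bayes integration of features and mutual information between feature pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each feature has already been discretized into bins, it assumes a simple additive pseudocount, and the function names (`likelihood_ratios`, `naive_bayes_log_lr`, `mutual_information`) are hypothetical.

```python
import math
from collections import Counter


def likelihood_ratios(values, labels, pseudocount=1.0):
    """Per-bin ratio P(bin | interacting) / P(bin | non-interacting).

    `values` are discretized feature bins for protein pairs and `labels`
    are truthy for gold-standard interacting pairs (assumed inputs).
    """
    bins = sorted(set(values))
    pos = Counter(v for v, y in zip(values, labels) if y)
    neg = Counter(v for v, y in zip(values, labels) if not y)
    # The pseudocount keeps empty bins from producing zero or infinite ratios.
    n_pos = sum(pos.values()) + pseudocount * len(bins)
    n_neg = sum(neg.values()) + pseudocount * len(bins)
    return {
        b: ((pos[b] + pseudocount) / n_pos) / ((neg[b] + pseudocount) / n_neg)
        for b in bins
    }


def naive_bayes_log_lr(pair_bins, lr_tables):
    """Combined evidence for one protein pair.

    Under the naive independence assumption the per-feature likelihood
    ratios multiply, so their logarithms add.
    """
    return sum(math.log(table[b]) for b, table in zip(pair_bins, lr_tables))


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discretized features."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))), summed over observed cells.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


if __name__ == "__main__":
    # Toy data only: two binary features for eight protein pairs.
    feature_a = [1, 1, 1, 0, 0, 0, 0, 1]
    feature_b = [1, 0, 1, 0, 1, 0, 1, 0]
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    tables = [likelihood_ratios(f, labels) for f in (feature_a, feature_b)]
    print(naive_bayes_log_lr((1, 1), tables))
    print(mutual_information(feature_a, feature_b))
```

In this sketch a mutual information near zero for a pair of features is consistent with the independence assumption behind summing log likelihood ratios, while a large value would suggest that the combined score double-counts shared evidence.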

Year: 2005    PMID: 15998909    PMCID: PMC1172038    DOI: 10.1101/gr.3610305

Source DB: PubMed    Journal: Genome Res    ISSN: 1088-9051    Impact factor: 9.043

