The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction.

Curtis Huttenhower, Matthew A Hibbs, Chad L Myers, Amy A Caudy, David C Hess, Olga G Troyanskaya.

Abstract

MOTIVATION: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete for even the best-studied model organisms, the biological reliability of such evaluations may be called into question.
RESULTS: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches (even those employing the same training data) is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hope of mitigating these effects in future comparative evaluations.
AVAILABILITY: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at http://function.princeton.edu/mitochondria.
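
ILLUSTRATION: The differential underestimation described in the results can be sketched with a toy calculation. The example below is not taken from the paper: the gene identifiers, the two prediction lists and the choice of precision as the metric are all hypothetical, chosen only to show how an incomplete gold standard can reverse the apparent ranking of two methods.

```python
def precision(predicted, positives):
    """Fraction of predicted genes that the given gold standard labels positive."""
    predicted = list(predicted)
    return sum(gene in positives for gene in predicted) / len(predicted)


# Hypothetical data (assumption, not from the paper).
# Genes that truly have the function, as a complete gold standard would list them.
complete_standard = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
# The subset that is currently annotated: the incomplete gold standard.
incomplete_standard = {"g1", "g2", "g3", "g4"}

# Method A mostly rediscovers annotated genes and makes two false predictions.
method_a = ["g1", "g2", "g3", "x1", "x2"]
# Method B mostly predicts true genes that have not been annotated yet.
method_b = ["g1", "g5", "g6", "g7", "g8"]

for name, predictions in [("A", method_a), ("B", method_b)]:
    print(
        name,
        "incomplete:", precision(predictions, incomplete_standard),
        "complete:", precision(predictions, complete_standard),
    )
```

In this sketch, method A scores 0.6 against both standards, while method B scores 0.2 against the incomplete standard and 1.0 against the complete one. Only B is underestimated, so the sparse standard ranks A above B even though B is the more accurate method.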

Year:  2009        PMID: 19561015      PMCID: PMC2735660          DOI: 10.1093/bioinformatics/btp397

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


