
Modeling the percolation of annotation errors in a database of protein sequences.

Walter R Gilks, Benjamin Audit, Daniela De Angelis, Sophia Tsoka, Christos A Ouzounis.

Abstract

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified information on protein function lags far behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. The functional annotation of these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein was acquired. Thus chains of misannotation can arise, a process we term 'error percolation'. Under some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. Exploring the consequences of the model for annotation quality shows that this iterative approach leads to a systematic deterioration of database quality.
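The abstract does not state the model's equations, so the following is only a hypothetical toy illustration of error percolation, not the model of Gilks et al. It assumes that the first `n_verified` entries are experimentally verified and correct, that every later entry copies its annotation from one existing entry chosen uniformly at random, and that a copy is correct only if its source is correct and the transfer itself succeeds, with an assumed success probability of `1 - p_transfer_error`. Under these assumptions, adding entry n + 1 multiplies the expected fraction of correct entries by (1 - p/(n + 1)).

```python
def expected_error_fraction(n_verified: int, n_total: int, p_transfer_error: float) -> float:
    """Expected fraction of misannotated entries in a hypothetical toy model.

    Assumptions (illustrative only, not the published model):
    - the first n_verified entries are experimentally verified and correct;
    - each later entry copies the annotation of one existing entry chosen
      uniformly at random;
    - a copy is correct only if its source is correct AND the transfer
      succeeds, which happens with probability 1 - p_transfer_error.
    """
    if not 0 < n_verified <= n_total:
        raise ValueError("need 0 < n_verified <= n_total")
    if not 0.0 <= p_transfer_error <= 1.0:
        raise ValueError("p_transfer_error must lie in [0, 1]")

    frac_correct = 1.0  # all verified entries start out correct
    for n in range(n_verified, n_total):
        # The database holds n entries with expected correct fraction c.
        # The new entry is correct with probability c * (1 - p), so the
        # fraction becomes (n*c + c*(1 - p)) / (n + 1) = c * (1 - p/(n + 1)).
        frac_correct *= 1.0 - p_transfer_error / (n + 1)
    return 1.0 - frac_correct


if __name__ == "__main__":
    # Hypothetical parameters: 1,000 verified entries, 5% error per transfer.
    for size in (1_000, 10_000, 100_000, 1_000_000):
        print(size, round(expected_error_fraction(1_000, size, 0.05), 3))
```

Because the correct fraction is a product of factors (1 - p/k) for k = n_verified + 1 through n_total, it behaves roughly like (n_verified / n_total)^p, which falls toward zero as inferred entries accumulate for any p > 0. This mirrors, under the stated simplifications (uniform choice of source, fixed per-transfer error rate), the systematic deterioration the abstract describes.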

Year: 2002 | PMID: 12490449 | DOI: 10.1093/bioinformatics/18.12.1641

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


  66 in total

1.  Pathway analysis software: annotation errors and solutions.

Authors:  Nicole K Henderson-Maclennan; Jeanette C Papp; C Conover Talbot; Edward R B McCabe; Angela P Presson
Journal:  Mol Genet Metab       Date:  2010-06-22       Impact factor: 4.797

2.  A categorization approach to automated ontological function annotation.

Authors:  Karin Verspoor; Judith Cohn; Susan Mniszewski; Cliff Joslyn
Journal:  Protein Sci       Date:  2006-05-02       Impact factor: 6.725

3.  A bioinformatician's guide to metagenomics. (Review)

Authors:  Victor Kunin; Alex Copeland; Alla Lapidus; Konstantinos Mavromatis; Philip Hugenholtz
Journal:  Microbiol Mol Biol Rev       Date:  2008-12       Impact factor: 11.056

4.  Righting the wrongs.

Authors:  Caroline Hadley
Journal:  EMBO Rep       Date:  2003-09       Impact factor: 8.807

5.  MitoP2, an integrated database on mitochondrial proteins in yeast and man.

Authors:  C Andreoli; H Prokisch; K Hörtnagel; J C Mueller; M Münsterkötter; C Scharfe; T Meitinger
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

6.  Implications of physiological studies based on genomic sequences: Streptococcus pneumoniae TIGR4 synthesizes a functional LytC lysozyme.

Authors:  Miriam Moscoso; Elena López; Ernesto García; Rubens López
Journal:  J Bacteriol       Date:  2005-09       Impact factor: 3.490

7.  ANNIE: integrated de novo protein sequence annotation.

Authors:  Hong Sain Ooi; Chia Yee Kwo; Michael Wildpaner; Fernanda L Sirota; Birgit Eisenhaber; Sebastian Maurer-Stroh; Wing Cheong Wong; Alexander Schleiffer; Frank Eisenhaber; Georg Schneider
Journal:  Nucleic Acids Res       Date:  2009-04-23       Impact factor: 16.971

8.  A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions.

Authors:  Brenton Louie; Roger Higdon; Eugene Kolker
Journal:  PLoS One       Date:  2009-10-21       Impact factor: 3.240

9.  Enhancing navigation in biomedical databases by community voting and database-driven text classification.

Authors:  Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder
Journal:  BMC Bioinformatics       Date:  2009-10-03       Impact factor: 3.169

10.  Berkeley PHOG: PhyloFacts orthology group prediction web server.

Authors:  Ruchira S Datta; Christopher Meacham; Bushra Samad; Christoph Neyer; Kimmen Sjölander
Journal:  Nucleic Acids Res       Date:  2009-05-12       Impact factor: 16.971
