Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption.

M Y Galperin, E V Koonin.

Abstract

Functional annotation of proteins encoded in newly sequenced genomes can be expected to meet two conflicting objectives: (i) provide as much information as possible, and (ii) avoid erroneous functional assignments and over-predictions. The continuing exponential growth of the number of sequenced genomes makes the quality of sequence annotation a critical factor in the efforts to utilize this new information. When dubious functional assignments are used as a basis for subsequent predictions, they tend to proliferate, leading to "database explosion". It is therefore important to identify the common factors that hamper functional annotation. As a first step towards that goal, we have compared the annotations of the Mycoplasma genitalium and Methanococcus jannaschii genomes produced in several independent studies. The most common causes of questionable predictions appear to be: (i) non-critical use of annotations from existing database entries; (ii) taking into account only the annotation of the best database hit; (iii) insufficient masking of low-complexity regions (e.g. non-globular domains) in protein sequences, resulting in spurious database hits obscuring relevant ones; (iv) ignoring the multi-domain organization of the query proteins and/or the database hits; (v) non-critical functional inferences on the basis of the functions of neighboring genes in an operon; (vi) non-orthologous gene displacement, i.e. involvement of structurally unrelated proteins in the same function. These observations suggest that case-by-case validation of functional annotation by expert biologists remains crucial for productive genome analysis.

Year:  1998        PMID: 11471243

Source DB:  PubMed          Journal:  In Silico Biol        ISSN: 1386-6338


  77 in total

1.  The Pfam protein families database.

Authors:  A Bateman; E Birney; R Durbin; S R Eddy; K L Howe; E L Sonnhammer
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Mendel-GFDb and Mendel-ESTS: databases of plant gene families and ESTs annotated with gene family numbers and gene family names.

Authors:  D Lonsdale; M Crowe; B Arnold; B C Arnold
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

3.  The PEDANT genome database.

Authors:  Dmitrij Frishman; Martin Mokrejs; Denis Kosykh; Gabi Kastenmüller; Grigory Kolesov; Igor Zubrzycki; Christian Gruber; Birgitta Geier; Andreas Kaps; Kaj Albermann; Andreas Volz; Christian Wagner; Matthias Fellenberg; Klaus Heumann; Hans-Werner Mewes
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  Dictionary-driven protein annotation.

Authors:  Isidore Rigoutsos; Tien Huynh; Aris Floratos; Laxmi Parida; Daniel Platt
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

5.  LEON: multiple aLignment Evaluation Of Neighbours.

Authors:  Julie D Thompson; Véronique Prigent; Olivier Poch
Journal:  Nucleic Acids Res       Date:  2004-02-24       Impact factor: 16.971

6.  Phylogenetic molecular function annotation.

Authors:  Barbara E Engelhardt; Michael I Jordan; Susanna T Repo; Steven E Brenner
Journal:  J Phys Conf Ser       Date:  2009

7.  A novel method for multiple alignment of sequences with repeated and shuffled elements.

Authors:  Benjamin Raphael; Degui Zhi; Haixu Tang; Pavel Pevzner
Journal:  Genome Res       Date:  2004-11       Impact factor: 9.043

8.  Effective function annotation through catalytic residue conservation.

Authors:  Richard A George; Ruth V Spriggs; Gail J Bartlett; Alex Gutteridge; Malcolm W MacArthur; Craig T Porter; Bissan Al-Lazikani; Janet M Thornton; Mark B Swindells
Journal:  Proc Natl Acad Sci U S A       Date:  2005-07-21       Impact factor: 11.205

9.  Broad issues to consider for library involvement in bioinformatics.

Authors:  Renata C Geer
Journal:  J Med Libr Assoc       Date:  2006-07

10.  The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches.

Authors:  Ishita K Khan; Qing Wei; Samuel Chapman; Dukka B Kc; Daisuke Kihara
Journal:  Gigascience       Date:  2015-09-14       Impact factor: 6.524
