
Annotation transfer for genomics: measuring functional divergence in multi-domain proteins.

H Hegyi, M Gerstein.

Abstract

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
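The pairwise statistic described in the abstract (the share of protein pairs with a common superfamily that also share a functional category) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `conservation_rate`, the record layout, and the four toy proteins are all hypothetical, and the labels `sfA`, `sfB`, `sfC` stand in for scop superfamilies while the function strings stand in for keyword-derived categories.

```python
from itertools import combinations

# Hypothetical toy records: each protein has a set of superfamily labels
# and a single functional category (both invented for illustration).
proteins = {
    "P1": {"superfamilies": frozenset({"sfA", "sfB"}), "function": "kinase"},
    "P2": {"superfamilies": frozenset({"sfA", "sfB"}), "function": "kinase"},
    "P3": {"superfamilies": frozenset({"sfA", "sfC"}), "function": "transport"},
    "P4": {"superfamilies": frozenset({"sfA"}), "function": "kinase"},
}


def conservation_rate(records, min_shared=1, require_same_combination=False):
    """Return the fraction of qualifying pairs with identical function.

    A pair qualifies if it shares at least `min_shared` superfamilies and,
    when `require_same_combination` is set, has exactly the same set of
    superfamilies. Returns None if no pair qualifies.
    """
    same = 0
    total = 0
    for (_, a), (_, b) in combinations(sorted(records.items()), 2):
        shared = a["superfamilies"] & b["superfamilies"]
        if len(shared) < min_shared:
            continue
        if require_same_combination and a["superfamilies"] != b["superfamilies"]:
            continue
        total += 1
        if a["function"] == b["function"]:
            same += 1
    return same / total if total else None


# Pairs sharing at least one superfamily.
rate_any = conservation_rate(proteins)
# Pairs with exactly the same superfamily combination.
rate_same_combo = conservation_rate(proteins, require_same_combination=True)
print(rate_any, rate_same_combo)
```

On this toy data the rate is higher for pairs with an identical domain combination than for pairs sharing only one superfamily, which mirrors the direction of the reported trend; the toy numbers themselves carry no biological meaning.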

Year:  2001        PMID: 11591640      PMCID: PMC311165          DOI: 10.1101/gr.183801

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


