
Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores.

C A Wilson, J Kreychman, M Gerstein.

Abstract

Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured as RMS deviation, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than that of any previous investigation, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure, we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme.
This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40% sequence identity, whereas broad functional class is conserved to approximately 25%. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g., P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/align.

Copyright 2000 Academic Press.
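The two quantitative relationships summarized in the abstract can be illustrated with a short Python sketch. It assumes the exponential form RMS ≈ A·e^(B·d), where d = 1 − (percent identity)/100 is the fractional sequence divergence, and estimates A and B by ordinary least squares on ln(RMS); the paper's exact parameterization and fitting procedure may differ. The second helper only encodes the approximate 40% and 25% identity thresholds stated for same-fold pairs. The function names, the "below both thresholds" label, and the demo constants (A = 0.5, B = 2.0) are hypothetical, and the demo data are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_exponential(identity_pct: Sequence[float],
                    rms: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: fit RMS ~= A * exp(B * d), with
    d = 1 - identity/100 (fractional sequence divergence).

    Taking logs gives the straight line ln(RMS) = ln(A) + B * d, so an
    ordinary least-squares fit on (d, ln RMS) yields the slope B and the
    intercept ln(A). Returns (A, B).
    """
    if len(identity_pct) != len(rms) or len(rms) < 2:
        raise ValueError("need at least two paired observations")
    xs = [1.0 - p / 100.0 for p in identity_pct]
    ys = [math.log(r) for r in rms]  # every RMS value must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("identity values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


def expected_functional_conservation(identity_pct: float) -> str:
    """Hypothetical rule of thumb from the abstract's approximate
    thresholds; only meaningful for domain pairs sharing the same fold."""
    if identity_pct >= 40.0:
        return "precise function"
    if identity_pct >= 25.0:
        return "broad functional class"
    return "below both thresholds"


if __name__ == "__main__":
    # Synthetic, noise-free data generated with made-up constants
    # A = 0.5 and B = 2.0 (illustrative only, not the paper's fit).
    ids = [20.0, 35.0, 50.0, 70.0, 95.0]
    rms = [0.5 * math.exp(2.0 * (1.0 - p / 100.0)) for p in ids]
    a, b = fit_exponential(ids, rms)
    print(f"A = {a:.3f}, B = {b:.3f}")  # prints A = 0.500, B = 2.000
    for p in (55.0, 30.0, 15.0):
        print(p, expected_functional_conservation(p))
```

Because the fit is done in log space, noise in small RMS values is amplified; on real data one would inspect the residuals before trusting the fitted constants, and treat the identity thresholds as approximate boundaries of a sigmoidal trend rather than hard cutoffs.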


Year:  2000        PMID: 10704319     DOI: 10.1006/jmbi.2000.3550

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


  Related articles: 126 in total (first 10 shown)

1.  A rapid classification protocol for the CATH Domain Database to support structural genomics.

Authors:  F M Pearl; N Martin; J E Bray; D W Buchan; A P Harrison; D Lee; G A Reeves; A J Shepherd; I Sillitoe; A E Todd; J M Thornton; C A Orengo
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information.

Authors:  J Qian; B Stenger; C A Wilson; J Lin; R Jansen; S A Teichmann; J Park; W G Krebs; H Yu; V Alexandrov; N Echols; M Gerstein
Journal:  Nucleic Acids Res       Date:  2001-04-15       Impact factor: 16.971

3.  Including biological literature improves homology search.

Authors:  J T Chang; S Raychaudhuri; R B Altman
Journal:  Pac Symp Biocomput       Date:  2001

4.  The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework.

Authors:  W G Krebs; M Gerstein
Journal:  Nucleic Acids Res       Date:  2000-04-15       Impact factor: 16.971

5.  Motif-based fold assignment.

Authors:  L Salwinski; D Eisenberg
Journal:  Protein Sci       Date:  2001-12       Impact factor: 6.725

6.  Annotation transfer for genomics: measuring functional divergence in multi-domain proteins.

Authors:  H Hegyi; M Gerstein
Journal:  Genome Res       Date:  2001-10       Impact factor: 9.043

7.  Functional versatility and molecular diversity of the metabolic map of Escherichia coli.

Authors:  S Tsoka; C A Ouzounis
Journal:  Genome Res       Date:  2001-09       Impact factor: 9.043

8.  Structural characterization of the human proteome.

Authors:  Arne Müller; Robert M MacCallum; Michael J E Sternberg
Journal:  Genome Res       Date:  2002-11       Impact factor: 9.043

9.  Analysis of protein sequence/structure similarity relationships.

Authors:  Hin Hark Gan; Rebecca A Perlow; Sharmili Roy; Joy Ko; Min Wu; Jing Huang; Shixiang Yan; Angelo Nicoletta; Jonathan Vafai; Ding Sun; Lihua Wang; Joyce E Noah; Samuela Pasquali; Tamar Schlick
Journal:  Biophys J       Date:  2002-11       Impact factor: 4.033

10.  Myosin B of Plasmodium falciparum (PfMyoB): in silico prediction of its three-dimensional structure and its possible interaction with MTIP.

Authors:  Paula C Hernández; Liliana Morales; Isabel C Castellanos; Moisés Wasserman; Jacqueline Chaparro-Olaya
Journal:  Parasitol Res       Date:  2017-03-07       Impact factor: 2.289

