
Enzyme function less conserved than anticipated.

Burkhard Rost

Abstract

The level of sequence similarity that implies similarity in protein structure is well established. Recently, many groups proposed thresholds for similarity in sequence implying similarity in enzymatic function. All previous results suggest strong conservation of enzymatic function above levels of 50% pairwise sequence identity. Here, I argue that all groups substantially overestimated the conservation of enzyme function because their data sets were either too biased or too small. An unbiased analysis suggested that fewer than 30% of the pair fragments above 50% sequence identity have entirely identical EC numbers. Another surprising finding was that even BLAST E-values below 10^-50 did not suffice to automatically transfer enzyme function without errors. As expected, most misclassifications originated from similarities in relatively short regions and/or from transferring annotations for different domains. Neither problem can be corrected easily by adjusting the thresholds for automatic transfer of genome annotations. A score relating sequence identity to alignment length (distance from HSSP-threshold) outperformed statistical BLAST scores for high sequence similarity. In particular, the distance score allowed error-free transfer of enzyme function for the 10% most similar enzyme pairs. The results illustrated how difficult it is to assess the conservation of protein function and to guarantee error-free genome annotations in general: sets with millions of pair comparisons might not suffice to arrive at statistically significant conclusions. In practice, the revised detailed estimates for the sequence conservation of enzyme function may provide important benchmarks for everyday sequence analysis and for more cautious automatic genome annotations. © 2002 Elsevier Science Ltd.
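To make the phrase "entirely identical EC numbers" concrete, the following is a minimal Python sketch, not the paper's pipeline. It assumes the standard four-level EC notation (for example 1.1.1.1) and treats two enzymes as functionally identical only when all four levels agree. The function names, the input format (percent identity plus two EC strings), and the toy pairs are hypothetical and chosen only for illustration; they are not data from the study.

```python
from typing import Iterable, Optional, Tuple


def shared_ec_levels(ec_a: str, ec_b: str) -> int:
    """Count the leading EC levels (0 to 4) on which two EC numbers agree.

    A '-' placeholder (unassigned level) is treated as a mismatch, so
    incompletely annotated enzymes never count as fully identical.
    """
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break
        shared += 1
    return shared


def fraction_identical(
    pairs: Iterable[Tuple[float, str, str]], min_identity: float = 50.0
) -> Optional[float]:
    """Fraction of pairs above `min_identity` (percent) whose EC numbers
    agree on all four levels. Returns None if no pair passes the cutoff."""
    selected = [(a, b) for pid, a, b in pairs if pid > min_identity]
    if not selected:
        return None
    hits = sum(1 for a, b in selected if shared_ec_levels(a, b) == 4)
    return hits / len(selected)


# Hypothetical toy input: (percent sequence identity, EC of A, EC of B).
toy_pairs = [
    (72.0, "1.1.1.1", "1.1.1.1"),    # identical on all four levels
    (65.0, "1.1.1.1", "1.1.1.2"),    # differs at the fourth level
    (55.0, "3.4.21.4", "3.4.21.5"),  # differs at the fourth level
    (51.0, "2.7.11.1", "2.7.10.1"),  # differs at the third level
    (40.0, "1.1.1.1", "1.1.1.1"),    # below the 50% cutoff, ignored
]

fraction = fraction_identical(toy_pairs)
```

Counting only full four-level agreement is the strictest reading of functional identity; relaxing the test to the first three levels would report a higher apparent conservation for the same pairs.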

Year:  2002        PMID: 12051862     DOI: 10.1016/S0022-2836(02)00016-5

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


  142 in total

1.  Transmembrane helix predictions revisited.

Authors:  Chien Peter Chen; Andrew Kernytsky; Burkhard Rost
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

2.  Sequence conserved for subcellular localization.

Authors:  Rajesh Nair; Burkhard Rost
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

3.  UniqueProt: Creating representative protein sequence sets.

Authors:  Sven Mika; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

4.  The PredictProtein server.

Authors:  Burkhard Rost; Jinfeng Liu
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

Review 5.  Structural genomics: computational methods for structure analysis.

Authors:  Sharon Goldsmith-Fischman; Barry Honig
Journal:  Protein Sci       Date:  2003-09       Impact factor: 6.725

6.  The PredictProtein server.

Authors:  Burkhard Rost; Guy Yachdav; Jinfeng Liu
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

7.  LEON: multiple aLignment Evaluation Of Neighbours.

Authors:  Julie D Thompson; Véronique Prigent; Olivier Poch
Journal:  Nucleic Acids Res       Date:  2004-02-24       Impact factor: 16.971

8.  Automated prediction of protein function and detection of functional sites from structure.

Authors:  Florencio Pazos; Michael J E Sternberg
Journal:  Proc Natl Acad Sci U S A       Date:  2004-09-29       Impact factor: 11.205

Review 9.  The role of robustness in phenotypic adaptation and innovation.

Authors:  Andreas Wagner
Journal:  Proc Biol Sci       Date:  2012-01-04       Impact factor: 5.349

Review 10.  Proteins: form and function.

Authors:  Roy D Sleator
Journal:  Bioeng Bugs       Date:  2012-03-01