
Evolutionary Trace Annotation Server: automated enzyme function prediction in protein structures using 3D templates.

R Matthew Ward, Eric Venner, Bryce Daines, Stephen Murray, Serkan Erdin, David M Kristensen, Olivier Lichtarge.

Abstract

SUMMARY: The Evolutionary Trace Annotation (ETA) Server predicts enzymatic activity. ETA starts with a structure of unknown function, such as those from structural genomics, and with no prior knowledge of its mechanism uses the phylogenetic Evolutionary Trace (ET) method to extract key functional residues and propose a function-associated 3D motif, called a 3D template. ETA then searches previously annotated structures for geometric template matches that suggest molecular and thus functional mimicry. In order to maximize the predictive value of these matches, ETA next applies distinctive specificity filters -- evolutionary similarity, function plurality and match reciprocity. In large scale controls on enzymes, prediction coverage is 43% but the positive predictive value rises to 92%, thus minimizing false annotations. Users may modify any search parameter, including the template. ETA thus expands the ET suite for protein structure annotation, and can contribute to the annotation efforts of metaservers. AVAILABILITY: The ETA Server is a web application available at (http://mammoth.bcm.tmc.edu/eta/).

Substances: Enzymes; Proteins

Year: 2009; PMID: 19307237; PMCID: PMC2682511; DOI: 10.1093/bioinformatics/btp160

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 INTRODUCTION

As the number of protein structures mushrooms, in large part due to structural genomics (SG) efforts, a detailed knowledge of their biological roles remains elusive (Redfern et al., 2008). Thus most Protein Data Bank (PDB) (Berman et al., 2000) annotations are computationally rather than experimentally derived, and still 28% of the 2191 SG proteins solved last year were labeled ‘unknown’ or ‘hypothetical’ as of September 2008. Annotation transfer among homologs identified by PSI-BLAST (Altschul et al., 1997) or similar tools remains the most popular and useful method. The problem is that homology does not guarantee functional equivalence, as often divergence yields proteins of different functions (Gerlt and Babbitt, 2000). Even at 65% sequence identity, 10% of protein pairs already have different 4-digit Enzyme Commission (EC) functions, and at 45% identity 10% differ in the less specific 3-digit functions (Tian and Skolnick, 2003). This leads to errors that propagate, dramatically decreasing the effectiveness of future predictions (Brenner, 1999). Increasing annotation specificity is therefore paramount. To this end, an orthogonal approach relies on 3D templates: small structural motifs built from key amino acid functional determinants that suggest functional similarity when matched geometrically in unannotated proteins (Wallace et al., 1997). Two such methods are in the popular ProFunc metaserver: Enzyme Active Sites and Reverse Templates (Laskowski et al., 2005). Because 3D templates are local and narrowly focus on the molecular basis of function, they can remain accurate even when overall similarity becomes unreliably low, or when it remains so high as to obscure a key functional site variation. However, 3D template annotations also have weaknesses: a lack of known functional determinants from which to build them on a large scale, and low specificity when derived heuristically (Kristensen et al., 2006). 
To build templates without any prior knowledge of the catalytic mechanism, the Evolutionary Trace Annotation (ETA) (Kristensen et al., 2006; 2008) server heuristically selects residues based on Evolutionary Trace (ET) predictions of functional sites in protein structures (Lichtarge et al., 1996). These predictions were extensively validated experimentally (Onrust et al., 1997; Ribes-Zamora et al., 2007; Sowa et al., 2001) and computationally (Mihalek et al., 2004; Res et al., 2005; Yao et al., 2003). Moreover, ETA templates either overlap catalytic residues (78%), or lie in their immediate vicinity (22%) (Ward et al., 2008). To raise specificity, ETA filters geometric template matches (i) by ET rank similarity (Kristensen et al., 2006); (ii) by match reciprocity back to the original protein (Ward et al., 2008); and (iii) by the extent that a plurality of matches point to the same function (Kristensen et al., 2008). In 1218 SG control enzymes, ETA made 527 predictions, i.e. 43% prediction coverage, of which 478 were true, for 92% positive predictive value (PPV). ETA's performance improves on the Enzyme Active Site and Reverse Template methods from ProFunc (Ward et al., 2008). ETA also proved complementary to sequence-based methods (Kristensen et al., 2008). If needed, prediction coverage can be raised to 77% (934/1218) by including non-reciprocal matches, but PPV then decreases to 82% (769/934).
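
The two performance measures quoted above can be reproduced from raw counts. The following is a minimal sketch, assuming the usual definitions (prediction coverage = predictions made / proteins queried; PPV = correct predictions / predictions made); the function name `coverage_and_ppv` is hypothetical and not part of the ETA server. It uses the counts reported for the mode that includes non-reciprocal matches.

```python
def coverage_and_ppv(n_queries, n_predictions, n_correct):
    """Return (prediction coverage, positive predictive value) as fractions."""
    coverage = n_predictions / n_queries   # share of queries that received a prediction
    ppv = n_correct / n_predictions        # share of predictions that were correct
    return coverage, ppv


# Counts quoted in the text for the mode that includes non-reciprocal matches:
# 1218 control enzymes, 934 predictions, 769 of them correct.
cov, ppv = coverage_and_ppv(1218, 934, 769)
print(f"coverage = {cov:.0%}, PPV = {ppv:.0%}")
```

The trade-off the text describes is visible in the definitions: admitting more matches raises the number of predictions (and so coverage), while PPV falls if the extra predictions are less often correct.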

2 ETA SERVER OVERVIEW

The ETA Server provides functional annotations of enzyme activity. A web interface lets users pick a protein. The server then automatically creates a template, identifies matches to annotated structures, applies specificity filters, and reports likely functions. Backtracking is possible, and users can alter the template. In-line help is available, as well as a manual with a complete walkthrough example.

2.1 Template creation

Users select a protein by PDB code and chain (e.g. 1yvwA). The server then either retrieves a cached ET analysis or runs one anew for this protein (∼5 min). The user may also submit custom ET data as a zip file from the ET Wizard (Morgan et al., 2006), allowing full control of the ET analysis, or use of novel structures. Next, ETA builds a template of Cα atoms from the six best-ranked residues in a cluster of 10 surface ET residues (Kristensen et al., 2008). A PyMOL (DeLano, 2002) image of the protein structure displays the template so the user may see and revise the residue choices, triggering image updates. A PyMOL session can also be downloaded to study the template interactively. For a given template, the server displays the amino acid types that it can match in another protein, chosen from cognate residues in homologs. All choices are customizable.
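
The residue selection step described above can be sketched as follows. This is an illustration only, not the server's code: the `Residue` type, the `pick_template` function, and all residue values are made up, and it assumes that a lower ET rank value marks a more important residue. Surface filtering and clustering are taken as already done.

```python
from typing import NamedTuple


class Residue(NamedTuple):
    number: int      # residue number in the PDB chain
    aa: str          # one-letter amino acid code
    et_rank: float   # assumption: lower value = more evolutionarily important


def pick_template(cluster, size=6):
    """Return the `size` best-ranked residues of a cluster, ordered by residue number."""
    best = sorted(cluster, key=lambda r: r.et_rank)[:size]
    return sorted(best, key=lambda r: r.number)


# Made-up cluster of 10 surface residues (illustrative values only).
cluster = [
    Residue(12, "H", 0.03), Residue(15, "D", 0.01), Residue(40, "G", 0.20),
    Residue(41, "S", 0.05), Residue(63, "K", 0.08), Residue(64, "E", 0.02),
    Residue(88, "R", 0.15), Residue(90, "Y", 0.10), Residue(101, "C", 0.04),
    Residue(120, "N", 0.30),
]

template = pick_template(cluster)
print([f"{r.aa}{r.number}" for r in template])
```

In the server, the user can then override these choices and the set of amino acid types each template position may match.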

2.2 Geometric search and annotation

The residue numbers and types form a complete template that is searched against proteins in the 2006 PDB_SELECT_90 (Hobohm et al., 1992). A support vector machine (Ward et al., 2008) classifies the most relevant matches, using geometric (least root mean squared deviation, RMSD) and evolutionary similarity features (difference in ET score) (Kristensen et al., 2006). Reciprocally, templates from each protein in the PDB_SELECT_90 are also searched back against the query structure. Matches are grouped by function and whether they are reciprocal. Annotations fall in two classes: those exclusively from reciprocal matches, which are the most reliable; and those that also rely on one-way matches, which are more sensitive but less specific. In both cases, the enzymatic function with a plurality of matches is listed first, followed by possible alternatives. These functions—three-digit EC numbers—are linked to their definitions. Matches to non-enzymes and unannotated proteins are also displayed, as they may still provide useful information. Each match that supports a given prediction is listed, with a link to the relevant PDB structure, a list of matched residues, their RMSD, and their ET similarity. Images of the template and match can also be generated to review them visually. All the raw ET and ETA data can be downloaded.
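
The grouping and plurality step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Match` type, the `ec3` and `rank_functions` helpers, the match identifiers, and the EC assignments are all hypothetical, and the matches are assumed to have already passed the geometric search and the support vector machine classification, which are not modeled here.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Match(NamedTuple):
    hit_id: str            # identifier of the matched structure (made up here)
    ec: Optional[str]      # EC number of the hit; None for non-enzymes or unannotated proteins
    reciprocal: bool       # True if the hit's own template also matches the query


def ec3(ec):
    """Truncate an EC number to its first three digits, e.g. '3.4.21.4' -> '3.4.21'."""
    return ".".join(ec.split(".")[:3])


def rank_functions(matches, reciprocal_only=True):
    """Return (three-digit EC, vote count) pairs, most supported function first."""
    votes = Counter(
        ec3(m.ec)
        for m in matches
        if m.ec is not None and (m.reciprocal or not reciprocal_only)
    )
    return votes.most_common()


# Illustrative matches only; identifiers and EC numbers are not real results.
matches = [
    Match("hit1", "3.4.21.4", True),
    Match("hit2", "3.4.21.5", True),
    Match("hit3", "3.1.1.3", True),
    Match("hit4", "3.1.1.1", False),
    Match("hit5", "3.1.1.7", False),
    Match("hit6", None, False),
]

print(rank_functions(matches, reciprocal_only=True))   # reciprocal matches only
print(rank_functions(matches, reciprocal_only=False))  # one-way matches included
```

With this made-up input the two modes disagree on the top function, which illustrates why the server reports the reciprocal-only and all-match annotations as separate classes.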

3 CONCLUSIONS

The ETA server expands the ET suite for protein structure annotation (Mihalek et al., 2006; Morgan et al., 2006) by predicting enzymatic functions of protein structures without prior knowledge of functional sites or mechanisms. In reciprocal mode, it is biased to minimize misannotations by maximizing PPV (92%) at the expense of prediction coverage (43%). In all-match mode, prediction coverage is better (77%), but then PPV is lower (82%). The interface allows customized searches, displays predicted functions, and provides supporting evidence and raw data. Eventually, upgrades should add non-enzymatic function predictions as well. Feedback and suggestions are welcome at etaserver@bcm.edu.

REFERENCES (10 of 21 listed)

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. An evolution based classifier for prediction of protein interfaces without using protein structures.

Authors: I Res; I Mihalek; O Lichtarge
Journal: Bioinformatics Date: 2005-02-22 Impact factor: 6.937

3. ET viewer: an application for predicting and visualizing functional sites in protein structures.

Authors: Daniel H Morgan; David M Kristensen; David Mittelman; Olivier Lichtarge
Journal: Bioinformatics Date: 2006-06-29 Impact factor: 6.937

4. Evolutionary trace report_maker: a new type of service for comparative analysis of proteins.

Authors: I Mihalek; I Res; O Lichtarge
Journal: Bioinformatics Date: 2006-04-27 Impact factor: 6.937

5. Distinct faces of the Ku heterodimer mediate DNA repair and telomeric functions.

Authors: Albert Ribes-Zamora; Ivana Mihalek; Olivier Lichtarge; Alison A Bertuch
Journal: Nat Struct Mol Biol Date: 2007-03-11 Impact factor: 15.369

6. Exploring the structure and function paradigm. (Review)

Authors: Oliver C Redfern; Benoit Dessailly; Christine A Orengo
Journal: Curr Opin Struct Biol Date: 2008-06 Impact factor: 6.809

7. ProFunc: a server for predicting protein function from 3D structure.

Authors: Roman A Laskowski; James D Watson; Janet M Thornton
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

8. Can sequence determine function? (Review)

Authors: J A Gerlt; P C Babbitt
Journal: Genome Biol Date: 2000-11-08 Impact factor: 13.583

9. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids.

Authors: David M Kristensen; R Matthew Ward; Andreas Martin Lisewski; Serkan Erdin; Brian Y Chen; Viacheslav Y Fofanov; Marek Kimmel; Lydia E Kavraki; Olivier Lichtarge
Journal: BMC Bioinformatics Date: 2008-01-11 Impact factor: 3.169

10. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features.

Authors: R Matthew Ward; Serkan Erdin; Tuan A Tran; David M Kristensen; Andreas Martin Lisewski; Olivier Lichtarge
Journal: PLoS One Date: 2008-05-07 Impact factor: 3.240
