Literature DB >> 17584793

Sequence harmony: detecting functional specificity from alignments.

K Anton Feenstra¹, Walter Pirovano, Klaas Krab, Jaap Heringa.

Abstract

Multiple sequence alignments are often used for the identification of key specificity-determining residues within protein families. We present a web server implementation of the Sequence Harmony (SH) method previously introduced. SH accurately detects subfamily specific positions from a multiple alignment by scoring compositional differences between subfamilies, without imposing conservation. The SH web server allows a quick selection of subtype specific sites from a multiple alignment given a subfamily grouping. In addition, it allows the predicted sites to be directly mapped onto a protein structure and displayed. We demonstrate the use of the SH server using the family of plant mitochondrial alternative oxidases (AOX). In addition, we illustrate the usefulness of combining sequence and structural information by showing that the predicted sites are clustered into a few distinct regions in an AOX homology model. The SH web server can be accessed at www.ibi.vu.nl/programs/seqharmwww.

Entities: Chemical Disease Species

Mesh：

Substances：

Year: 2007 PMID： 17584793 PMCID： PMC1933219 DOI： 10.1093/nar/gkm406

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

During the past years there has been a wide interest in studies of specificity-determining residues within protein subfamilies (1). Consequently, an increasing number of methods and web applications has become available that offer functional analyses of subtype specificity from multiple alignments (2–5). Previously we evaluated advantages and limitations of several of the state-of-the-art methods and introduced a new method called Sequence Harmony (SH) (6). SH accurately detects positions within an alignment that are responsible for functional differences between two protein subfamilies. To further facilitate the use of SH for a broad audience, we have implemented a comprehensive web server. The SH web server offers a fast, one-step analysis with a number of options, yielding results that can be interpreted easily. In this article we will guide the user through all the steps of the SH web-application by means of a biologically relevant example. We will look for subtype specific sites for the two subfamilies of the alternative oxidases (AOX) protein family of plant alternative oxidases. The subtype specific sites found are the best candidates to explain the functional differences. Other relevant applications of this method include pathway specificity, ligand specificity, host-specific viral infection, viral disease progression differences and viral drug resistance.

METHODS

Implementation

The SH method, as described previously (6), is currently implemented as an awk program. The main steps are as follows. Sequences are read from the alignment and separated into two user-specified groups. For each group separately and combined entropies (E) are calculated. SH values are calculated as , with E = − ∑ (p + p) log (p + p), i.e. using the sum of the normalized frequencies of groups A and B. SH values range from zero for completely non-overlapping residue compositions, to one for identical compositions. Next, sites are selected that have a SH score below a cutoff. Stretches of neighbouring selected sites are identified and the size of each of these stretches is assigned to the sites as the `Rank'. Finally, selected sites are sorted on i) increasing SH, ii) decreasing Rank and iii) increasing entropy. This sorted list of selected sites is the primary result of the SH algorithm. Currently alignments of up to about 5 million residues (including gaps) can be processed with runtimes on the order of tens of seconds, excluding the generation of the high-quality output PyMol image. The separate chains of the optional protein structure (PDB file) are aligned with the input alignment using the `profile' option of Muscle (7). Subsequently, the chains with the highest average SH score with respect to the input alignment are selected for graphical display as an image in an interactive Jmol applet.

Use of the Server

The SH web server contains basic and advanced options. Sanity of the input options is checked, e.g. whether an input alignment is uploaded or pasted, and not both, whether it is proper FASTA format. Problems are reported with an error message and offending input fields are highlighted. An example screenshot of the main page and input form is shown in Figure 1. The main input is a multiple sequence alignment of the protein family and a subdivision into two groups. Sequence data in FASTA format is case-insensitive and may be split over multiple lines; gaps should be represented with a dash (`–'); and lines beginning with a semi-colon (`;') are ignored as comment. From the alignment the program automatically generates a list of sequence IDs. The user defines the two input groups by selecting the identifier of the first sequence of the second group. By default, a cutoff of 0.2 for the SH score is used.

Figure 1.

Input screen of the Sequence Harmony server, showing part of the AOX alignment and model input. (A) Overview of the Sequence Harmony web page; and (B) the Sequence Harmony input options and advanced features. Advanced features allow more control over the analysis and output, but the default settings usually suffice for a basic analysis. The SH cutoff can be adjusted between zero (allowing no compositional overlap) and one (allowing full overlap). A higher cutoff will lead to the selection of a larger number of sites. A reference sequence can be selected from the alignment and a starting position can be set to provide a reference numbering in the output tables. Additionally, a PDB identifier can be specified by its four-letter code, to visualize the selected sites in the protein structure. Alternatively, a PDB file can be uploaded. The PDB file is automatically aligned with the multiple alignment, or manually when the reference position is set. The output is a sorted table and optional graphical representations of the selected sites as an image (generated by PyMol, www.pymol.org) and as an interactive Jmol applet (8). The Jmol applet includes buttons to set the SH cutoff for displaying selected sites, and to show or hide non-aligned chains in the PDB file. An example screenshot of the main output page is shown in Figure 2. An additional, non-sorted, table is available that lists all sites in the alignment. The tables are coloured from red (predicted subtype specific sites) to blue (sites not predicted).

Figure 2.

Output screen of the Sequence Harmony server with results of the AOX analysis. Included are a summary of input parameters, a PyMol rendered image, the Jmol applet and controls, and the output table. Links to the raw table with all alignment positions and to additional output files are also provided.

RESULTS

The AOX family of alternative plant oxidases

The cyanide-insensitive plant alternative oxidase (AOX) is a mitochondrial non-haeme quinol oxidase (9). An established function of AOX is thermogenesis in the spadices of Aroid lilies, but little is known about AOX function in other tissues (10). AOX is encoded in two discrete gene subfamilies (11). AOX1 is found in monocot and dicot plants and is induced by stress stimuli (11,12). AOX2 is usually constitutive and has at present not been found in monocot plants (11). We retrieved 31 AOX sequences from the NCBI non-redundant database (www.ncbi.nih.gov); 24 of the AOX1 and 7 of the AOX2 subfamily. Sequences were aligned using PSI-Praline (13,14) using secondary structure prediction information from PSIPRED (15). The alignment is available as Supplementary Data. Lacking an experimental AOX structure, we constructed a homology model using of the catalytic iron-binding N-terminal domain. Swiss-Model (16) was used to align the Sauromatum guttatum AOX1 sequence with the template structure 1AFR (17), following the description of Andersson and Nordlund (18). The PDB file is provided as Supplementary Data.

Sample input

The multiple alignment obtained was uploaded in the appropriate box of the SH web server and the first AOX2 sequence (AOX2_Arab_th) was selected as the first sequence of the second group (Figure 1). The default SH cutoff of 0.2 was used, meaning that partial overlap in residue composition is allowed. We chose AOX1 from Sauromatum guttatum as reference sequence so that the subtype specific sites obtained will be numbered accordingly (18). The PDB file of the model structure was uploaded to the SH web server.

Sample output

The results page of the SH server (Figure 2) shows at the top a summary of the input parameters, and optionally the graphical representations of the protein structure. Below that is the table containing all sites with a SH score (SH) below the cutoff value (default 0.2). These are sorted first on increasing SH score, then on decreasing rank (Rnk, number of neighbouring sites below the cutoff) and finally on increasing total Entropy (AB) of the alignment position, as described previously (6). Position Ali and Ref show the sequence position in the alignment and reference sequence, respectively. The `Consensus' columns give all residue types present in groups A and B, respectively, in order of decreasing frequency and in lowercase when the frequency is less than half of the highest. In addition, a link is provided to the raw table, that holds all information of the complete alignment without selection or ranking. The raw table is available as Supplementary Data. For the AOX family we found nine sites with a SH value of zero (bright red in Figure 2). An additional eight sites have SH values between zero and 0.2 (from light red to white). In the `Rnk' column, a value of 2 signifies a stretch of two consecutive sites with SH values below the cutoff (alignment positions 244–245 and 121–122). Interestingly, out of nine sites with zero harmony, four have non-zero entropy in at least one group, i.e. they are not totally conserved within both groups (alignment positions 136, 245, 269, 354). This is also reflected in the `Consensus' columns, which show multiple residue types occurring. The graphical representations of the protein structure show the predicted sites in our homology model of the C-terminal domain (positions 182–352), see Figure 3 and also Figure 2. This domain contains two sites of zero harmony and six of low harmony. Four low-harmony sites on the upper-left (Phe224, Val227, Ala228 and Asn284) are close to the Glu and His iron binding residues (18) (Figure 3), and could conceivably play a role in ligand binding or activation mechanisms. Three other low-harmony sites, Arg199, Tyr249 and Gly252 are located around a putative membrane-binding region (18), and could conceivably play a role in modulating the protein-membrane interactions or substrate access. The functional prediction of SH from the multiple alignment seems to be consistent with the structural prediction using homology modelling, although a different structural model could lead to different conclusions.

Figure 3.

Model of the AOX iron-binding domain with low-harmony sites shown in spheres and the iron-binding Glu and His residues in sticks. Figure created using PyMol (www.pymol.org).

CONCLUSION

Our SH web server identifies putative subtype specific sites based on an alignment and selection of two groups as input only. In addition, sites can be easily linked to structural information using a structure from the PDB directly, or, as shown in our example, using a homology model of the protein. Using this structural information, selected subtype specific sites can be grouped into spatial clusters of sites that are likely to share functional relationships. The combination of the SH algorithm and protein structural information has yielded a useful tool to interpret multiple sequence alignments, and to guide subsequent selection of interesting sites for experimental investigation.

SUPPLEMENTARY DATA

The AOX input alignment, raw output table, `coloured' PDB and pymol script are available as Supplementary Data. In addition, the output is available at www.ibi.vu.nl/programs/seqharmwww/showcase_aox.

18 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. A revised model of the active site of alternative oxidase.

Authors: M E Andersson; P Nordlund
Journal: FEBS Lett Date: 1999-04-16 Impact factor: 4.124

3. Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors.

Authors: Leonid A Mirny; Mikhail S Gelfand
Journal: J Mol Biol Date: 2002-08-02 Impact factor: 5.469

4. Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families.

Authors: Olga V Kalinina; Andrey A Mironov; Mikhail S Gelfand; Aleksandra B Rakhmaninova
Journal: Protein Sci Date: 2004-02 Impact factor: 6.725

Review 5. Prediction of protein function from protein sequence and structure.

Authors: James C Whisstock; Arthur M Lesk
Journal: Q Rev Biophys Date: 2003-08 Impact factor: 5.318

6. ALTERNATIVE OXIDASE: From Gene to Function.

Authors: Greg C. Vanlerberghe; Lee McIntosh
Journal: Annu Rev Plant Physiol Plant Mol Biol Date: 1997-06

7. A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors.

Authors: Kai Ye; Eric-Wubbo M Lameijer; Margot W Beukers; Adriaan P Ijzerman
Journal: Proteins Date: 2006-06-01

Review 8. Exploring the molecular nature of alternative oxidase regulation and catalysis.

Authors: Charles Affourtit; Mary S Albury; Paul G Crichton; Anthony L Moore
Journal: FEBS Lett Date: 2002-01-16 Impact factor: 4.124

9. Predicting specificity-determining residues in two large eukaryotic transcription factor families.

Authors: Jason E Donald; Eugene I Shakhnovich
Journal: Nucleic Acids Res Date: 2005-08-05 Impact factor: 16.971

10. Sequence comparison by sequence harmony identifies subtype-specific functional sites.

Authors: Walter Pirovano; K Anton Feenstra; Jaap Heringa
Journal: Nucleic Acids Res Date: 2006-11-27 Impact factor: 16.971

18 in total

1. Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

Authors: Andrew F Neuwald
Journal: Stat Appl Genet Mol Biol Date: 2011-08-04

2. webPRC: the Profile Comparer for alignment-based searching of public domain databases.

Authors: Bernd W Brandt; Jaap Heringa
Journal: Nucleic Acids Res Date: 2009-05-06 Impact factor: 16.971

3. INTREPID: a web server for prediction of functionally important residues by evolutionary analysis.

Authors: Sriram Sankararaman; Bryan Kolaczkowski; Kimmen Sjölander
Journal: Nucleic Acids Res Date: 2009-05-13 Impact factor: 16.971

4. Multi-Harmony: detecting functional specificity from sequence alignment.

Authors: Bernd W Brandt; K Anton Feenstra; Jaap Heringa
Journal: Nucleic Acids Res Date: 2010-06-04 Impact factor: 16.971

5. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies.

Authors: Pavel V Mazin; Mikhail S Gelfand; Andrey A Mironov; Aleksandra B Rakhmaninova; Anatoly R Rubinov; Robert B Russell; Olga V Kalinina
Journal: Algorithms Mol Biol Date: 2010-07-15 Impact factor: 1.405

6. Combining specificity determining and conserved residues improves functional site prediction.

Authors: Olga V Kalinina; Mikhail S Gelfand; Robert B Russell
Journal: BMC Bioinformatics Date: 2009-06-09 Impact factor: 3.169

7. SDR: a database of predicted specificity-determining residues in proteins.

Authors: Jason E Donald; Eugene I Shakhnovich
Journal: Nucleic Acids Res Date: 2008-10-16 Impact factor: 16.971

8. Expansion and diversification of the gibberellin receptor GIBBERELLIN INSENSITIVE DWARF1 (GID1) family in land plants.

Authors: Rajesh K Gazara; Kanhu C Moharana; Daniel Bellieny-Rabelo; Thiago M Venancio
Journal: Plant Mol Biol Date: 2018-06-28 Impact factor: 4.076

9. Both Intrinsic Substrate Preference and Network Context Contribute to Substrate Selection of Classical Tyrosine Phosphatases.

Authors: Anita Palma; Michele Tinti; Serena Paoluzi; Elena Santonico; Bernd Willem Brandt; Rob Hooft van Huijsduijnen; Antonia Masch; Jaap Heringa; Mike Schutkowski; Luisa Castagnoli; Gianni Cesareni
Journal: J Biol Chem Date: 2017-02-03 Impact factor: 5.157

10. Structure based hypothesis of a mitochondrial ribosome rescue mechanism.

Authors: Martijn A Huynen; Isabel Duarte; Zofia M A Chrzanowska-Lightowlers; Sander B Nabuurs
Journal: Biol Direct Date: 2012-05-08 Impact factor: 4.540