SWAKK: a web server for detecting positive selection in proteins using a sliding window substitution rate analysis.

Han Liang, Weihua Zhou, Laura F Landweber.

Abstract

We present a bioinformatic web server (SWAKK) for detecting amino acid sites or regions of a protein under positive selection. It estimates the ratio of non-synonymous to synonymous substitution rates (KA/KS) between a pair of protein-coding DNA sequences by sliding a 3D window, or sphere, across one reference structure. The program displays the results on the 3D protein structure. In addition, for comparison or when a reference structure is unavailable, the server can also perform a sliding window analysis on the primary sequence. The SWAKK web server is available at http://oxytricha.princeton.edu/SWAKK/.

Substances: Proteins

Year: 2006 PMID: 16845032 PMCID: PMC1538794 DOI: 10.1093/nar/gkl272

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Mutations and substitutions are fundamental changes in nucleotide sequence over evolutionary time (1). Among the well-established methods for studying the evolution of protein-coding genes, the ratio of the non-synonymous substitution rate (KA, amino acid replacing) to the synonymous substitution rate (KS, silent) is the most powerful measure of selective pressure on a protein (2–8). Because non-synonymous and synonymous substitution sites are interspersed within a gene segment, this approach directly compares the amino acid replacement rate against the silent substitution rate. Traditionally, if KA/KS < 1, the gene is inferred to be under negative (purifying) selection; if KA/KS = 1, the gene is probably evolving neutrally; and if KA/KS > 1, the gene is probably under positive (adaptive) selection, since mutations in the gene are fixed in the population with higher probability than expected under neutrality.

However, this approach in effect averages substitution rates over all amino acid sites in the sequence. Because most amino acids are expected to be under purifying selection, with positive selection most likely affecting only a few sites, the gene-wide ratio often lacks the power to detect positive selection. To increase sensitivity, a sliding window analysis along the primary sequence was introduced (9,10). Recent studies further indicate that when a three-dimensional (3D) protein structure is available, positive selection can be detected much more sensitively by using windows in 3D space instead (11–13). For example, Hughes and Nei (14) detected positive selection at the antigen recognition sites (ARS) of major histocompatibility complex (MHC) alleles but not across the whole gene. These sites are close in tertiary space but discontinuous in the primary sequence.

We developed a bioinformatic web server (SWAKK) whose primary purpose is to detect regions under positive selection using a sliding window KA/KS analysis (Figure 1). Given two protein-coding DNA sequences, one reference protein 3D structure and other user-defined parameters, the web server automatically aligns the sequences, calculates KA/KS in each 3D window, and displays the results on the 3D structure. The server can also perform the analysis on the primary sequence, either for comparison or when a structure is unavailable. In addition, if two inferred ancestral gene sequences are used as input, the server can examine natural selection in an ancestral branch of a phylogenetic tree (15).

Two important features distinguish SWAKK from other available web servers (16–18) that can identify functionally important sites in proteins. First, those servers focus on each single amino acid site or codon in a multiple sequence alignment, which essentially averages over the entire time interval covered by the alignment. Our server instead considers a group of codons within a small window for each pairwise comparison. Second, unlike other web servers, where protein 3D structures are used only to display the results, SWAKK takes full advantage of the information intrinsically stored in a 3D structure to define neighboring codon groups. Without requiring an explicit evolutionary model or expensive computation, SWAKK thus provides a useful tool to complement the existing arsenal of methods for detecting positive selection.
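
The loss of power caused by gene-wide averaging can be illustrated with a small numerical sketch. Everything in it is hypothetical and is not taken from SWAKK: the per-site ratios, the function name `classify` and its `tolerance` parameter are invented for illustration, and treating the gene-wide ratio as a plain mean of per-site ratios is a simplification that assumes a uniform synonymous rate across sites.

```python
def classify(omega, tolerance=0.0):
    """Label a KA/KS ratio using the traditional thresholds around 1.

    `tolerance` is a hypothetical half-width for the neutral band; with the
    default of 0.0 only a ratio of exactly 1 is called neutral.
    """
    if omega > 1.0 + tolerance:
        return "positive"
    if omega < 1.0 - tolerance:
        return "negative"
    return "neutral"


# Invented example: 300 codons, of which 10 evolve adaptively (ratio 3.0)
# and 290 are under strong purifying selection (ratio 0.1).
site_omegas = [3.0] * 10 + [0.1] * 290

# Simplified gene-wide estimate: the mean over all sites, about 0.197.
gene_wide = sum(site_omegas) / len(site_omegas)

# A window that happens to cover only the 10 adaptive sites.
window_mean = sum(site_omegas[:10]) / 10

print(f"gene-wide: {gene_wide:.3f} -> {classify(gene_wide)}")
print(f"window:    {window_mean:.3f} -> {classify(window_mean)}")
```

Under these assumptions the whole gene looks purified even though one window carries a clear signal of positive selection, which is the situation a windowed analysis is meant to expose.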

Figure 1

A snapshot of the SWAKK web server and sample output files. The upper part is a snapshot of the 3D analyzer web page. The lower part shows sample output files. Left: 3D structure view provided by the 3D analyzer (when the structure is available), with amino acids colored according to the inferred level of selection. Right: 2D graph ([KA − KS] versus window location) provided by the 1D analyzer. The example shown is the MHC glycoprotein gene (14,21); consistent with the previous studies, most of the sites identified as being under positive selection are clustered in the ARS domain.

METHODS

SWAKK accepts as input a pair of coding DNA sequences and a reference protein structure (PDB file). The DNA sequences are translated into amino acid sequences and aligned with the amino acid sequence parsed from the PDB file using ClustalW (19). The alignment is then reverse translated to obtain a codon-based sequence alignment. Different translation tables are available to account for variation in genetic codes. Each amino acid in the reference structure is represented by its Cα atom. SWAKK constructs 3D windows by placing each amino acid at the center and including all amino acids within a pre-specified distance (in Ångströms) of the center. The codons corresponding to each window are extracted to form a sub-alignment, and the KA/KS score, together with its standard error, is calculated using the PAML package (20). Finally, according to the KA/KS scores and a user-defined cut-off, the sites (regions) are classified as positive, negative or neutral and displayed in different colors on the 3D structure using the Chime plug-in component. If a reference structure is not available, the server can perform the analysis on the primary sequence instead. In this case, the window size is defined as a distance along the 1D sequence rather than in 3D space, and the results are displayed in a graph drawn by the GNUPLOT software. More detailed information is provided under the links ‘Overview’, ‘Help’ and ‘FAQ’ on the website.
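
The window construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SWAKK's actual code: the function names (`windows_3d`, `windows_1d`, `sub_alignment`), the toy coordinates and sequences, the symmetric 1D window, and the rule of skipping gapped codons are all hypothetical. The KA/KS estimation itself, which the server delegates to PAML, is not reproduced.

```python
import math


def windows_3d(ca_coords, radius):
    """For each residue, return the indices of all residues whose C-alpha
    atom lies within `radius` angstroms of its own C-alpha (itself included)."""
    windows = []
    for center in ca_coords:
        members = [j for j, other in enumerate(ca_coords)
                   if math.dist(center, other) <= radius]
        windows.append(members)
    return windows


def windows_1d(n_residues, half_width):
    """Sequence-based alternative: each window spans `half_width` residues
    on either side of the center, truncated at the sequence ends."""
    windows = []
    for i in range(n_residues):
        start = max(0, i - half_width)
        stop = min(n_residues, i + half_width + 1)
        windows.append(list(range(start, stop)))
    return windows


def sub_alignment(codons_a, codons_b, members):
    """Collect the aligned codon pairs for one window.
    Assumption: codons containing a gap character are skipped."""
    return [(codons_a[j], codons_b[j]) for j in members
            if "-" not in codons_a[j] and "-" not in codons_b[j]]


# Toy data: four residues; the last one folds back next to the first.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 5.0, 0.0)]
seq_a = ["ATG", "GCT", "AAA", "TGG"]
seq_b = ["ATG", "GCC", "AGA", "TGG"]

w3 = windows_3d(ca, radius=6.0)
w1 = windows_1d(len(ca), half_width=1)

print("3D window around residue 0:", w3[0])
print("1D window around residue 0:", w1[0])
print("codon pairs in 3D window 0:", sub_alignment(seq_a, seq_b, w3[0]))
```

In this toy case the 3D window centered on residue 0 includes residue 3, which the 1D window misses. Each resulting sub-alignment would then be passed to a pairwise KA/KS estimator.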

SUMMARY

With more and more protein structures available, we expect this web server to become a valuable bioinformatic tool for detecting functionally important sites. The server facilitates the identification of regions of a protein sequence or structure that may be under positive selection and is easily accessible to the broad biological community.

REFERENCES (20 in total)

1. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.

Authors: Z Yang; R Nielsen
Journal: Mol Biol Evol Date: 2000-01 Impact factor: 16.240

2. The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study.

Authors: Anton Nekrutenko; Kateryna D Makova; Wen-Hsiung Li
Journal: Genome Res Date: 2002-01 Impact factor: 9.043

3. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes.

Authors: Ziheng Yang; Willie J Swanson
Journal: Mol Biol Evol Date: 2002-01 Impact factor: 16.240

4. Tertiary windowing to detect positive diversifying selection.

Authors: Ann-Charlotte Berglund; Björn Wallner; Arne Elofsson; David A Liberles
Journal: J Mol Evol Date: 2005-04 Impact factor: 2.395

5. Large-scale search for genes on which positive selection may operate.

Authors: T Endo; K Ikeo; T Gojobori
Journal: Mol Biol Evol Date: 1996-05 Impact factor: 16.240

6. Unbiased estimation of the rates of synonymous and nonsynonymous substitution.

Authors: W H Li
Journal: J Mol Evol Date: 1993-01 Impact factor: 2.395

7. An evolutionary trace method defines binding surfaces common to protein families.

Authors: O Lichtarge; H R Bourne; F E Cohen
Journal: J Mol Biol Date: 1996-03-29 Impact factor: 5.469

8. A method for estimating the numbers of synonymous and nonsynonymous substitutions per site.

Authors: J M Comeron
Journal: J Mol Evol Date: 1995-12 Impact factor: 2.395

9. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors: J D Thompson; D G Higgins; T J Gibson
Journal: Nucleic Acids Res Date: 1994-11-11 Impact factor: 16.971

10. PAML: a program package for phylogenetic analysis by maximum likelihood.

Authors: Z Yang
Journal: Comput Appl Biosci Date: 1997
