KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.

Dapeng Wang, Yubin Zhang, Zhang Zhang, Jiang Zhu, Jun Yu.

Abstract

We present KaKs_Calculator 2.0, an updated version of our integrated stand-alone software package. It incorporates 17 methods for calculating nonsynonymous and synonymous substitution rates. Among them are our modified versions of several widely used methods, the gamma series (gamma-NG, gamma-LWL, gamma-MLWL, gamma-LPB, gamma-MLPB, gamma-YN and gamma-MYN), which have been demonstrated to perform better than their original forms under certain conditions and were not implemented in the previous version. The package can readily be used to identify positively selected sites with a sliding window that moves across the protein-coding sequences of interest in the 5' to 3' direction, and it improves overall performance in sequence analysis for evolutionary studies. A toolbox, including C++ and Java source code, executable files for both Windows and Linux platforms, and user instructions, can be downloaded for academic use from https://sourceforge.net/projects/kakscalculator2/. 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.

Year: 2010 PMID: 20451164 PMCID: PMC5054116 DOI: 10.1016/S1672-0229(10)60008-3

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

Calculating nonsynonymous (Ka) and synonymous (Ks) substitution rates is a useful way to evaluate sequence variation among protein orthologs across different species or taxonomic lineages with unknown evolutionary status. It is also often important to recognize positively selected sites and to identify genes with selective hotspots. Numerous methods and software tools have been developed for such purposes in the public domain, including PAML, MEGA, DnaSP, HyPhy and certain modules from Bioperl. However, after careful simulations and real data analysis, we believe that no single method can readily be identified as suitable under all circumstances (6, 7). We therefore created KaKs_Calculator 1.0, which adopted model-selected and model-averaged techniques to compute Ka/Ks values from a group of existing nucleotide substitution models. Because the majority of DNA sequence sites are considered to be invariable due to functional constraints and evolutionary distances, selective pressure varies among the sites of a sequence; Ka/Ks calculations based only on the entire gene are therefore not enough to detect the individual sites subject to adaptive selection. To overcome this problem, a “sliding window” strategy has been introduced in several web servers such as SWAPSC and WSPMaker, but these tools adopt few models (mostly one) for Ka and Ks calculations. Here we provide an updated version of KaKs_Calculator that addresses both problems in a simple way. In particular, we have embedded gamma-series methods into this new version.
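The model-selected and model-averaged idea mentioned above can be sketched in a few lines. This is an illustration only, assuming that candidate models are ranked by AICc (the package reports AICc, but its exact procedure is not described here) and that Akaike weights are used for averaging. The model names, log-likelihoods, parameter counts, Ka/Ks values and sample size are all hypothetical.

```python
import math


def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC: -2 lnL + 2k + 2k(k + 1) / (n - k - 1)."""
    k, n = n_params, sample_size
    if n - k - 1 <= 0:
        raise ValueError("sample_size must be larger than n_params + 1")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def akaike_weights(scores):
    """Turn AICc scores into weights that sum to 1 (lower score, higher weight)."""
    best_score = min(scores.values())
    raw = {name: math.exp(-0.5 * (s - best_score)) for name, s in scores.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical fits for one gene pair: (log-likelihood, free parameters, Ka/Ks).
fits = {
    "model_A": (-1520.4, 2, 0.31),
    "model_B": (-1510.2, 6, 0.27),
    "model_C": (-1509.8, 10, 0.26),
}
n_sites = 900  # hypothetical sample size

scores = {name: aicc(lnl, k, n_sites) for name, (lnl, k, _) in fits.items()}
weights = akaike_weights(scores)

# Model selection: keep the estimate from the model with the lowest AICc.
selected = min(scores, key=scores.get)
# Model averaging: weight every model's Ka/Ks estimate by its Akaike weight.
averaged = sum(weights[m] * fits[m][2] for m in fits)

print(selected, round(averaged, 3))
```

Selection reports a single best-supported model, whereas averaging keeps information from every candidate in proportion to its support.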

New Features

We have introduced three novel features into KaKs_Calculator 2.0. First, unlike existing Ka/Ks algorithms, the new software can take variable mutation rates across sequence sites into account; these rates contain vital information for molecular evolutionary studies. We created seven related methods, namely γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN and γ-MYN, by introducing a gamma distribution to model the mutation rates. The importance of the new methods has been demonstrated, since ignoring this variation gives rise to biased computational results (11, 12). We therefore implemented these methods in the updated core toolset of version 2.0, which has seventeen algorithms: seven original approximate methods, seven gamma-series methods, one maximum likelihood method (GY), and two expanding methods (model selected and model averaged). The methods provide not only the values of Ka, Ks and Ka/Ks, but also other key information from paired orthologous sequences, including the numbers of synonymous and nonsynonymous sites and substitutions, divergence time, substitution-rate ratio, GC content, and AICc.

Second, we added three new modules (Split, Plot and Dpss) to evaluate adaptive selection at the gene sequence level. As an expanding toolset, they adopt a sliding window with a user-defined window length and step length. Split divides the raw paired orthologous sequences into portions on the basis of windows moving in the positive direction. Plot processes the output of the core toolset after the nucleotide sequences from Split have been computed, producing a large collection of figures that illustrate Ka, Ks and Ka/Ks (omega) in intervals. Dpss identifies the positions of positively selected sites based on the initial analyses.

Third, all of the above-mentioned processes are capable of handling massive data in a timely fashion. In particular, all transferable data, including sequences and resulting information, are contained in a single file. We provide executable files as well as source code for the package, and we tested all programs on both Windows 2000/XP/Vista and Linux (Red Hat 3.4.6-8) platforms. The toolkit is freely available online (licensed under GPLv3) at https://sourceforge.net/projects/kakscalculator2/.
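The windowing step performed by Split can be illustrated with a minimal sketch. It is not the package's code: the function name `sliding_windows`, the choice to measure window and step length in codons, and the toy sequences are assumptions made for illustration. It expects two codon-aligned, gap-free sequences of equal length.

```python
def sliding_windows(seq_a, seq_b, window_codons, step_codons):
    """Yield (start, end, sub_a, sub_b) for each window, scanning 5' to 3'.

    start and end are 0-based nucleotide offsets, with end exclusive.
    Windows are cut on codon boundaries so that each piece stays in frame.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    if window_codons <= 0 or step_codons <= 0:
        raise ValueError("window and step lengths must be positive")

    n_codons = len(seq_a) // 3
    # Only full-length windows are produced; a trailing partial window is dropped.
    for first_codon in range(0, n_codons - window_codons + 1, step_codons):
        start = first_codon * 3
        end = start + window_codons * 3
        yield start, end, seq_a[start:end], seq_b[start:end]


# Toy pair of six codons each (hypothetical sequences).
seq_1 = "ATGGCCAAATTTGGGTAA"
seq_2 = "ATGGCTAAGTTTGGATAA"

windows = list(sliding_windows(seq_1, seq_2, window_codons=3, step_codons=1))
for start, end, sub_1, sub_2 in windows:
    print(start, end, sub_1, sub_2)
```

Each sub-sequence pair would then be passed to a Ka/Ks method separately. A smaller step gives a denser profile at the cost of more calculations, and a shorter window gives finer resolution but noisier estimates.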

Implementation

To make the algorithms easy to update and the software easy to use, we implemented the new version with a “toolkit” idea in mind. The integrated software is therefore divided into two essential parts that serve different functions: the core toolset, which calculates Ka and Ks, and the expanding toolset, which is responsible for additional computations based on the Ka and Ks calculation (e.g., with a sliding window strategy) (Figure 1). In the core toolset, we designed the GUI with Visual C++’s MFC (Microsoft Foundation Classes), which manages documents and allows users to view the objects, and the entire program is object-oriented. Each main method has its own class in the code, and the multi-threaded operations among them use the allocated CPU time very efficiently. We adopted Java 6 to program the expanding toolset because of its cross-platform advantages. We chose the R language (http://www.r-project.org/) to draw high-level graphics from the input data. To call R functions from Java, we employed a package named “Rserve” (http://www.rforge.net/Rserve/index.html), a program that responds to requests from clients over the TCP/IP protocol. In detail, we use Java to invoke the JRclient suite and connect to Rserve once it has started in the R environment; each connection then has its own workspace and directory. Moreover, the server allows many clients to plot their data simultaneously. Plotting is fast: a graph covering thousands of data points can be drawn in a few seconds.

Figure 1

A flowchart of software design on KaKs_Calculator 2.0.

Evaluation

We have evaluated the performance of the gamma-series methods in Ka/Ks calculations in previous studies (11, 12). To identify positively selected sites, we have also successfully applied the toolbox to two real cases: the animal alpha-defensin genes investigated by Lynn et al. and the TAS1R3 (taste receptor type 1 member 3) genes reported to be responsible for the ability to recognize sweetness (Figure 2). It is important to combine the gamma-series methods with a sliding window strategy: the former represents the variation of raw mutation rates across sites, and the latter reveals whether each site is driven by a different selective pressure, based on the assumption that omega (Ka/Ks) values are not equal across orthologous gene sequences. In particular, when the window slices become dense enough, the approach comes close to the “site models”, similar to the idea behind the definition of an integral in mathematics. We believe that the software is an excellent choice for detecting positively selected sites. As a final note, we will construct ancestral sequences to measure lineage-specific selective strength in our next update.
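The bias that arises when rate variation across sites is ignored can be illustrated with the textbook Jukes-Cantor distance and its gamma-corrected form. This sketch is not taken from the package, and the gamma-series methods themselves are defined in the cited studies. The symbols are assumptions: p is the observed proportion of differing sites and alpha is the shape parameter of the gamma distribution.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance assuming one substitution rate for all sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    A smaller alpha means stronger rate heterogeneity. As alpha grows,
    the value approaches the uncorrected jc69_distance(p).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p_observed = 0.30  # hypothetical proportion of differing sites
for alpha in (0.5, 1.0, 5.0):
    print(alpha, jc69_distance(p_observed), jc69_gamma_distance(p_observed, alpha))
```

For the same observed p, the gamma-corrected distance is larger than the uncorrected one, and the gap widens as alpha decreases. In this simple model, assuming equal rates when they actually vary therefore underestimates the number of substitutions.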

Figure 2

An example of displaying Ka, Ks and Ka/Ks to identify positively selected sites. This analysis was based on the TAS1R3 gene pair from Homo sapiens (NM_152228) and Canis familiaris (XM_843615).
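A profile like the one in Figure 2 is typically read by looking for windows whose Ka/Ks exceeds 1. The sketch below shows that reading step under stated assumptions: the function name `flag_windows`, the threshold of 1.0, the tuple layout and the per-window values are all hypothetical, and the actual criteria used by Dpss are not described here.

```python
def flag_windows(window_results, threshold=1.0):
    """Return (start, end, omega) for windows whose Ka/Ks exceeds threshold.

    window_results is an iterable of (start, end, ka, ks) tuples.
    Windows with ks <= 0 are skipped because omega is undefined there.
    """
    flagged = []
    for start, end, ka, ks in window_results:
        if ks <= 0.0:
            continue
        omega = ka / ks
        if omega > threshold:
            flagged.append((start, end, omega))
    return flagged


# Hypothetical per-window results: (start, end, Ka, Ks).
results = [
    (0, 90, 0.02, 0.20),
    (30, 120, 0.15, 0.10),
    (60, 150, 0.08, 0.00),
    (90, 180, 0.05, 0.25),
]

candidates = flag_windows(results)
for start, end, omega in candidates:
    print(start, end, round(omega, 2))
```

A window flagged this way is only a candidate region. Estimates from short windows rest on few substitutions, so a statistical test would normally be needed before drawing conclusions.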

Authors’ contributions

DW and JY conceived and designed this study. DW and YZ programmed the software and drafted the manuscript. ZZ supplied several bug reports and modified schemes in the previous version of the software. DW and JZ contributed to data analyses and software testing. JY managed this project and revised the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors have declared that no competing interests exist.

References (10 of 14 listed)

1. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

2. SWAPSC: sliding window analysis procedure to detect selective constraints.

Authors: Mario A Fares
Journal: Bioinformatics Date: 2004-05-06 Impact factor: 6.937

3. HyPhy: hypothesis testing using phylogenies.

Authors: Sergei L Kosakovsky Pond; Simon D W Frost; Spencer V Muse
Journal: Bioinformatics Date: 2004-10-27 Impact factor: 6.937

4. Tas1r3, encoding a new candidate taste receptor, is allelic to the sweet responsiveness locus Sac.

Authors: M Max; Y G Shanker; L Huang; M Rong; Z Liu; F Campagne; H Weinstein; S Damak; R F Margolskee
Journal: Nat Genet Date: 2001-05 Impact factor: 38.330

5. PAML 4: phylogenetic analysis by maximum likelihood.

Authors: Ziheng Yang
Journal: Mol Biol Evol Date: 2007-05-04 Impact factor: 16.240

6. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data.

Authors: P Librado; J Rozas
Journal: Bioinformatics Date: 2009-04-03 Impact factor: 6.937

7. Evaluation of six methods for estimating synonymous and nonsynonymous substitution rates.

Authors: Zhang Zhang; Jun Yu
Journal: Genomics Proteomics Bioinformatics Date: 2006-08 Impact factor: 7.691

8. Gamma-MYN: a new algorithm for estimating Ka and Ks with consideration of variable substitution rates.

Authors: Da-Peng Wang; Hao-Lei Wan; Song Zhang; Jun Yu
Journal: Biol Direct Date: 2009-06-16 Impact factor: 4.540

9. WSPMaker: a web tool for calculating selection pressure in proteins and domains using window-sliding.

Authors: Yong Seok Lee; Tae-Hyung Kim; Tae-Wook Kang; Won-Hyong Chung; Gwang-Sik Shin
Journal: BMC Bioinformatics Date: 2008-12-12 Impact factor: 3.169

10. How do variable substitution rates influence Ka and Ks calculations?

Authors: Dapeng Wang; Song Zhang; Fuhong He; Jiang Zhu; Songnian Hu; Jun Yu
Journal: Genomics Proteomics Bioinformatics Date: 2009-09 Impact factor: 7.691
