MySSP: non-stationary evolutionary sequence simulation, including indels.

Michael S Rosenberg

Abstract

MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but not all have previously been available in a single package.

Keywords:  DNA; Indels; Non-stationarity; Sequence Simulation

Year:  2007        PMID: 19325855      PMCID: PMC2658873     

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Simulation of molecular sequence evolution has become a fundamental part of comparative genomic and bioinformatics analysis. Simulation has proven particularly useful for testing the efficacy of bioinformatics methods and techniques under a variety of conditions and assumptions (or violations thereof), including, for example, phylogenetic analysis (Hillis 1995; Nei 1996; Takahashi and Nei 2000; Rosenberg and Kumar 2003; Huelsenbeck and Rannala 2004, just to name a few) and sequence alignment (Keightley and Johnson 2004; Pollard et al 2004; Rosenberg 2005). Many programs are available for simulating molecular sequence evolution, including Evolver (PAML) (Yang 1997), Seq-Gen (Rambaut and Grassly 1997), ROSE (Stoye et al 1998), and DAWG (Cartwright 2005), each with its own set of strengths and weaknesses.

The program presented here, MySSP, has been gradually developed over a series of projects (including, eg, Rosenberg and Kumar 2001; Rosenberg and Kumar 2003; Gadagkar et al 2005; Rosenberg 2005) and is being made publicly available because of some unique features, individually and in combination, which are not found in other available packages.

As with many similar programs, given a fixed tree (supplied by the user) MySSP constructs an initial DNA sequence at the root of the tree and simulates evolution across the tree using a variety of common models of DNA evolution, including Jukes-Cantor (Jukes and Cantor 1969), Kimura two-parameter (Kimura 1980), equal input, Hasegawa-Kishino-Yano (Hasegawa et al 1985), and the general time-reversible model. Rate variation among sites can optionally be modeled with the standard gamma distribution for any of these models. Multiple genes with different parameters and models can be simulated simultaneously. MySSP is designed for large-scale studies: it supports simulation of multiple replicates and outputs sequences in NEXUS, MEGA, or FASTA format.
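
To make the basic procedure concrete, the following is a minimal Python sketch of evolving a sequence along a single branch under the Jukes-Cantor model, with optional gamma-distributed rate variation among sites. It is not MySSP's code: the function name `jc69_evolve`, its parameters, and the choice to measure branch length in expected substitutions per site are assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"

def jc69_evolve(seq, branch_length, alpha=None, rng=random):
    """Evolve seq along one branch under Jukes-Cantor (illustrative sketch).

    branch_length: expected substitutions per site (an assumption of this sketch).
    alpha: optional gamma shape; per-site rates are drawn with mean 1.
    """
    out = []
    for base in seq:
        # Gamma(shape=alpha, scale=1/alpha) has mean 1, so the average rate is unchanged.
        rate = rng.gammavariate(alpha, 1.0 / alpha) if alpha else 1.0
        # JC69 probability that the site shows the same base at the end of the branch.
        p_same = 0.25 + 0.75 * math.exp(-4.0 * rate * branch_length / 3.0)
        if rng.random() < p_same:
            out.append(base)
        else:
            # Under JC69 the three other bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(60))   # random root sequence
child = jc69_evolve(root, 0.1, alpha=0.5, rng=rng)     # one descendant sequence
```

Applying such a function to every branch, from the root towards the tips, yields one simulated data set; repeating the whole process gives replicates.
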

MySSP has a fairly simple GUI for basic use, but also has a specialized batch script interpreter to allow for more complicated or large-scale simulations. MySSP differs from most other simulation programs in (1) its ability to simulate insertion and deletion events; (2) its ability to simulate nonstationary processes and models across the tree; and (3) its option to output ancestral sequences. Two of these features (1 and 3) can individually be found in existing programs, but all three have not previously been available in a single package. Each is described in turn.

Simulation of Insertions and Deletions

Insertions and deletions (indels) are a common component of sequence evolution, but historically have not been included in most simulation packages; only two are known to include indel evolution: ROSE (Stoye et al 1998) and DAWG (Cartwright 2005). MySSP simulates insertions and deletions using simple Poisson models for rate and size distribution of insertion and deletion events (modeled separately, parameters provided by the user). One advantage of MySSP is that the output sequences are aligned correctly, ie, the output sequences include gaps such that aligned sites across sequences represent true homologies. This gives one a baseline “true alignment” that can be used to contrast with the results from removing the gaps from the output sequences (a trivial exercise) and running them through a standard alignment program.
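
The idea of a gapped "true alignment" can be illustrated with a small Python sketch for a single branch. This is not MySSP's algorithm. The names `poisson`, `evolve_indels`, and `strip_gaps` are hypothetical, and the details are assumptions: event counts are Poisson with mean rate × branch length × ancestral length, and event sizes are 1 plus a Poisson draw, so `mean_size` must be at least 1.

```python
import math
import random

BASES = "ACGT"

def poisson(mean, rng):
    """Poisson variate via Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def evolve_indels(ancestor, ins_rate, del_rate, mean_size, branch_length, rng):
    """Return (aligned_ancestor, aligned_descendant) after indels on one branch."""
    # Each column is [ancestor_char, descendant_char]; '-' marks a gap.
    cols = [[b, b] for b in ancestor]
    n_ins = poisson(ins_rate * branch_length * len(ancestor), rng)
    n_del = poisson(del_rate * branch_length * len(ancestor), rng)
    events = ["ins"] * n_ins + ["del"] * n_del
    rng.shuffle(events)                       # apply events in random order
    for kind in events:
        size = 1 + poisson(mean_size - 1, rng)  # every event affects >= 1 base
        if kind == "ins":
            pos = rng.randint(0, len(cols))
            # Inserted bases have no homolog in the ancestor, hence the gap.
            cols[pos:pos] = [["-", rng.choice(BASES)] for _ in range(size)]
        else:
            # Only columns that still hold a descendant base can be deleted.
            live = [i for i, c in enumerate(cols) if c[1] != "-"]
            if not live:
                continue
            start = rng.randrange(len(live))
            for i in live[start:start + size]:
                cols[i][1] = "-"
    # Drop columns where an inserted base was later deleted (gap in both rows).
    cols = [c for c in cols if c != ["-", "-"]]
    return "".join(c[0] for c in cols), "".join(c[1] for c in cols)

def strip_gaps(seq):
    """Remove gap characters to obtain the unaligned sequence."""
    return seq.replace("-", "")

ancestor = "ACGT" * 10
anc_aln, desc_aln = evolve_indels(ancestor, 0.05, 0.05, 2.0, 1.0, random.Random(7))
unaligned = strip_gaps(desc_aln)   # input for a standard alignment program
```

Because homologous bases always share a column, `anc_aln` and `desc_aln` form the true alignment, while `unaligned` is what an alignment program would be given for comparison.
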

Non-stationary processes and models

A common concern in molecular sequence analysis is whether the evolutionary process is stationary across a tree. While there are many possible models of sequence evolution, the majority of simulation programs assume that whatever model is specified is constant throughout the tree. MySSP allows the user to change the evolutionary model for each and every branch, if they desire. One can completely change every aspect of the model, including basic substitution pattern (JC, HKY, etc.), transition-transversion bias, gamma distributed rate variation, equilibrium nucleotide frequencies, and indel rate and size. One can also change the basic rate of substitution for a branch, increasing or decreasing it relative to that found on the model tree. This flexibility allows one to much more easily examine the effects of non-stationary processes on bioinformatics analysis, eg, using a single “average” model in maximum likelihood phylogenetic analysis. The ability to completely change the model for each and every branch of the tree is unique among simulation programs.
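
As a rough illustration of branch-specific models, the Python sketch below attaches separate settings to each branch of a small tree and evolves a sequence under the equal-input model, using different equilibrium frequencies and rate multipliers on different branches. The nested-dictionary tree layout, the `model` keys, and the function names are assumptions of this sketch and do not reflect MySSP's batch script syntax.

```python
import math
import random

BASES = ["A", "C", "G", "T"]

def f81_evolve(seq, length, freqs, rng):
    """Evolve seq along one branch under the equal-input model."""
    # beta scales time so that length is in expected substitutions per site.
    beta = 1.0 / (1.0 - sum(f * f for f in freqs.values()))
    p_keep = math.exp(-beta * length)      # probability that a site is not redrawn
    weights = [freqs[b] for b in BASES]
    # A redrawn site takes a base sampled from the equilibrium frequencies.
    return "".join(
        b if rng.random() < p_keep else rng.choices(BASES, weights)[0]
        for b in seq
    )

def simulate(node, seq, rng, leaves):
    """Walk the tree; every child branch carries its own model settings."""
    if not node.get("children"):
        leaves[node["name"]] = seq
    for child in node.get("children", []):
        model = child["model"]
        length = child["length"] * model.get("rate", 1.0)   # branch-specific rate
        simulate(child, f81_evolve(seq, length, model["freqs"], rng), rng, leaves)
    return leaves

at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
tree = {"name": "root", "children": [
    {"name": "sp1", "length": 0.5, "model": {"freqs": at_rich, "rate": 1.0}},
    {"name": "sp2", "length": 0.5, "model": {"freqs": gc_rich, "rate": 2.0}},
]}

rng = random.Random(42)
root_seq = "".join(rng.choices(BASES, k=200))
leaves = simulate(tree, root_seq, rng, {})
```

In this toy tree the two lineages drift towards different base compositions, which is the kind of non-stationarity that a single "average" model in a downstream analysis would not capture.
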

Ancestral sequences

MySSP also includes an option for outputting ancestral sequences, that is, the sequence found at each and every node on the tree. This may be useful for those wishing to test methods of ancestral state reconstruction or for whom tracing changes from ancestral sequences may be important. Ancestral sequence output is available from Evolver (Yang 1997) and Seq-Gen (Rambaut and Grassly 1997), but not in combination with indel and non-stationary simulation.
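
When ancestral sequences are available, changes can be traced branch by branch. The Python sketch below assumes the aligned sequences have already been read into a dictionary keyed by node name, together with a map from each node to its parent. The node names, the toy sequences, and the helper `branch_changes` are hypothetical and are not part of MySSP's output format.

```python
def branch_changes(parent_seq, child_seq):
    """List (site, ancestral_state, derived_state) for two aligned sequences.

    Sites are numbered from 0; a '-' in either state indicates an indel.
    """
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, d)
        for i, (a, d) in enumerate(zip(parent_seq, child_seq))
        if a != d
    ]

# Toy data standing in for simulated output (hypothetical values).
sequences = {"root": "ACGTAC", "n1": "ACGTTC", "sp1": "ACG-TC", "sp2": "GCGTTC"}
parent = {"n1": "root", "sp1": "n1", "sp2": "n1"}

# True changes on each branch, keyed by the node at the branch's lower end.
changes = {
    child: branch_changes(sequences[par], sequences[child])
    for child, par in parent.items()
}
# changes["n1"]  -> [(4, 'A', 'T')]
# changes["sp1"] -> [(3, 'T', '-')]
# changes["sp2"] -> [(0, 'A', 'G')]
```

A list of true changes like this can serve as the reference against which an ancestral state reconstruction method is scored.
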

Availability

The program and documentation can be freely downloaded from http://lsweb.la.asu.edu/rosenberg. It runs natively under all 32-bit Windows operating systems and has also successfully been used under Linux emulators. Source code is available on request.

References

1.  Incomplete taxon sampling is not a problem for phylogenetic inference.

Authors:  M S Rosenberg; S Kumar
Journal:  Proc Natl Acad Sci U S A       Date:  2001-08-28       Impact factor: 11.205

2.  MCALIGN: stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution.

Authors:  Peter D Keightley; Toby Johnson
Journal:  Genome Res       Date:  2004-03       Impact factor: 9.043

3.  Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models.

Authors:  John Huelsenbeck; Bruce Rannala
Journal:  Syst Biol       Date:  2004-12       Impact factor: 15.683

4.  Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree.

Authors:  Sudhindra R Gadagkar; Michael S Rosenberg; Sudhir Kumar
Journal:  J Exp Zool B Mol Dev Evol       Date:  2005-01-15       Impact factor: 2.656

5.  Rose: generating sequence families.

Authors:  J Stoye; D Evers; F Meyer
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

6.  Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees.

Authors:  A Rambaut; N C Grassly
Journal:  Comput Appl Biosci       Date:  1997-06

7.  Phylogenetic analysis in molecular evolutionary genetics.

Authors:  M Nei
Journal:  Annu Rev Genet       Date:  1996       Impact factor: 16.830

8.  A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.

Authors:  M Kimura
Journal:  J Mol Evol       Date:  1980-12       Impact factor: 2.395

9.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.

Authors:  M Hasegawa; H Kishino; T Yano
Journal:  J Mol Evol       Date:  1985       Impact factor: 2.395

10.  Evolutionary distance estimation and fidelity of pair wise sequence alignment.

Authors:  Michael S Rosenberg
Journal:  BMC Bioinformatics       Date:  2005-04-19       Impact factor: 3.169
