Botond Sipos, Tim Massingham, Gregory E Jordan, Nick Goldman.
Abstract
BACKGROUND: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity.
Year: 2011 PMID: 21504561 PMCID: PMC3102636 DOI: 10.1186/1471-2105-12-104
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Figure 1. Illustration of the Gillespie algorithm. ① The rate at which the next event occurs is equal to the sum of the rates of all possible events; the time, t, until event k occurs is randomly chosen, and the simulation ends if the event would have occurred after the end of the branch (L). ② The actual event that occurs is randomly selected, each event having probability proportional to its rate. The figure highlights the event k = 2, a G→A substitution at the third site of the evolving sequence. ③ The selected event is applied to the sequence, the set of possible events and their rates are updated, and the next inter-event time (t3) is drawn.
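The three steps in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PhyloSim's implementation: it handles substitutions only (no indels), and the names `gillespie_branch`, `possible_events`, and `jc69_rate`, as well as the equal-rates substitution model, are hypothetical choices made for the example.

```python
import random

NUCLEOTIDES = "ACGT"


def jc69_rate(old, new, mu=1.0):
    # Hypothetical rate function (assumption): every substitution is
    # equally likely, so each of the 3 alternatives gets rate mu / 3.
    return mu / 3.0


def possible_events(seq, rate_fn):
    # Enumerate every substitution currently available:
    # one (site, new_base, rate) tuple per site and alternative base.
    return [(site, new, rate_fn(old, new))
            for site, old in enumerate(seq)
            for new in NUCLEOTIDES if new != old]


def gillespie_branch(seq, branch_length, rate_fn=jc69_rate, rng=None):
    """Evolve seq along one branch; return (final_seq, event_history)."""
    rng = rng or random.Random()
    seq = list(seq)
    elapsed = 0.0
    history = []  # (time, site, old_base, new_base)
    while True:
        events = possible_events(seq, rate_fn)
        total_rate = sum(rate for _, _, rate in events)
        if total_rate <= 0:
            break  # nothing can happen any more
        # Step 1: waiting time is exponential with the summed rate;
        # stop if the event would fall beyond the end of the branch.
        wait = rng.expovariate(total_rate)
        if elapsed + wait > branch_length:
            break
        elapsed += wait
        # Step 2: pick one event with probability proportional to its rate.
        weights = [rate for _, _, rate in events]
        site, new, _ = rng.choices(events, weights=weights, k=1)[0]
        # Step 3: apply it; the event list is rebuilt on the next pass.
        history.append((elapsed, site, seq[site], new))
        seq[site] = new
    return "".join(seq), history


if __name__ == "__main__":
    final, events = gillespie_branch("ACGTACGT", 0.5, rng=random.Random(1))
    print(final, len(events))
```

Rebuilding the full event list after every event keeps the sketch short; a realistic simulator would update only the rates affected by the change.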
Figure 2. Annotation of a simulated alignment using PRANK's genomic structure model. A. A schematic representation of the structure of the genomic region used in the simulation. Noncoding regions, evolving by a K80 (Kimura two-parameter) substitution process [32], are shown in green. Coding regions are shown in blue and evolve by a GY94 (Goldman-Yang) codon model [33,34]. The other features included in the simulation (the fixed start codon and splicing sites, and the stop codon evolving by a special substitution process) are shown in lighter shades. B. A "true" multiple sequence alignment resulting from the simulation of the genomic region along the phylogenetic tree shown to the left. The tracks under the sites represent the true intron-exon structure ("True state") and the annotation of the alignment inferred by the PRANK alignment tool, transferred to the human sequence ("Prank annot."). The thin portions of the PRANK annotation track indicate positions for which no annotation is available because they have gaps in the human sequence in the true simulated alignment.
Comparison of some advanced alignment simulation tools
| Key | Feature* | Dawg v1.1.2 | MySSP v1.0 | Indel-Seq-Gen v1.0.3 | SIMPROT v1.01 | INDELible v1.0 | PhyloSim v0.12 |
|---|---|---|---|---|---|---|---|
| II | GTR | • | • | • | • | ||
| II | UNREST | • | • | ||||
| II | Empirical amino acid models | 3 | 3 | 15 | 11 | ||
| II | User defined amino acid models | • | • | ||||
| II | Codon models | • | • | ||||
| III | Combinations of substitution processes | • | |||||
| IV | Discrete gamma | • | • | ||||
| IV | Continuous gamma | • | • | • | • | • | |
| IV | Proportion of invariant sites | • | • | • | • | ||
| V | Complex rate variation | • | |||||
| VI | Multiple indel processes | • | |||||
| VII | Rate variation with indel processes | • | |||||
| VIII | Selective constraints on indels | • | |||||
| IX | Partitions | • | • | • | • | • | |
| X | Non-homogeneous evolution | • | • | • | |||
| XI | Full control over inserts | • | | | | | |
Availability of complex evolutionary processes in different simulation software. Additional details for less advanced software and simpler models are given in [12, Table 1].
*See text for details.