Stephanie J Spielman, Claus O Wilke.
Abstract
We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user manual and example scripts, from http://github.com/sjspielman/pyvolve.
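
The core idea in the abstract, evolving states down a phylogeny under a continuous-time Markov model with a possibly custom rate matrix, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Pyvolve's code or API: the function names are hypothetical, the tree is written as nested lists of (subtree, branch_length) pairs with string leaf names, and each site is simulated by drawing exponential waiting times and jumping in proportion to the off-diagonal rates (a Gillespie-style simulation, which is one standard way to sample a CTMC and not necessarily how Pyvolve works internally).

```python
import random

NUCS = "ACGT"


def jc69_matrix():
    """JC69 rate matrix over A, C, G, T, scaled so the mean substitution rate is 1."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]


def evolve_state(state, t, Q, rng):
    """Evolve one state along a branch of length t by drawing exponential
    waiting times and jumping in proportion to the off-diagonal rates."""
    while True:
        total = -Q[state][state]  # total rate of leaving the current state
        if total <= 0:
            return state  # absorbing state: no further change
        t -= rng.expovariate(total)
        if t < 0:
            return state  # the next jump would fall past the end of the branch
        others = [j for j in range(len(Q)) if j != state]
        state = rng.choices(others, weights=[Q[state][j] for j in others])[0]


def evolve_tree(tree, seq, Q, rng):
    """Return {leaf_name: states}. A tree is a leaf name (str) or a list of
    (subtree, branch_length) pairs; this format is a hypothetical choice."""
    if isinstance(tree, str):
        return {tree: seq}
    leaves = {}
    for subtree, length in tree:
        child = [evolve_state(s, length, Q, rng) for s in seq]
        leaves.update(evolve_tree(subtree, child, Q, rng))
    return leaves


def simulate(tree, n_sites, Q, freqs, alphabet=NUCS, seed=None):
    """Draw root states from freqs, evolve them down the tree, and return
    one string per leaf."""
    rng = random.Random(seed)
    root = rng.choices(range(len(Q)), weights=freqs, k=n_sites)
    leaves = evolve_tree(tree, root, Q, rng)
    return {name: "".join(alphabet[s] for s in seq) for name, seq in leaves.items()}
```

For example, `simulate([("A", 0.1), ([("B", 0.2), ("C", 0.3)], 0.05)], 100, jc69_matrix(), [0.25] * 4, seed=1)` returns a three-taxon alignment of 100 sites. Because `simulate` only needs a rate matrix, stationary frequencies, and an alphabet, a custom matrix over a custom set of states can be passed in the same way.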
Year: 2015 PMID: 26397960 PMCID: PMC4580465 DOI: 10.1371/journal.pone.0139047
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Substitution models included in Pyvolve.

| Model class | Model(s) |
|---|---|
| Nucleotide | GTR [ |
| Amino acid | JTT [ |
| Mechanistic codon | GY-style [ |
| Empirical codon | ECM (restricted and unrestricted) [ |
| Mutation-selection | Halpern-Bruno model [ |
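
Standard nucleotide models such as JC69 and HKY85 are special cases of GTR. As a hedged sketch (not Pyvolve's implementation), a GTR rate matrix can be built from six symmetric exchangeabilities s_ij and stationary frequencies π_j, with off-diagonal rates q_ij = s_ij π_j, and then rescaled so the mean rate at equilibrium is 1, which makes branch lengths read as expected substitutions per site. The dictionary input format and the function name are assumptions made for illustration.

```python
NUCS = "ACGT"


def gtr_matrix(exchangeabilities, freqs):
    """Normalized GTR rate matrix.

    exchangeabilities: symmetric rates keyed by sorted pairs "AC", "AG", "AT",
    "CG", "CT", "GT" (a hypothetical input format).
    freqs: stationary frequencies in A, C, G, T order.
    """
    n = len(NUCS)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = "".join(sorted(NUCS[i] + NUCS[j]))
                Q[i][j] = exchangeabilities[pair] * freqs[j]  # q_ij = s_ij * pi_j
        Q[i][i] = -sum(Q[i])  # rows of a rate matrix sum to zero
    # Rescale so the expected rate at equilibrium is 1 substitution per unit time.
    mean_rate = -sum(freqs[i] * Q[i][i] for i in range(n))
    return [[q / mean_rate for q in row] for row in Q]
```

For example, `gtr_matrix({"AC": 1, "AG": 2.5, "AT": 1, "CG": 1, "CT": 2.5, "GT": 1}, [0.1, 0.2, 0.3, 0.4])` gives an HKY85 matrix with κ = 2.5, since only the two transition exchangeabilities (A↔G, C↔T) differ from 1.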
Fig 1. Example code for a simple codon simulation in Pyvolve.
This example simulates an alignment of 100 codons with dN/dS = 0.75 and κ (the transition-transversion mutational bias) = 3.25. By default, the simulated sequences are written to a file named “simulated_alignment.fasta”; this file name can be changed as described in Pyvolve’s user manual.
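
To make the two parameters in Fig 1 concrete, here is a minimal pure-Python sketch of a GY94-style codon rate matrix; it does not call Pyvolve and is not its internal code. Assumptions: the standard genetic code, equal codon frequencies unless others are given, and normalization to a mean rate of 1. Changes involving more than one nucleotide get rate 0, κ multiplies transitions, and ω (dN/dS) multiplies nonsynonymous changes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # 61 sense codons
PURINES = set("AG")


def is_transition(a, b):
    """For a != b: A<->G and C<->T are transitions, all else transversions."""
    return (a in PURINES) == (b in PURINES)


def gy_matrix(omega, kappa, freqs=None):
    """GY94-style codon rate matrix over SENSE, normalized to mean rate 1."""
    n = len(SENSE)
    if freqs is None:
        freqs = [1.0 / n] * n  # equal codon frequencies (an assumption)
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # identical codons, or more than one nucleotide change
            a, b = diffs[0]
            rate = freqs[j]  # GY94 uses the target codon's frequency
            if is_transition(a, b):
                rate *= kappa
            if CODE[ci] != CODE[cj]:
                rate *= omega  # nonsynonymous change, scaled by dN/dS
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])
    mean_rate = -sum(freqs[i] * Q[i][i] for i in range(n))
    return [[q / mean_rate for q in row] for row in Q]
```

With the Fig 1 values, `gy_matrix(omega=0.75, kappa=3.25)` makes the synonymous transition AAA→AAG (Lys→Lys) exactly κ/ω ≈ 4.33 times faster than the nonsynonymous transversion AAA→AAC (Lys→Asn).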
Fig 2. Pyvolve accurately evolves sequences under homogeneous conditions, site-wise rate heterogeneity, and branch-specific rate heterogeneity.
A) Nucleotide alignments simulated under the JC69 [36] model along two-taxon trees with varying branch lengths, which represent the substitution rate. Points represent the mean observed substitution rate across the 50 alignment replicates simulated at each branch length, and error bars represent standard deviations. The red line indicates x = y. B) Codon alignments simulated under an MG94-style [32] model with varying dN/dS values. Points represent the mean dN/dS inferred from the 50 alignment replicates simulated at each dN/dS value, and error bars represent standard deviations. The red line indicates x = y. C) Site-wise heterogeneity simulated with an MG94-style [32] model in which dN/dS varies across sites. Horizontal lines indicate the simulated dN/dS value for each dN/dS category. D) Branch-wise heterogeneity simulated with an MG94-style [32] model in which each branch evolves under a distinct dN/dS value. Horizontal lines indicate the simulated dN/dS value for each branch, as shown in the inset phylogeny. The lowest dN/dS category (dN/dS = 0.1) was applied to the internal branch (shown in gray). All code and data used to validate Pyvolve’s performance and generate this figure are available in S1 File.
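
The logic of panel A can be mimicked in a few lines of standalone Python. This sketch rests on assumptions that are ours, not the figure's: the x-axis value is taken as the total path length t between the two taxa, split evenly between the two branches; the "observed substitution rate" is taken as the JC69-corrected distance d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites; and 1000 sites per replicate is an arbitrary choice. The actual validation code is in S1 File. Each branch uses the JC69 identity that, with probability 1 - e^(-4t/3), a base is redrawn uniformly from ACGT.

```python
import math
import random
import statistics

NUCS = "ACGT"


def jc69_p_distance(t):
    """Expected proportion of differing sites between sequences separated by length t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def jc69_distance(p):
    """JC69-corrected distance (substitutions per site) from the proportion p of differing sites."""
    if p >= 0.75:
        return float("inf")  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def evolve_jc69(base, t, rng):
    """JC69 along a branch of length t: with probability 1 - exp(-4t/3) the
    base is redrawn uniformly from ACGT (possibly to the same base)."""
    if rng.random() < 1.0 - math.exp(-4.0 * t / 3.0):
        return rng.choice(NUCS)
    return base


def replicate_estimates(t, n_sites=1000, n_reps=50, seed=0):
    """Mean and SD of JC69 distances over n_reps two-taxon alignments whose
    two branches each have length t / 2 (an assumption about the tree)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        root = [rng.choice(NUCS) for _ in range(n_sites)]
        seq1 = [evolve_jc69(base, t / 2, rng) for base in root]
        seq2 = [evolve_jc69(base, t / 2, rng) for base in root]
        p = sum(a != b for a, b in zip(seq1, seq2)) / n_sites
        estimates.append(jc69_distance(p))
    return statistics.mean(estimates), statistics.stdev(estimates)
```

Calling `replicate_estimates(t)` for a range of t values and plotting the means against t, with the standard deviations as error bars, gives a plot of the same kind as panel A, where an accurate simulator keeps the points close to the x = y line.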