Miguel Arenas, Joao S Lopes, Mark A Beaumont, David Posada.
Abstract
The estimation of substitution and recombination rates can provide important insights into the molecular evolution of protein-coding sequences. Here, we present a new computational framework, called "CodABC," to jointly estimate recombination, substitution, and synonymous and nonsynonymous rates from coding data. CodABC uses approximate Bayesian computation with and without regression adjustment and implements a variety of codon models, intracodon recombination, and longitudinal sampling. CodABC can provide accurate joint parameter estimates from recombining coding sequences, often outperforming maximum-likelihood methods based on more approximate models. In addition, CodABC allows for the inclusion of several nuisance parameters such as those representing codon frequencies, transition matrices, heterogeneity across sites, or invariable sites. CodABC is freely available from http://code.google.com/p/codabc/, includes a GUI, extensive documentation and ready-to-use examples, and can run in parallel on multicore machines.
Keywords: approximate Bayesian computation; coding data; molecular adaptation; recombination; substitution rate
Year: 2015 PMID: 25577191 PMCID: PMC4379410 DOI: 10.1093/molbev/msu411
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Accuracy of CodABC using simulated data. For each combination of ρ, θ, and ω, we present the corresponding estimates for ρ (top), ω (middle), and θ (bottom). Dashed lines indicate the true value. Points represent the mode of the prior distributions and error bars indicate the 95% CI.
Figure: CodABC computing times. The simulated data contain 15 sequences with 900 nucleotides. The first real data set contains 22 sequences with 864 nucleotides. The second real data set contains 20 sequences with 894 nucleotides. The third real data set is the largest and contains 55 sequences with 1,449 nucleotides. Prior distributions: ρ: U(0,50), θ: U(0,300), and ω: U(0,2). The analyses were run on an Intel Xeon CPU 2.33 GHz with 24 cores.
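The abstract mentions approximate Bayesian computation "with and without regression adjustment". As a rough illustration of that general idea only, the sketch below runs a rejection step followed by a local-linear regression adjustment on a toy problem. It is not CodABC's code or interface. The uniform prior bounds for ρ, θ, and ω are taken from the computing-times caption; everything else is an assumption made for illustration: the `toy_simulator` (summary statistics equal to the parameters plus Gaussian noise, standing in for a real coalescent simulator with recombination and a codon model), the noise levels, the "true" values, the tolerance, and all function names.

```python
import numpy as np

# Prior bounds for (rho, theta, omega), as listed in the computing-times caption.
PRIOR_LOW = np.array([0.0, 0.0, 0.0])
PRIOR_HIGH = np.array([50.0, 300.0, 2.0])
# Hypothetical noise of the toy simulator; not a value from the paper.
NOISE_SD = np.array([2.0, 10.0, 0.1])


def sample_priors(n, rng):
    """Draw n (rho, theta, omega) triples from the uniform priors."""
    return rng.uniform(PRIOR_LOW, PRIOR_HIGH, size=(n, 3))


def toy_simulator(params, rng):
    """Hypothetical stand-in for a sequence simulator.

    Returns one 'summary statistic' per parameter: the parameter plus noise.
    """
    return params + rng.normal(0.0, NOISE_SD, size=params.shape)


def abc_rejection(observed, params, stats, tolerance):
    """Keep the fraction `tolerance` of simulations closest to the observed stats."""
    scale = stats.std(axis=0)  # put the statistics on a comparable scale
    dist = np.sqrt((((stats - observed) / scale) ** 2).sum(axis=1))
    n_keep = max(1, int(tolerance * len(params)))
    keep = np.argsort(dist)[:n_keep]
    return params[keep], stats[keep], dist[keep]


def regression_adjust(observed, acc_params, acc_stats, acc_dist):
    """Local-linear adjustment of accepted parameters toward the observed stats.

    Fits a weighted linear regression of parameters on (stats - observed) with
    Epanechnikov-type weights, then subtracts the fitted linear trend.
    """
    weights = 1.0 - (acc_dist / acc_dist.max()) ** 2
    centered = acc_stats - observed
    design = np.column_stack([np.ones(len(centered)), centered])
    root_w = np.sqrt(weights)[:, None]
    # Weighted least squares: scale rows of both sides by sqrt(weight).
    beta, *_ = np.linalg.lstsq(design * root_w, acc_params * root_w, rcond=None)
    return acc_params - centered @ beta[1:]


def run_demo(seed=0, n_sims=20000, tolerance=0.01):
    """Run the toy analysis; returns (true values, accepted, adjusted)."""
    rng = np.random.default_rng(seed)
    true_params = np.array([10.0, 100.0, 0.8])  # hypothetical true values
    observed = true_params.copy()  # toy 'observed' summary statistics
    params = sample_priors(n_sims, rng)
    stats = toy_simulator(params, rng)
    acc_params, acc_stats, acc_dist = abc_rejection(observed, params, stats, tolerance)
    adjusted = regression_adjust(observed, acc_params, acc_stats, acc_dist)
    return true_params, acc_params, adjusted


if __name__ == "__main__":
    true_params, accepted, adjusted = run_demo()
    for name, sample in (("rejection only", accepted), ("with adjustment", adjusted)):
        print(name, "mean:", sample.mean(axis=0), "sd:", sample.std(axis=0))
```

In this toy setting the adjusted sample is typically tighter around the true values than the rejection-only sample, which is the usual motivation for the adjustment. Adjusted values can fall outside the prior bounds; this sketch does not correct for that.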