MOTIVATION: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. RESULTS: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples.