Sebastian Duchêne1,2,3, Kathryn E Holt2,3, François-Xavier Weill4, Simon Le Hello4, Jane Hawkey2,3, David J Edwards2,3, Mathieu Fourment1, Edward C Holmes1.
Abstract
Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms, estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with 'ancient DNA' data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from approximately 10⁻⁵ to 10⁻⁸ nucleotide substitutions per site per year. This variation was negatively associated with sampling time, with this relationship best described by an exponential decay curve. To avoid potential estimation biases, such time-dependency should be considered when inferring evolutionary time-scales in bacteria.
Keywords: bacteria; evolution; molecular clock; phylogeny; substitution rates; time-dependency
Year: 2016 PMID: 28348834 PMCID: PMC5320706 DOI: 10.1099/mgen.0.000094
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Regressions of the root-to-tip genetic distance (expected nucleotide substitutions per site) as a function of sampling time (year) for 35 bacterial data sets. Each point corresponds to an individual sampled genome (SNP sequence in the alignment), and the red dashed line is the linear regression using least squares; the R² coefficients are also shown. Shading corresponds to the degree of temporal structure according to the date-randomization test in BEAST; blue indicates strong temporal structure, while orange and red indicate moderate and low temporal structure, respectively.
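The regression described in the Fig. 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `root_to_tip_regression`, the choice of NumPy, and the toy values are all hypothetical, and the root-to-tip distances are assumed to have been extracted from a rooted tree beforehand. The slope of the fitted line serves as a point estimate of the substitution rate, and a negative slope indicates that no meaningful rate can be read from the regression.

```python
import numpy as np

def root_to_tip_regression(sampling_years, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling year.

    Returns (slope, intercept, r_squared). The slope is in
    substitutions per site per year.
    """
    years = np.asarray(sampling_years, dtype=float)
    dists = np.asarray(root_to_tip_distances, dtype=float)

    # Degree-1 polynomial fit: dists ~ slope * years + intercept
    slope, intercept = np.polyfit(years, dists, 1)

    # Coefficient of determination (R^2) of the fitted line
    predicted = slope * years + intercept
    ss_res = np.sum((dists - predicted) ** 2)
    ss_tot = np.sum((dists - dists.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up toy data (NOT from the paper): five genomes sampled over 20 years
years = [1990, 1995, 2000, 2005, 2010]
dists = [0.0010, 0.0015, 0.0020, 0.0025, 0.0030]

rate, intercept, r2 = root_to_tip_regression(years, dists)
print(f"rate = {rate:.3e} subs/site/year, R^2 = {r2:.3f}")
```

Because tips in a phylogeny share ancestry, the points are not statistically independent, so a regression of this kind is usually treated as an exploratory check rather than a formal test.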
Fig. 2. Bayesian estimates of genome-scale nucleotide substitution rates for all bacterial data sets. The axis for the nucleotide substitution rate is shown on a log10 scale. Circular points represent the mean rate estimate, and error bars correspond to the 95 % HPD values. Colours indicate the degree of temporal structure according to the date-randomization test as indicated in Fig. 1. For comparison, the x symbol represents the point estimates using regression, while the asterisk (*) corresponds to estimates that were negative, and are thus not shown.
Fig. 3. Estimates of genome-scale nucleotide substitution rates using a Bayesian method (BEAST) compared to those estimated via root-to-tip regression. The line represents y=x. Points that fall on the line correspond to data sets for which the mean Bayesian estimates closely match those from the regression. Points above and below the line are data sets for which the Bayesian estimate is higher or lower than that from the regression, respectively. The colour corresponds to the degree of temporal structure according to the date-randomization test as indicated in Fig. 1.
Fig. 4. Estimates of genome-wide nucleotide substitution rates in human-associated bacterial pathogens as a function of sampling time in years. The axes are shown on a log10 scale. The colour corresponds to the degree of temporal structure according to the date-randomization test; blue indicates strong temporal structure and orange indicates moderate temporal structure. The dashed line corresponds to the linear regression, while the solid line corresponds to the decay curve (both fitted using only the points with strong and moderate temporal structure). The grey lines represent 100 bootstrap replicates of the decay curve, and thus represent the uncertainty in the decay function.
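The bootstrap idea in the Fig. 4 caption can be illustrated as follows. The caption does not give the functional form of the decay curve, so this sketch bootstraps only the linear regression on log10 axes (the dashed line), which is a simpler stand-in rather than the authors' decay fit. The function names `loglog_fit` and `bootstrap_loglog`, the use of NumPy, and the toy values are assumptions made for illustration.

```python
import numpy as np

def loglog_fit(sampling_spans, rates):
    """Fit log10(rate) ~ slope * log10(sampling span) + intercept."""
    x = np.log10(np.asarray(sampling_spans, dtype=float))
    y = np.log10(np.asarray(rates, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def bootstrap_loglog(sampling_spans, rates, n_replicates=100, seed=0):
    """Refit the log-log line on data sets resampled with replacement.

    Returns an array of shape (n_replicates, 2) holding
    (slope, intercept) for each replicate.
    """
    rng = np.random.default_rng(seed)
    spans = np.asarray(sampling_spans, dtype=float)
    rates = np.asarray(rates, dtype=float)
    n = len(spans)
    fits = []
    while len(fits) < n_replicates:
        idx = rng.integers(0, n, size=n)  # resample data sets with replacement
        if np.unique(spans[idx]).size < 2:
            continue  # a line needs at least two distinct x values
        fits.append(loglog_fit(spans[idx], rates[idx]))
    return np.array(fits)

# Made-up toy data (NOT from the paper): sampling span in years, rate in subs/site/year
spans = [10, 30, 100, 300, 1000]
rates = [1.2e-5, 2.5e-6, 1.1e-6, 4e-7, 9e-8]

slope, intercept = loglog_fit(spans, rates)
replicates = bootstrap_loglog(spans, rates, n_replicates=100)
low, high = np.percentile(replicates[:, 0], [2.5, 97.5])
print(f"slope = {slope:.2f}, bootstrap 95% interval = ({low:.2f}, {high:.2f})")
```

A negative slope on these axes corresponds to lower rate estimates for data sets sampled over longer periods. Plotting one line per row of `replicates` would give a spread of lines analogous to the grey curves described in the caption.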