Mahan Ghafari, Louis du Plessis, Jayna Raghwani, Samir Bhatt, Bo Xu, Oliver G Pybus, Aris Katzourakis.
Abstract
High-throughput sequencing enables rapid genome sequencing during infectious disease outbreaks and provides an opportunity to quantify the evolutionary dynamics of pathogens in near real-time. One difficulty of undertaking evolutionary analyses over short timescales is the dependency of the inferred evolutionary parameters on the timespan of observation. Crucially, there are an increasing number of molecular clock analyses using external evolutionary rate priors to infer evolutionary parameters. However, it is not clear which rate prior is appropriate for a given time window of observation due to the time-dependent nature of evolutionary rate estimates. Here, we characterize the molecular evolutionary dynamics of SARS-CoV-2 and 2009 pandemic H1N1 (pH1N1) influenza during the first 12 months of their respective pandemics. We use Bayesian phylogenetic methods to estimate the dates of emergence, evolutionary rates, and growth rates of SARS-CoV-2 and pH1N1 over time and investigate how varying sampling window and data set sizes affect the accuracy of parameter estimation. We further use a generalized McDonald-Kreitman test to estimate the number of segregating nonneutral sites over time. We find that the inferred evolutionary parameters for both pandemics are time dependent, and that the inferred rates of SARS-CoV-2 and pH1N1 decline by ∼50% and ∼100%, respectively, over the course of 1 year. After at least 4 months since the start of sequence sampling, inferred growth rates and emergence dates remain relatively stable and can be inferred reliably using a logistic growth coalescent model. We show that the time dependency of the mean substitution rate is due to elevated substitution rates at terminal branches which are 2-4 times higher than those of internal branches for both viruses. 
The elevated rate at terminal branches is strongly correlated with an increasing number of segregating nonneutral sites, demonstrating the role of purifying selection in generating the time dependency of evolutionary parameters during pandemics.
Keywords: clock rate; molecular clock; purifying selection; substitution rate
Year: 2022; PMID: 35038728; PMCID: PMC8826518; DOI: 10.1093/molbev/msac009
Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240
Fig. 1. (A, B) Inferred rates, (C, D) times of origin, and (E, F) growth rates of SARS-CoV-2 (left column) and pH1N1 influenza (right column) using an exponential (red) and a logistic (black) growth model. Open circles represent nonconvergence for at least one parameter in the Bayesian analysis. Note that y-axes are not on the same scale for SARS-CoV-2 and pH1N1.
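For readers unfamiliar with the two growth models compared in Fig. 1, the following is a minimal sketch of their textbook forward-time forms. It is an illustration only: coalescent implementations usually parameterize these curves backward in time and may use different parameter names, and the function names and numeric values here are hypothetical, not taken from the study.

```python
import math


def exponential_size(t, n0, r):
    """Population size at time t under exponential growth from n0 at rate r."""
    return n0 * math.exp(r * t)


def logistic_size(t, n0, r, k):
    """Population size at time t under logistic growth from n0 at rate r,
    saturating at carrying capacity k."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shapes of the two curves.
    n0, r, k = 1.0, 0.1, 1000.0
    for t in (0, 30, 60, 120, 240):
        print(t, exponential_size(t, n0, r), logistic_size(t, n0, r, k))
```

Early on the two curves are nearly indistinguishable; they diverge only once the logistic curve approaches its plateau. This is one intuitive reason why a short sampling window may not contain enough information to separate the two models.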
Log-Marginal Likelihoods of Exponential and Logistic Growth Models with Increasing Temporal Ranges of Sampling Dates.
| Month of Sampling | Number of Samples | Log-Marginal Likelihood, Exponential Growth Model | Log-Marginal Likelihood, Logistic Growth Model | Log Bayes Factor (Logistic − Exponential) |
|---|---|---|---|---|
| Jan | 41 | −41,022.77 | −41,025.56 | −2.79 |
| Feb | 121 | −43,062.00 | −43,047.11 | +14.88 |
| Mar | 181 | −44,627.77 | −44,607.95 | +19.78 |
| Apr | 241 | −46,524.75 | −46,506.25 | +18.53 |
| May | 301 | −48,587.39 | −48,548.15 | +39.24 |
| Jun | 361 | −52,310.61 | −52,248.23 | +62.38 |
| Jul | 421 | −55,138.43 | −55,036.60 | +101.83 |
| Aug | 481 | −58,580.94 | −58,430.09 | +150.85 |
| Sep | 541 | −62,101.22 | −61,805.93 | +295.29 |
| Oct | 601 | −65,433.20 | −65,005.09 | +428.11 |
| Nov | 661 | −69,121.03 | −68,921.98 | +199.05 |
| Dec | 721 | −73,094.26 | −72,389.69 | +704.57 |
Note.—Taking exponential growth as the null model, we select the logistic growth model for any data set with a positive Bayes factor.
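The selection rule in the note can be reproduced directly from the table. The sketch below assumes that the Bayes factor column is the difference of the two log-marginal likelihoods (logistic minus exponential), which is consistent with the printed values. The function and variable names are illustrative, not from the study.

```python
# Log-marginal likelihoods copied from the table above:
# (month, exponential growth model, logistic growth model).
LOG_ML = [
    ("Jan", -41022.77, -41025.56),
    ("Feb", -43062.00, -43047.11),
    ("Mar", -44627.77, -44607.95),
    ("Apr", -46524.75, -46506.25),
    ("May", -48587.39, -48548.15),
    ("Jun", -52310.61, -52248.23),
    ("Jul", -55138.43, -55036.60),
    ("Aug", -58580.94, -58430.09),
    ("Sep", -62101.22, -61805.93),
    ("Oct", -65433.20, -65005.09),
    ("Nov", -69121.03, -68921.98),
    ("Dec", -73094.26, -72389.69),
]


def log_bayes_factor(log_ml_null, log_ml_alt):
    """Log Bayes factor of the alternative model over the null model."""
    return log_ml_alt - log_ml_null


def select_model(log_ml_exponential, log_ml_logistic):
    """Pick the logistic model only when its log Bayes factor is positive."""
    if log_bayes_factor(log_ml_exponential, log_ml_logistic) > 0:
        return "logistic"
    return "exponential"


# Map each month to (log Bayes factor, selected model).
results = {
    month: (log_bayes_factor(exp_ml, log_ml), select_model(exp_ml, log_ml))
    for month, exp_ml, log_ml in LOG_ML
}

if __name__ == "__main__":
    for month, (lbf, model) in results.items():
        print(f"{month}: log BF = {lbf:+.2f} -> {model}")
```

Under this rule only the January data set retains the exponential model. The recomputed differences for February to April differ from the printed column in the last digits, presumably because the printed likelihoods are rounded.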
Fig. 2. (A, B) Number of nonneutral sites over time for SARS-CoV-2 (left column) and pH1N1 (right column). (C, D) Number of replacement (dashed blue line) and silent (solid blue line) sites, and their ratio (red line), over time. (E, F) Mean clock rate (black) and the rates at the terminal (red) and internal (blue) branches. The MCMC chains for the first month of SARS-CoV-2 sampling and the first three months of pH1N1 sampling did not converge under the logistic growth coalescent model, so the exponential growth coalescent model was used for those data sets instead. (G, H) Correlation coefficient between the substitution rate at terminal branches and the number of nonneutral sites, excluding the estimates from the first month of sampling because of inadequate temporal signal and significant uncertainty in the inferred parameters.