John H Tay, Ashleigh F Porter, Wytamma Wirth, Sebastian Duchene.
Abstract
The ongoing SARS-CoV-2 pandemic has seen an unprecedented amount of rapidly generated genome data. These data have revealed the emergence of lineages with mutations associated with transmissibility and antigenicity, known as variants of concern (VOCs). A striking aspect of VOCs is that many of them involve an unusually large number of defining mutations. Current phylogenetic estimates of the substitution rate of SARS-CoV-2 suggest that its genome accrues around two mutations per month. However, VOCs can have 15 or more defining mutations and it is hypothesized that they emerged over the course of a few months, implying that they must have evolved faster for a period of time. We analyzed genome sequence data from the GISAID database to assess whether the emergence of VOCs can be attributed to changes in the substitution rate of the virus and whether this pattern can be detected at a phylogenetic level using genome data. We fit a range of molecular clock models and assessed their statistical performance. Our analyses indicate that the emergence of VOCs is driven by an episodic, roughly 4-fold increase in the substitution rate over the background phylogenetic rate estimate, which may have lasted several weeks or months. These results underscore the importance of monitoring the molecular evolution of the virus as a means of understanding the circumstances under which VOCs may emerge.
Keywords: Bayesian model selection; SARS-CoV-2 molecular evolution; molecular clock; variants of concern
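The rate argument in the abstract can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it takes the abstract's approximate figures (about two mutations per month, 15 defining mutations, a roughly 4-fold increase) and makes the simplifying assumptions that mutations accrue deterministically and that all defining mutations arose on a single lineage. The function name `months_to_accrue` is hypothetical.

```python
# Approximate figures quoted in the abstract.
BACKGROUND_MUTATIONS_PER_MONTH = 2.0  # "around two mutations per month"
DEFINING_MUTATIONS = 15               # "15 or more defining mutations"
RATE_MULTIPLIER = 4.0                 # "around 4-fold" episodic increase


def months_to_accrue(n_mutations: float, mutations_per_month: float) -> float:
    """Months needed to accrue n_mutations at a constant rate (an assumption)."""
    return n_mutations / mutations_per_month


# Time needed at the background rate versus the elevated rate.
background_months = months_to_accrue(
    DEFINING_MUTATIONS, BACKGROUND_MUTATIONS_PER_MONTH
)
elevated_months = months_to_accrue(
    DEFINING_MUTATIONS, BACKGROUND_MUTATIONS_PER_MONTH * RATE_MULTIPLIER
)

print(f"background rate: {background_months} months")
print(f"4-fold rate:     {elevated_months} months")
```

Under these assumptions, 15 mutations would take 7.5 months at the background rate but under 2 months at a 4-fold rate, which is the gap the clock models are meant to explain.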
Year: 2022 PMID: 35038741 PMCID: PMC8807201 DOI: 10.1093/molbev/msac013
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Model Selection Results for Complete SARS-CoV-2 Genomes.
| Model | ps logML | ss logML | ps Rank | ss Rank | ps log BF | ss log BF |
|---|---|---|---|---|---|---|
| FLC shared stems | −55430.85 | −55431.49 | 1 | 1 | 0 | 0 |
| UCG | −55433.15 | −55433.41 | 2 | 2 | −2.30 | −1.92 |
| FLC stems | −55434.05 | −55434.51 | 3 | 3 | −3.20 | −3.02 |
| UCLN | −55435.83 | −55435.81 | 4 | 4 | −4.98 | −4.32 |
| SC | −55444.05 | −55444.59 | 5 | 5 | −13.20 | −13.10 |
| FLC shared clades + stems | −55444.77 | −55445.31 | 6 | 6 | −13.92 | −13.82 |
| FLC shared clades | −55449.82 | −55450.29 | 7 | 7 | −18.97 | −18.80 |
| FLC clades + stems | −55453.62 | −55454.09 | 8 | 8 | −22.77 | −22.60 |
| FLC clades | −55458.51 | −55459.06 | 9 | 9 | −27.66 | −27.57 |
Note. Values are mean estimates of log marginal likelihoods from path sampling (ps logML) and stepping-stone sampling (ss logML) over ten replicates. Log BFs compare each model with the best-fitting model, so the top model has a value of 0.0 and increasingly negative values indicate poorer statistical fit.
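The BF and rank columns follow directly from the mean log marginal likelihoods. The sketch below recomputes them from the values transcribed from the table; the helper names `log_bayes_factors` and `ranks` are hypothetical and are not taken from the authors' analysis scripts.

```python
# Mean log marginal likelihoods transcribed from the table: (ps, ss).
LOG_ML = {
    "FLC shared stems": (-55430.85, -55431.49),
    "UCG": (-55433.15, -55433.41),
    "FLC stems": (-55434.05, -55434.51),
    "UCLN": (-55435.83, -55435.81),
    "SC": (-55444.05, -55444.59),
    "FLC shared clades + stems": (-55444.77, -55445.31),
    "FLC shared clades": (-55449.82, -55450.29),
    "FLC clades + stems": (-55453.62, -55454.09),
    "FLC clades": (-55458.51, -55459.06),
}
PS, SS = 0, 1  # column indices into the tuples above


def log_bayes_factors(log_ml: dict, column: int) -> dict:
    """Log BF of each model relative to the best one: logML(model) - logML(best)."""
    best = max(values[column] for values in log_ml.values())
    return {
        model: round(values[column] - best, 2)
        for model, values in log_ml.items()
    }


def ranks(log_ml: dict, column: int) -> dict:
    """Rank models from highest (1) to lowest log marginal likelihood."""
    ordered = sorted(log_ml, key=lambda model: log_ml[model][column], reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


ps_bf = log_bayes_factors(LOG_ML, PS)
ss_bf = log_bayes_factors(LOG_ML, SS)
ps_rank = ranks(LOG_ML, PS)

for model in LOG_ML:
    print(f"{model:28s} rank {ps_rank[model]}  ps {ps_bf[model]:7.2f}  ss {ss_bf[model]:7.2f}")
```

Because the differences are taken on the log scale, a log BF of −2.30 for UCG means its marginal likelihood is about exp(2.30) ≈ 10 times lower than that of the best model.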
Fig. 1. Calculations of log marginal likelihoods for all molecular clock models using path sampling and stepping-stone. The hollow circles represent individual estimates, with ten replicates per model, and solid circles denote the mean value over replicates. The vertical lines represent the range of values in each case. The horizontal dashed line corresponds to a log BF of 1.1 (“substantial evidence”) relative to the mean log marginal likelihood of the best model (FLC shared stems), whereas the dotted line is the same value relative to the lowest log marginal likelihood of the best model.
Fig. 2. Violin plots for posterior statistics of FLC. (A) is for a model where the stem branches of VOCs share a substitution rate that is different to that of the background (model “FLC shared stems” in supplementary table S1 and fig. S1, Supplementary Material online). The substitution rate for VOCs stem branches is shown in orange and the background in gray. The dashed line represents the mean background rate and the dotted lines are the 95% credible interval. (B) is the ratio of the substitution rate for VOC stem branches and the background under the same model and the dashed line represents a value of 1.0 where the background and VOC stem rate would be the same. (C) and (D) show the corresponding statistics for the FLC stems model, where the stem branch of every VOC has a different rate. Abbreviation “B” stands for background.
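A ratio like the one in panel (B) is typically summarized by dividing the two rates within each posterior sample rather than dividing their means. The sketch below assumes that approach; the function `rate_ratio_summary` is hypothetical, and the input numbers are made-up placeholders in arbitrary units, not values from the study.

```python
import statistics


def rate_ratio_summary(stem_rates, background_rates):
    """Summarize paired posterior samples of stem and background rates.

    Returns the mean per-sample ratio and the fraction of samples in which
    the stem rate exceeds the background rate (ratio > 1.0).
    """
    if len(stem_rates) != len(background_rates):
        raise ValueError("samples must be paired, one pair per posterior draw")
    ratios = [stem / background for stem, background in zip(stem_rates, background_rates)]
    prob_faster = sum(ratio > 1.0 for ratio in ratios) / len(ratios)
    return statistics.mean(ratios), prob_faster


# Placeholder draws in arbitrary units (NOT data from the paper).
stem_samples = [2.0, 4.0, 3.0, 0.5]
background_samples = [1.0, 2.0, 1.5, 1.0]

mean_ratio, prob_faster = rate_ratio_summary(stem_samples, background_samples)
print(mean_ratio, prob_faster)
```

A posterior ratio distribution that sits almost entirely above 1.0 is what would support a faster stem rate; a distribution straddling the dashed line at 1.0 would not.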
Fig. 3. Violin plots of posterior statistics for the uncorrelated relaxed clocks with lognormal (UCLN) and gamma (UCG) distributions (see Supplementary Material online). The top row, (A) through (C), is for the UCLN and the bottom row, (D) through (F), is for the UCG. (A) and (D) show the coefficient of rate variation, which is the standard deviation of branch rates divided by the mean rate, and indicates clock-like behavior when it is abutting zero (Drummond ; Ho ). In (B) and (E), the substitution rate is shown for the stem branches of VOCs and for the mean of background branches (i.e., those that are not the stems of VOCs), abbreviated as “B.” The dashed line denotes the mean background rate, whereas the dotted lines represent the upper and lower 95% credible interval. (C) and (F) show the percentile in which stem branches for VOCs fall with respect to other branches. Note that the densities have been smoothed, but the maximum values are 100.
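The two branch-level statistics in this caption are simple to compute for a single posterior sample of branch rates. The sketch below is a minimal illustration: the function names are hypothetical, the use of the population standard deviation is an assumption (the caption does not say which estimator is used), and the example rates are placeholders rather than results.

```python
import statistics


def coefficient_of_rate_variation(branch_rates):
    """Standard deviation of branch rates divided by their mean.

    Uses the population standard deviation, which is an assumption here.
    Values near zero indicate clock-like behavior.
    """
    return statistics.pstdev(branch_rates) / statistics.mean(branch_rates)


def percentile_rank(stem_rate, other_rates):
    """Percentage of the other branches whose rate is below stem_rate."""
    below = sum(rate < stem_rate for rate in other_rates)
    return 100.0 * below / len(other_rates)


# Placeholder rates in arbitrary units (NOT data from the paper).
strict_like = [1.0, 1.0, 1.0, 1.0]   # identical rates: no variation
background = [1.0, 2.0, 3.0, 4.0]

print(coefficient_of_rate_variation(strict_like))  # no rate variation
print(percentile_rank(5.0, background))            # faster than every branch
print(percentile_rank(2.5, background))            # faster than half of them
```

A stem branch faster than every other branch has a percentile of 100, which is the ceiling the caption refers to when noting that the smoothed densities cannot truly extend past that value.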