Literature DB >> 35038741

The Emergence of SARS-CoV-2 Variants of Concern Is Driven by Acceleration of the Substitution Rate.

John H Tay1, Ashleigh F Porter1, Wytamma Wirth1, Sebastian Duchene1.   

Abstract

The ongoing SARS-CoV-2 pandemic has seen an unprecedented amount of rapidly generated genome data. These data have revealed the emergence of lineages with mutations associated to transmissibility and antigenicity, known as variants of concern (VOCs). A striking aspect of VOCs is that many of them involve an unusually large number of defining mutations. Current phylogenetic estimates of the substitution rate of SARS-CoV-2 suggest that its genome accrues around two mutations per month. However, VOCs can have 15 or more defining mutations and it is hypothesized that they emerged over the course of a few months, implying that they must have evolved faster for a period of time. We analyzed genome sequence data from the GISAID database to assess whether the emergence of VOCs can be attributed to changes in the substitution rate of the virus and whether this pattern can be detected at a phylogenetic level using genome data. We fit a range of molecular clock models and assessed their statistical performance. Our analyses indicate that the emergence of VOCs is driven by an episodic increase in the substitution rate of around 4-fold the background phylogenetic rate estimate that may have lasted several weeks or months. These results underscore the importance of monitoring the molecular evolution of the virus as a means of understanding the circumstances under which VOCs may emerge.
© The Author(s) 2022. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  Bayesian model selection; SARS-CoV-2 molecular evolution; molecular clock; variants of concern

Mesh:

Substances:

Year:  2022        PMID: 35038741      PMCID: PMC8807201          DOI: 10.1093/molbev/msac013

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

The Molecular Clock of SARS-CoV-2

Genome sequence data of viruses have been extensively used to track the evolution and spread of these pathogens. The ongoing SARS-CoV-2 pandemic has seen an unprecedented number of genomes generated that have been used to gain rapid insight to understand epidemiological spread (Dellicour ), identify the time of origin (Pekar ), and track mutations of functional importance. Most concerning mutations occur in the spike protein and may increase transmissibility (Kraemer ), or disease severity (Harvey ), although vaccines are likely still effective against them (Dearlove ). Such lineages are known as variants of concern (VOCs) and they are characterized at a genomic level by a number of fixed mutations in the S1 subunit of the spike protein, the most common of which are mutations N501Y and D614G (Eurosurveillance Editorial Team 2021), with the latter presenting evidence of increased transmissibilty and favored by selection (Martin ; Volz ). For a lineage to be formally denominated as a VOC, there must be evidence of an impact in transmissibility, virulence, and/or immunity (Mascola ). SARS-CoV-2 lineages are classified using a dynamic nomenclature system, known as PANGO (Rambaut ). Recently the World Health Organization assigned VOCs letters of the Greek alphabet (Konings ). In October 2021, the United States CDC recognizes four VOCs: Alpha (PANGO lineage B.1.1.7) first identified in the United Kingdom, Beta (PANGO lineage B.1.351) first identified in South Africa, Gamma (PANGO lineage P.1) first identified in Brazil, and Delta (PANGO lineage B.1.617.2) first identified in India (CDC 2021). As of December 2021, a new VOC has been detected and has rapidly spread globally (Viana ), Omicron (PANGO lineages BA.1 and BA.2), not included in this study. The mechanisms under which VOCs have emerged is not entirely clear, but their defining mutations are well characterized and their fixation has been attributed to the action of natural selection (Martin ). Variant Alpha has 14 protein-altering mutations and three deletions, with eight of these being in the spike protein. One of the deletions ΔH69/ΔV70 enhances infectivity in vitro and has been detected in immunocompromised patients where immune escape occurred (Kemp ; Plante ). Variant Beta has nine protein-altering mutations with five altering the receptor binding domain. (Tegally ). Variant Gamma has 17 mutations, with 10 found in the spike protein and including N501Y and E484K (Faria ). Alpha, Beta, and Gamma share several important mutations, including N501Y and E404K, which likely enhance affinity to human the ACE2 receptor (Nelson ). Variant Delta is characterized by seven mutations in the spike protein, several of which have been associated with altered immune response and increased viral replication, viral load, and likely leading to increased viral fitness (CDC 2021). The sheer number of mutations observed in these four VOCs is much higher than what would be expected under phylogenetic estimates of the nucleotide evolutionary rate of SARS-CoV-2, which range from around 7 × 10−4 to 1.1 × 10−3 subs/site/year (Duchene ; Ghafari ), meaning that only about two mutations along the genome would accumulate per month along a lineage. In these circumstances, the 14 mutations in Alpha would require a period of at least 6 months, a time that is inconsistent with its first detection in September 2020, because it would have had to evolve from around March 2020 with most defining mutations undetected for many months. We investigated whether the emergence of VOCs is associated with an increase in the evolutionary rate that can be detected using phylogenetic analyses of genome data and in the absence of dense intrahost or transmission chain sampling. The term “evolutionary rate” refers to the amount of molecular change that can be measured using a phylogenetic method and is thus the result of the instantaneous mutation rate and the substitution rate (i.e. the rate at which such mutations become fixed) (Ho ). The latter is largely determined by the action of natural selection, which is a probable cause for the large number of mutations in VOCs (Martin ). Thus, here we use the term “substitution rate”, which reflects our estimates more closely. We analyzed publicly available nucleotide sequence data from GISAID (Elbe and Buckland-Merrett 2017; Shu and McCauley 2017) under a range of molecular clock models that describe the substitution rate along branches in phylogenetic trees, shown in the Supplementary Material online. We consider each model as a hypothesis for which we can assess statistical support using Bayesian model selection techniques. Critically, our analyses do not intend to detect signatures of natural selection, nor to identify genomic regions with higher mutation rates, which have been described elsewhere (Abdool Karim and de Oliveira 2021; Harvey ; Martin ). Instead, our framework serves to characterize the main patterns of substitution rate variation in the genome of the virus that underpin the emergence of VOCs. The simplest molecular clock model is known as the strict molecular clock (SC; Zuckerkandl and Pauling 1962, 1965) that posits a single substitution rate for all branches in a phylogenetic tree, and thus serves as a “null” model. A more complex model is the uncorrelated relaxed clock that assumes that branch rates are independent and identically distributed draws from a statistical distribution (Drummond ), for which we considered either a lognormal or a Γ distribution (UCLN and UCG, respectively). We also considered a range of fixed local clock (FLC) models (Yoder and Yang 2000). These models require an a priori definition of a set of “background” branches and a set of branches with different rates, known as “foreground.” For example, foreground branches can be defined based on some biological expectation (e.g., Worobey ) and represent a formal evolutionary hypothesis. The substitution rate is constant for a given group of branches, although there exist approaches where branches can be assigned a set of relaxed molecular clocks (Fourment and Darling 2018). These models differ in their number of parameters and biological assumptions (supplementary table S1, Supplementary Material online; reviewed in Bromham and Ho and Duchêne [2014]). We specified six configurations of the FLC model, where the substitution rate could vary within VOC clades (FLC clades model in supplementary fig. S1, Supplementary Material online) or along the stem (FLC stems + clades), only at stem branches (FLC stems), or where these rates could be shared among all VOCs (FLC shared stems, FLC shared clades, and FLC shared clades + stems in supplementary fig. S1, Supplementary Material online). Models in which the rate only changes along the stem branches of VOCs represent a situation where the substitution rate may increase for a short period of time before returning to the background rate. In contrast, models where the clade also undergoes a rate change would imply that VOCs have a rate that is statistically different from the background. An alternative approach to the FLC is the random local clock (RLC; Drummond and Suchard 2010). The substitution rate can change at particular nodes in the tree and the location of such changes and actual rates are inferred. The RLC is a general form of all local clock models, where the simplest form is the SC, as a case of no rate changes (Ho and Duchêne 2014; Bromham ).

Bayesian Hypothesis Testing

We conducted Bayesian model testing by calculating the log marginal likelihood, a measure of statistical fit, and ranking the models accordingly. The difference in log marginal likelihoods between two models is known as the log Bayes factor (BF; Sinsheimer ) and measures the relative support for two models given the data. In general, a log BF of at least 1.1 is considered as “substantial evidence” in favor of a model, with 2.3 being “strong” and 4.6 “decisive” (Kass and Raftery 1995). We considered two marginal likelihood estimators, path sampling and stepping-stone sampling (Gelman and Meng 1998; Lartillot and Philippe 2006; Xie ).

Results

Model Selection

The FLC-shared stems model had the highest statistical fit, with a log BF of at least 1.92 compared with the next best-fitting model (2.30 with path sampling and 1.92 with stepping-stone sampling; table 1). The next model with highest mean log marginal likelihood was the UCG, followed by the FLC stems and UCLN. Note, however, that there is some overlap in replicate log marginal likelihoods for these four models (fig. 1). The top two FLC models assume that the stem branches of VOC have a rate that differs from the background and they only differ in that the FLC stems model allows each VOC stem branch to have its own rate.
Table 1.

Model Selection Results for Complete SARS-CoV-2 Genomes.

Modelps logMLss logMLps Rankss Rankps BFss BF
FLC shared stems−55430.85−55431.491100
UCG−55433.15−55433.4122−2.30−1.92
FLC stems−55434.05−55434.5133−3.2−3.02
UCLN−55435.83−55435.8144−4.98−4.32
SC−55444.05−55444.5955−13.20−13.10
FLC shared clades + stems−55444.77−55445.3166−13.92−13.82
FLC shared clades−55449.82−55450.2977−18.97−18.80
FLC clades + stems−55453.62−55454.0988−22.77−22.60
FLC clades−55458.51−55459.0699−27.66−27.57

Note.—Mean estimates of log marginal likelihoods using path sampling and stepping-stone (ps logML and ss logML, respectively), over ten replicates. Log BFs are shown for the best-fitting model, relative to all others (increasingly negative numbers mean lower statistical fit), and thus they are 0.0 for the top model.

Fig. 1.

Calculations of log marginal likelihoods for all molecular clock models using path sampling and stepping-stone. The hollow circles represent individual estimates, with ten replicates per model, and solid circles denote the mean value over replicates. The vertical lines represent the range of values in each case. The horizontal dashed line corresponds to a log BF of 1.1 (“substantial evidence”) relative to the mean log marginal likelihood of the best model (FLC shared stems), whereas the dotted line is the same value relative to the lowest log marginal likelihood of the best model.

Calculations of log marginal likelihoods for all molecular clock models using path sampling and stepping-stone. The hollow circles represent individual estimates, with ten replicates per model, and solid circles denote the mean value over replicates. The vertical lines represent the range of values in each case. The horizontal dashed line corresponds to a log BF of 1.1 (“substantial evidence”) relative to the mean log marginal likelihood of the best model (FLC shared stems), whereas the dotted line is the same value relative to the lowest log marginal likelihood of the best model. Model Selection Results for Complete SARS-CoV-2 Genomes. Note.—Mean estimates of log marginal likelihoods using path sampling and stepping-stone (ps logML and ss logML, respectively), over ten replicates. Log BFs are shown for the best-fitting model, relative to all others (increasingly negative numbers mean lower statistical fit), and thus they are 0.0 for the top model. The uncorrelated relaxed clocks had very similar performance, although both had at least “substantial evidence” against them with respect to the FLC-shared stems model (i.e. log BF > −1.1). The log BFs for the remaining models were at least −13, implying “decisive” evidence against them, relative to the FLC-shared stems. Interestingly, FLC models where VOC clades were defined as foreground had decisively lower statistical performance than those where only stem branches were labeled as foreground (table 1, fig. 1). In fact, even the SC model, which is generally considered unrealistic for empirical data, had a log BF of at least 4 with respect to FLC shared clades and the FLC clades + stems (table 1).

Rates of Evolution of Variants of Concern

The FLC shared stems model had a mean background substitution rate of 0.58 × 10−3 subs/site/year (95% CI: 0.51–0.65 × 10−3), whereas that for the VOC stems was 2.45 × 10−3 subs/site/year (95% CI: 1.15–4.72 × 10−3). The corresponding mean values in units of substitutions per month across the entire genome (subs/month) are 1.24 for the background and 6.11 for the foreground. As such, the VOC stems rate was around 4-fold higher than the background (mean 4.25, 95% CI: 2.61–8.19) (fig. 2).
Fig. 2.

Violin plots for posterior statistics of FLC. (A) is for a model where the stem branches of VOCs share a substitution rate that is different to that of the background (model “FLC shared stems” in supplementary table S1 and fig. S1, Supplementary Material online). The substitution rate for VOCs stem branches is shown in orange and the background in gray. The dashed line represents the mean background rate and the dotted lines are the 95% credible interval. (B) is the ratio of the substitution rate for VOC stem branches and the background under the same model and the dashed line represents a value of 1.0 where the background and VOC stem rate would be the same. (C) and (D) show the corresponding statistics for the FLC stems model, where the stem branch of every VOC has a different rate. Abbreviation “B” stands for background.

Violin plots for posterior statistics of FLC. (A) is for a model where the stem branches of VOCs share a substitution rate that is different to that of the background (model “FLC shared stems” in supplementary table S1 and fig. S1, Supplementary Material online). The substitution rate for VOCs stem branches is shown in orange and the background in gray. The dashed line represents the mean background rate and the dotted lines are the 95% credible interval. (B) is the ratio of the substitution rate for VOC stem branches and the background under the same model and the dashed line represents a value of 1.0 where the background and VOC stem rate would be the same. (C) and (D) show the corresponding statistics for the FLC stems model, where the stem branch of every VOC has a different rate. Abbreviation “B” stands for background. Although the FLC stems model that assigned each VOC stem branch a different rate had very high uncertainty, it also suggested much higher rates for these branches. The mean background rate under this model was 0.55 × 10−3 subs/site/year (95% CI: 0.49–0.62 × 10−3). The corresponding values for VOC were 8.47 × 10−3 subs/site/year (95% CI: 1.93–82.37 × 10−3) for Alpha, 1.71 × 10−3 (95% CI: 0.34–33.20 × 10−3) for Beta, 2.76 × 10−3 (95% CI: 1.21–13.23 × 10−3) for Gamma, and 1.54 × 10−3 (95% CI: 0.62–7.35 × 10−3) for Delta. Clearly, these estimates were several fold higher than that of the background branches, and in spite of their high uncertainty least 0.90 of the posterior density was higher than the mean background rate (fig. 2). A key consideration is that the high uncertainty in the FLC stems model means that the actual values of rate estimates for VOC stem branches should be interpreted cautiously. The prior on all clock rates here is a continuous-time Markov chain (CTMC) reference prior, which consists of a Γ distribution with α = 0.5 and β = T, where T is the tree length (Ferreira and Suchard 2008; Wang and Yang 2014). Because the mean of a Γ distribution is α/β, under this prior the expectation is that the average substitution rate is 0.5/T. Our estimate of T under this model had a mean of 85.6 (95% CI: 78.37–92.98), which results in a relatively wide distribution with an expected mean of around 5 × 10−3 subs/site/year. A comparison of this prior, the posterior for VOC stem branch rates, and the background rate illustrates that VOC branch rates deviate much less from the prior than the background rate does. Thus, VOC branch rate estimates under this model may be sensitive to the choice of prior (see supplementary figs. S2 and S3, Supplementary Material online). That is, the data may not be sufficiently informative to produce meaningful estimates of these parameters under this model. The coefficient of rate variation for both relaxed clock models, UCG and UCLN, was indicative of departure from clocklike evolution in the data. To investigate whether VOC stem branch rates differed from the rest, we extracted individual branch rates and compared the VOC stem branch rates to the mean of all other branches. We found evidence that VOC stem branch rates were higher than the mean of other branches, with higher means values, but with very high uncertainty and 95% credible intervals that overlapped with the mean of other branches (fig. 3).
Fig. 3.

Violin plots of posterior statistics for the uncorrelated relaxed clocks with lognormal (UCLN) and gamma (UCG) distributions (see Supplementary Material online). The top row, (A) through (C), is for the UCLN and the bottom row, (D) through (F), is for the UCG. (A) and (D) show the coefficient of rate variation, which is the standard deviation of branch rates divided by the mean rate, and indicates clock-like behavior when it is abutting zero (Drummond ; Ho ). In (B) and (E), the substitution rate is shown for the stem branches of VOCs and for the mean of background branches (i.e., those that are not the stems of VOCs), abbreviated as “B.” The dashed line denotes the mean background rate, whereas the dotted lines represent the upper and lower 95% credible interval. (C) and (F) show the percentile in which stem branches for VOCs fall with respect to other branches. Note that the densities have been smoothed, but the maximum values are 100.

Violin plots of posterior statistics for the uncorrelated relaxed clocks with lognormal (UCLN) and gamma (UCG) distributions (see Supplementary Material online). The top row, (A) through (C), is for the UCLN and the bottom row, (D) through (F), is for the UCG. (A) and (D) show the coefficient of rate variation, which is the standard deviation of branch rates divided by the mean rate, and indicates clock-like behavior when it is abutting zero (Drummond ; Ho ). In (B) and (E), the substitution rate is shown for the stem branches of VOCs and for the mean of background branches (i.e., those that are not the stems of VOCs), abbreviated as “B.” The dashed line denotes the mean background rate, whereas the dotted lines represent the upper and lower 95% credible interval. (C) and (F) show the percentile in which stem branches for VOCs fall with respect to other branches. Note that the densities have been smoothed, but the maximum values are 100. The mean substitution rate of branches other than the VOC stems was 0.65 × 10−3 subs/site/year (95% CI: 0.58–0.77 × 10−3) in the UCLN and 0.69 × 10−3 subs/site/year (95% CI: 0.60–0.80 × 10−3) for the UCG. In contrast, the VOC stem mean substitution rates for the UCLN were: 1.29 × 10−3 subs/site/year (95% credible interval, CI: 0.76–2.56 × 10−3) for Alpha, 0.64 × 10−3 (95% CI: 0.32–1.57 × 10−3) for Beta, 1.29 × 10−3 (0.82–2.40 × 10−3) for Gamma, and 1.06 × 10−3 (95% CI: 0.50–2.38 × 10−3) for Delta, and with comparable values for the UCG. The quantile where VOC stems rates fell with respect to other branches also supported the finding that their rates were particularly high in most cases. In the UCLN, for Alpha 0.96 of posterior density had the stem rate in the top 75% of fastest evolving branches, with the corresponding numbers for the other VOCs being 0.25, 0.98, and 0.81 Beta, Gamma, and Delta, respectively, and with comparable values in the UCG (0.92, 0.45, 0.96, and 0.91; fig. 3). The RLC model produced less clear results than the other molecular clock models. The maximum a posteriori number of rate changes was 4, with the 95% CI ranging between 2 and 5. However, the posterior probability of rate changes in VOC stem branches or clades was 0.0. Instead, rate changes were not consistently found on particular branches. It is conceivable that this model poses a heavy penalty on rate changes. In particular, there is a very large number of local clock configurations in these data, which may be impossible to visit under a standard Markov chain Monte Carlo analyses and may result in low statistical power to assess support for our hypotheses. This model had a substitution rate estimate that was comparable with that of other models (mean 0.60 × 10−3 subs/site/year; 95% CI: 0.49–0.72 × 10−3).

Emergence Time and Expected Genome Substitutions

We estimated the duration of time along VOC stem branches and the inferred total number of nucleotide substitutions along the complete genome. We focus on the best fitting model (FLC shared stems), with similar results for other models (supplementary fig. S4, Supplementary Material online). The duration of time along these branches represents the time required before VOCs started to diversify, but it is important to note that they are contingent on sampling bias, and could therefore be shorter than estimated here. Under the FLC-shared stems model, the stem branch leading up to VOs were: 14 weeks (95% CI: 6–24) for Alpha, 4 (95% CI: 2–8) for Beta, 17 (95% CI: 8–28) for Gamma, and 6 (3–11) for Delta (supplementary fig. S4, Supplementary Material online). The expected number of substitutions along the complete genome were: 21 (95% CI: 14–32) for Alpha, 6 (95% CI: 3–11) for Beta, 26 (95% CI: 18–35) for Gamma, and 9 (95% CI: 6–16) for Delta. Although, these numbers are loosely associated with the number of defining mutations, they are not directly comparable because they involve substitutions along the entire genome and they correspond to the inference from a standard phylogenetic substitution model (the GTR + Γ in this case).

Discussion

Our mean rate estimates over all lineages are somewhat lower than earlier estimates (Duchene ), which is consistent with the notion that the virus has had time to evolve and to remove transient deleterious mutations since its emergence (Ghafari ). Clearly, the molecular substitution rate of SARS-CoV-2 displays substantial variation among lineages, a pattern that has been apparent since early phylogenetic analyses of the virus (Duchene ). Substitution rate variation is sometimes stochastic in nature and pinpointing its causes is often difficult in empirical data. Our explicit hypothesis testing framework suggests that the emergence of VOCs explains much of the substitution rate variation in the virus. This model testing framework has been previously used to understand viral evolution among host species in influenza (Worobey ), and the host range SARS-CoV-2 and closely related viruses (MacLean ). Here we used marginal likelihood estimators that have shown high accuracy (Fourment ), but recent developments, including those based in sequential Monte Carlo (Wang ), may improve statistical power for differentiating clock models. We suggest that model testing may be preferable to using highly parametric models, such as relaxed molecular clock models, for this purpose because they tend to have very high variance in substitution rates of particular branches. Recent advances in fitting relaxed and RLC models may provide increased sensitivity and precision in branch specific rate estimates (Douglas ; Fisher ). We find compelling evidence that episodic, instead of long term, increases in the substitution rate underpin the emergence of VOCs, a process that is probably driven the action of natural selection. All models where VOC clades were assigned a different rate to the background had poor statistical fit, even when compared with the SC “null” model, providing further support for such rate increases to occur over a short period of time. The increase in substitution rate required to give rise to the four VOCs examined was estimated to be around 4-fold compared with the background, although such estimates may carry high uncertainty when estimated for individual stem branches. Under these circumstances, the number of mutations required to give rise to a VOC, such as Alpha, would have accumulated in about 3 months, with some variants requiring a few weeks, such as Beta and Delta. These timescales appear plausible in chronic infections of SARS-CoV-2 (Harvey ; Kemp ), but other circumstances are also likely, for example, if transmission is infrequent and selection favors mutations that increase transmissibility between hosts. Our genomic analyses demonstrate that signatures of increased substitution rates are detectable using phylogenetic methods and genome surveillance data. A recent study of reported increased evolutionary rates within sublineages of Gamma (Gräf ), which prompts further investigation of within lineage evolution. However, the precise mechanism (ecological or intrahost) of how VOCs have emerged is still unclear. Elucidating these processes will require dense sampling between transmission chains, specifically in settings where transmission is unlikely and intra-host sequence data are available. Another important area that is currently under intense investigation is how natural selection shapes the emergence and persistence of VOCs (Martin ; Tegally ). Such studies may benefit from using explicit models where the substitution rate is treated as a function of environmental or ecological variables (Streicker ). We recommend that further research focuses on early detection and understanding of the circumstances under which viral lineages with epidemiological impacts, such as VOCs, emerge.

Materials and Methods

Data Set Construction

We downloaded 100 randomly selected sequences in the global NextStrain SARS-CoV-2 build of August 2021 (Hadfield ), from the GISAID database (Elbe and Buckland-Merrett 2017; Shu and McCauley 2017). This set of sequences did not include any of those belonging to the four VOCs (Alpha, Beta, Gamma, or Delta) and we also excluded samples drawn from nonhuman hosts. We then downloaded 20 randomly selected sequences from the four VOCs to generate a data set of 180 genomes, which we aligned using MAFFT (Katoh and Standley 2013). We ensured that the sequences consisted of complete genomes, with no stretches of more than 10 Ns and excluding those with low coverage (see supplementary material, Supplementary Material online). To verify that samples classified as VOCs were correctly assigned as such, we estimated a phylogenetic tree using maximum likelihood as implemented in IQ-TREE2 (Minh ), using the GTR + Γ substitution model and with approximate Bayes branch support (Anisimova ). We ensured that all VOCs were monophyletic with an approximate Bayes support of at least <0.95.

Bayesian Phylogenetic Analyses

Our Bayesian analyses require specifying a substitution model, a tree prior and priors for all parameters in BEAST 1.10 (Suchard ). We chose the GTR+ substitution model and a coalescent exponential tree prior. Although the tree prior is not necessarily realistic here, it is expected to have little impact in molecular clock estimates (Ritchie ). Moreover, it can accommodate changes in population size via the exponential growth function and it is fully parametric, meaning that setting proper priors for all parameters is possible. To calibrate the molecular clock, we specified the sequence sampling times. The FLC models require constraining monophyly of VOCs, which we also did for other clock models to ensure that the prior on tree topology was the same. We used the default priors for the substitution model. The coalescent exponential tree prior has two parameters, the scaled population size, , and the growth rate r. The scaled population size is proportional to the number of infected individuals at present divided by twice the coalescent rate, λ (i.e., ), and the growth rate is inversely proportional to the doubling time by a factor of log(2) () (Volz ; Boskova ). We used priors with relatively low information content for these two parameters. For , we used an exponential distribution with mean 105, whereas for r, we used a Laplace distribution with location 0 and scale 100. For all molecular clock rates, we used a CTMC reference prior (Ferreira and Suchard 2008; Wang and Yang 2014). The UCLN and UCG models have an additional parameter; the standard deviation of the lognormal distribution, and the shape of the Γ distribution. For these parameters, we specified an exponential prior with mean 0.33. We ran our analyses for using a Markov chain Monte Carlo of length 5 × 107, sampling every 5 × 103 and discarding 10% of the chain as burn-in. We repeated the analyses once to verify convergence of independent chains and we ensured that the effective sample size of all parameters was at least 200.

Marginal Likelihood Estimation

We used two techniques to infer the log marginal likelihood: path sampling and stepping-stone (Gelman and Meng 1998; Lartillot and Philippe 2006; Xie ), which have been found to have high performance in differentiating models in phylogenetics (Baele , 2013; Fourment ), reviewed by Baele and Lemey (2014) and Oaks . We chose these estimators over the more recently developed and highly accurate generalized stepping-stone because the latter requires a working genealogical distribution (Baele ), which is not trivial here due to the monophyletic constraints in our models. Our estimation setup had 200 path steps distributed according to quantiles from a β distribution with parameter α = 0.3, with each of the resulting 201 power posterior inferences running for 106 iterations. We repeated these analyses ten times to assess variation in these calculations and used the log BFs of the mean values. Our model testing approach considered the UCLN, SC, and all FLC models in table 1 and supplementary material, Supplementary Material online. We did not calculate log marginal likelihoods for the RLC because this is a model averaging method, where the number of parameters is less tractable than in other models and thus it is difficult to conceive proper priors for all parameters, a fundamental aspect of Bayesian model selection.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.
  51 in total

1.  Improving marginal likelihood estimation for Bayesian phylogenetic model selection.

Authors:  Wangang Xie; Paul O Lewis; Yu Fan; Lynn Kuo; Ming-Hui Chen
Journal:  Syst Biol       Date:  2010-12-27       Impact factor: 15.683

2.  Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.

Authors:  Guy Baele; Philippe Lemey; Marc A Suchard
Journal:  Syst Biol       Date:  2015-11-01       Impact factor: 15.683

3.  Computing Bayes factors using thermodynamic integration.

Authors:  Nicolas Lartillot; Hervé Philippe
Journal:  Syst Biol       Date:  2006-04       Impact factor: 15.683

4.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

5.  IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.

Authors:  Bui Quang Minh; Heiko A Schmidt; Olga Chernomor; Dominik Schrempf; Michael D Woodhams; Arndt von Haeseler; Robert Lanfear
Journal:  Mol Biol Evol       Date:  2020-05-01       Impact factor: 16.240

6.  SARS-CoV-2 evolution during treatment of chronic infection.

Authors:  Steven A Kemp; Dami A Collier; Rawlings P Datir; Isabella A T M Ferreira; Salma Gayed; Aminu Jahun; Myra Hosmillo; Chloe Rees-Spear; Petra Mlcochova; Ines Ushiro Lumb; David J Roberts; Anita Chandra; Nigel Temperton; Katherine Sharrocks; Elizabeth Blane; Yorgo Modis; Kendra E Leigh; John A G Briggs; Marit J van Gils; Kenneth G C Smith; John R Bradley; Chris Smith; Rainer Doffinger; Lourdes Ceron-Gutierrez; Gabriela Barcenas-Morales; David D Pollock; Richard A Goldstein; Anna Smielewska; Jordan P Skittrall; Theodore Gouliouris; Ian G Goodfellow; Effrossyni Gkrania-Klotsas; Christopher J R Illingworth; Laura E McCoy; Ravindra K Gupta
Journal:  Nature       Date:  2021-02-05       Impact factor: 69.504

7.  Updated rapid risk assessment from ECDC on the risk related to the spread of new SARS-CoV-2 variants of concern in the EU/EEA - first update.

Authors: 
Journal:  Euro Surveill       Date:  2021-01

8.  Adaptive dating and fast proposals: Revisiting the phylogenetic relaxed clock model.

Authors:  Jordan Douglas; Rong Zhang; Remco Bouckaert
Journal:  PLoS Comput Biol       Date:  2021-02-02       Impact factor: 4.475

9.  Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution.

Authors:  Guy Baele; Philippe Lemey; Stijn Vansteelandt
Journal:  BMC Bioinformatics       Date:  2013-03-06       Impact factor: 3.169

10.  A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants.

Authors:  Bethany Dearlove; Eric Lewitus; Hongjun Bai; Yifan Li; Daniel B Reeves; M Gordon Joyce; Paul T Scott; Mihret F Amare; Sandhya Vasan; Nelson L Michael; Kayvon Modjarrad; Morgane Rolland
Journal:  Proc Natl Acad Sci U S A       Date:  2020-08-31       Impact factor: 12.779

View more
  14 in total

Review 1.  Epidemiological inference from pathogen genomes: A review of phylodynamic models and applications.

Authors:  Leo A Featherstone; Joshua M Zhang; Timothy G Vaughan; Sebastian Duchene
Journal:  Virus Evol       Date:  2022-06-02

2.  Risk and severity of SARS-CoV-2 reinfections during 2020-2022 in Vojvodina, Serbia: A population-level observational study.

Authors:  Snežana Medić; Cleo Anastassopoulou; Zagorka Lozanov-Crvenković; Vladimir Vuković; Nataša Dragnić; Vladimir Petrović; Mioljub Ristić; Tatjana Pustahija; Zoran Gojković; Athanasios Tsakris; John P A Ioannidis
Journal:  Lancet Reg Health Eur       Date:  2022-07-01

3.  Accelerated SARS-CoV-2 intrahost evolution leading to distinct genotypes during chronic infection.

Authors:  Chrispin Chaguza; Anne M Hahn; Mary E Petrone; Shuntai Zhou; David Ferguson; Mallery I Breban; Kien Pham; Mario A Peña-Hernández; Christopher Castaldi; Verity Hill; Wade Schulz; Ronald I Swanstrom; Scott C Roberts; Nathan D Grubaugh
Journal:  medRxiv       Date:  2022-07-02

4.  Viral Evolution and Immunology of SARS-CoV-2 in a Persistent Infection after Treatment with Rituximab.

Authors:  Nathalie Van der Moeren; Philippe Selhorst; My Ha; Laura Heireman; Pieter-Jan Van Gaal; Dimitri Breems; Pieter Meysman; Kris Laukens; Walter Verstrepen; Natasja Van Gasse; Benson Ogunjimi; Kevin K Arien; Reinout Naesens
Journal:  Viruses       Date:  2022-04-03       Impact factor: 5.818

5.  Emergence and phenotypic characterization of the global SARS-CoV-2 C.1.2 lineage.

Authors:  Cathrine Scheepers; Josie Everatt; Daniel G Amoako; Houriiyah Tegally; Constantinos Kurt Wibmer; Anele Mnguni; Arshad Ismail; Boitshoko Mahlangu; Bronwen E Lambson; Darren P Martin; Eduan Wilkinson; James Emmanuel San; Jennifer Giandhari; Nelia Manamela; Noxolo Ntuli; Prudence Kgagudi; Sandile Cele; Simone I Richardson; Sureshnee Pillay; Thabo Mohale; Upasana Ramphal; Yeshnee Naidoo; Zamantungwa T Khumalo; Gaurav Kwatra; Glenda Gray; Linda-Gail Bekker; Shabir A Madhi; Vicky Baillie; Wesley C Van Voorhis; Florette K Treurnicht; Marietjie Venter; Koleka Mlisana; Nicole Wolter; Alex Sigal; Carolyn Williamson; Nei-Yuan Hsiao; Nokukhanya Msomi; Tongai Maponga; Wolfgang Preiser; Zinhle Makatini; Richard Lessells; Penny L Moore; Tulio de Oliveira; Anne von Gottberg; Jinal N Bhiman
Journal:  Nat Commun       Date:  2022-04-08       Impact factor: 14.919

6.  The Runaway Evolution of SARS-CoV-2 Leading to the Highly Evolved Delta Strain.

Authors:  Yongsen Ruan; Mei Hou; Xiaolu Tang; Xionglei He; Xuemei Lu; Jian Lu; Chung-I Wu; Haijun Wen
Journal:  Mol Biol Evol       Date:  2022-03-02       Impact factor: 16.240

7.  Comparative Analysis of SARS-CoV-2 Variants of Concern, Including Omicron, Highlights Their Common and Distinctive Amino Acid Substitution Patterns, Especially at the Spike ORF.

Authors:  Marios Nikolaidis; Athanasios Papakyriakou; Katerina Chlichlia; Panayotis Markoulatos; Stephen G Oliver; Grigorios D Amoutzias
Journal:  Viruses       Date:  2022-03-29       Impact factor: 5.818

8.  The Alpha Variant (B.1.1.7) of SARS-CoV-2 Failed to Become Dominant in Mexico.

Authors:  Selene Zárate; Blanca Taboada; Carlos F Arias; José Esteban Muñoz-Medina; Pavel Iša; Alejandro Sanchez-Flores; Celia Boukadida; Alfredo Herrera-Estrella; Nelly Selem Mojica; Mauricio Rosales-Rivera; Bruno Gómez-Gil; Angel Gustavo Salas-Lais; Clara Esperanza Santacruz-Tinoco; Héctor Montoya-Fuentes; Julio Elias Alvarado-Yaah; Gloria María Molina-Salinas; Gloria Elena Espinoza-Ayala; José Antonio Enciso-Moreno; Rosa María Gutiérrez-Ríos; Antonio Loza; Joaquín Moreno-Contreras; Rodrigo García-López; Xaira Rivera-Gutierrez; Andreu Comas-García; Rosa María Wong-Chew; Maria-Eugenia Jiménez-Corona; Rosa María Del Angel; Joel Armando Vazquez-Perez; Margarita Matías-Florentino; Marissa Pérez-García; Santiago Ávila-Ríos; Hugo G Castelán-Sánchez; Luis Delaye; León P Martínez-Castilla; Marina Escalera-Zamudio; Susana López
Journal:  Microbiol Spectr       Date:  2022-04-07

Review 9.  Pharmaceutical Prospects of Curcuminoids for the Remedy of COVID-19: Truth or Myth.

Authors:  Yaw-Syan Fu; Wan-Yi Ho; Ning Kang; May-Jywan Tsai; Jingyi Wu; Liyue Huang; Ching-Feng Weng
Journal:  Front Pharmacol       Date:  2022-04-14       Impact factor: 5.988

10.  Special Issue: "Viral Genetic Diversity".

Authors:  Jean-Michel Heraud; Anne Lavergne; Richard Njouom
Journal:  Viruses       Date:  2022-03-10       Impact factor: 5.048

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.