Yongsen Ruan¹, Mei Hou¹, Xiaolu Tang², Xionglei He¹, Xuemei Lu³, Jian Lu², Chung-I Wu¹, Haijun Wen¹.
Abstract
In new epidemics following a host shift, pathogens may undergo accelerated evolution driven by novel selective pressures. When this accelerated evolution enters a positive feedback loop with the expanding epidemic, runaway evolution of the pathogen may be triggered. To test this possibility in coronavirus disease 2019 (COVID-19), we analyze extensive databases and identify five major waves of strains, each replacing the previous one during 2020–2021. The mutations differ entirely between waves, and the number of mutations per wave keeps increasing, from 3–4 to 21–31. The latest wave, in the fall of 2021, is the Delta strain, which accrued 31 new mutations on its way to becoming highly prevalent. Interestingly, these new mutations in the Delta strain emerge in multiple stages, each stage driven by 6–12 coding mutations that form a fitness group. In short, the evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from the oldest to the youngest wave, and from the earlier to the later stages of the Delta wave, is a process of acceleration, with more and more mutations. The global increase in the viral population size (M(t), at time t) and in mutation accumulation (R(t)) may indeed have triggered runaway evolution in late 2020, leading to the highly evolved Alpha and then Delta strains. To suppress the pandemic, it is crucial to break the positive feedback loop between M(t) and R(t), neither of which had been effectively dampened by late 2021. New waves after Delta, hence, should not be surprising.
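The positive feedback between M(t) and R(t) can be illustrated with a toy deterministic model. This is a minimal sketch under stated assumptions, not the authors' model: it assumes that beneficial mutations accrue at a rate proportional to the population size, and that each accrued mutation adds a fixed increment to the per-step growth rate. The function `simulate_feedback` and all parameter values are hypothetical.

```python
from typing import List, Tuple


def simulate_feedback(
    steps: int,
    m0: float = 1.0,
    base_growth: float = 0.05,
    fitness_per_mut: float = 0.02,
    mut_rate: float = 0.01,
) -> List[Tuple[float, float]]:
    """Toy M(t)/R(t) feedback loop (hypothetical, not the paper's model).

    M: viral population size (arbitrary units).
    R: accumulated beneficial mutations (expected value, continuous).
    Returns the (M, R) pair at every step, starting from (m0, 0).
    """
    m, r = m0, 0.0
    history = [(m, r)]
    for _ in range(steps):
        # Assumption: a larger population supplies more beneficial mutations.
        r += mut_rate * m
        # Assumption: each accumulated mutation raises the growth rate.
        m *= 1.0 + base_growth + fitness_per_mut * r
        history.append((m, r))
    return history


if __name__ == "__main__":
    with_loop = simulate_feedback(50)
    no_loop = simulate_feedback(50, fitness_per_mut=0.0)  # feedback broken
    print(f"final M with feedback:    {with_loop[-1][0]:.2f}")
    print(f"final M without feedback: {no_loop[-1][0]:.2f}")
```

Because M grows every step, the per-step gain in R also grows, so mutation accumulation accelerates; setting `fitness_per_mut` or `mut_rate` to zero breaks the loop in this toy setting.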
Keywords: Delta strain; SARS-CoV-2; positive feedback; runaway evolution
Year: 2022 PMID: 35234869 PMCID: PMC8903489 DOI: 10.1093/molbev/msac046
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Evolution of SARS-CoV-2 between February 1, 2020 and July 1, 2021, depicted as waves (i.e., successions of “mutation groups”) in the UK (a), the USA (b), India (c), and globally (d). Sequencing data were obtained from the GISAID database. Frequencies are shown for nonsynonymous (A) and synonymous (S) mutations whose peak frequency reaches the cutoff of 0.3; the number of mutations is given in parentheses. Each curve represents the rise and fall of a variant, but an observed curve usually consists of several curves that overlap completely. In COVID-19, there are five waves (W0 to W4). Note that the decline of each wave as the next one rises (between W1 and W2, W2 and W3, or W3 and W4) holds in both relative and absolute abundance.
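Fig. 1 keeps only mutations whose peak frequency reaches 0.3 and notes that one observed curve often hides several completely overlapping curves. A minimal sketch of how such mutations could be filtered and pooled into waves follows. It assumes every mutation's frequency trajectory is sampled at the same time points, and it treats two trajectories as overlapping when they differ by at most a tolerance `tol` at every time point. The function name, the tolerance, and the greedy grouping rule are hypothetical, not the authors' procedure.

```python
from typing import Dict, List


def group_into_waves(
    trajectories: Dict[str, List[float]],
    cutoff: float = 0.3,
    tol: float = 0.05,
) -> List[List[str]]:
    """Filter mutations by peak frequency and pool overlapping trajectories.

    trajectories: mutation -> frequencies at shared, ordered time points.
    Returns groups ordered by the peak time of each group's first member.
    """
    # Keep only mutations whose peak frequency reaches the cutoff (0.3 in Fig. 1).
    kept = {m: f for m, f in trajectories.items() if max(f) >= cutoff}
    # Process mutations in order of peak time so earlier waves come first.
    order = sorted(kept, key=lambda m: (kept[m].index(max(kept[m])), m))
    groups: List[List[str]] = []
    for m in order:
        for group in groups:
            reference = kept[group[0]]
            # "Overlap completely": within tol of the group's first member everywhere.
            if max(abs(a - b) for a, b in zip(reference, kept[m])) <= tol:
                group.append(m)
                break
        else:
            groups.append([m])
    return groups


if __name__ == "__main__":
    toy = {  # hypothetical frequencies at four time points
        "a1": [0.20, 0.80, 0.30, 0.10],
        "a2": [0.21, 0.79, 0.31, 0.10],
        "b1": [0.00, 0.10, 0.60, 0.90],
        "b2": [0.00, 0.12, 0.58, 0.90],
        "rare": [0.00, 0.05, 0.10, 0.20],
    }
    print(group_into_waves(toy))  # [['a1', 'a2'], ['b1', 'b2']]
```

Here `rare` never reaches the cutoff and is dropped, while each pair of near-identical curves collapses into one wave.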
Number of Variant Sites Associated with Each of the Five Waves in the UK.
| Waves | Nonsynonymous (A) | Synonymous (S) | Noncoding (NC) | A:S:NC |
|---|---|---|---|---|
| W0 | 2 (C14408T, A23403G) | 1 (C3037T) | 1 (C241T) | 2:1:1 |
| W1 | 2 (G28881A, G28883C) | 1 (G28882A) | 0 | 2:1:0 |
| W2 | 4 (C21614T, C22227T, C28932T, G29645T) | 5 (T445C, C6286T, G21255C, C26801G, C27944T) | 1 (G204T) | 4:5:1 |
| W3 (Alpha) | 15 (see legends) | 5 (C913T, C5986T, C14676T, C15279T, T16176C) | 1 (C27972T) | 15:5:1 |
| W4 (Delta) | 26 (see legends) | 3 (C8986T, A11332G, T17040C) | 2 (G210T, G29742T) | 26:3:2 |
W3 (Alpha) nonsynonymous mutations (15): C3267T, C5388A, T6954C, A23063T, C23271A, C23604A, C23709T, T24506G, G24914C, G28048T, A28111G, G28280C, A28281T, T28282A, and C28977T.
W4 (Delta) nonsynonymous mutations (26): G4181T, C6402T, C7124T, C7851T, G9053T, C10029T, A11201G, G15451A, C16466T, C19220T, C21618G, C21846T, G21987A, T22917G, C22995A, C23604G, G24410A, C25469T, T26767C, T27638C, C27752T, C27874T, A28461G, G28881T, G28916T, and G29402T.
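The variant labels in the table follow the usual reference-base, genome-position, alternate-base convention (C14408T: C at position 14408 replaced by T). The sketch below parses such labels and checks the per-wave totals implied by the A:S:NC counts, which reproduce the increase from 3–4 to 21–31 mutations stated in the abstract. The helper names are hypothetical; the counts and the W3 (Alpha) list are copied from the table and its legend.

```python
import re
from typing import Dict, List, Tuple

# Reference base, genome position, alternate base, e.g. "C14408T".
SNV_LABEL = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_snv(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'C14408T' into ('C', 14408, 'T')."""
    match = SNV_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-nucleotide variant label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


# (nonsynonymous, synonymous, noncoding) counts per wave, from the UK table.
UK_WAVES: Dict[str, Tuple[int, int, int]] = {
    "W0": (2, 1, 1),
    "W1": (2, 1, 0),
    "W2": (4, 5, 1),
    "W3 (Alpha)": (15, 5, 1),
    "W4 (Delta)": (26, 3, 2),
}

# Nonsynonymous mutations of W3 (Alpha), from the table legend.
ALPHA_NONSYN = (
    "C3267T, C5388A, T6954C, A23063T, C23271A, C23604A, C23709T, T24506G, "
    "G24914C, G28048T, A28111G, G28280C, A28281T, T28282A, C28977T"
)


def wave_totals(waves: Dict[str, Tuple[int, int, int]]) -> Dict[str, int]:
    """Total number of variant sites per wave (A + S + NC)."""
    return {wave: sum(counts) for wave, counts in waves.items()}


if __name__ == "__main__":
    alpha: List[Tuple[str, int, str]] = [parse_snv(x) for x in ALPHA_NONSYN.split(",")]
    assert len(alpha) == UK_WAVES["W3 (Alpha)"][0]  # legend matches the table count
    print(wave_totals(UK_WAVES))
```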
Fig. 2. The process of mutation accumulation in clusters. (a) An illustration of the principle of mutation accumulation. Each of the four mutations, A–D, is acquired step by step, but a large fitness gain is realized only when all of them are present. Because the four mutations would become highly prevalent nearly concurrently, their trajectories in figure 1 would appear to overlap strongly. (b) The actual process of mutation accumulation during the evolution of the Delta strain in India. Each row represents a particular nucleotide site, and these mutations fall into three groups, labeled E, M, and L (early, middle, and late, respectively).
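Panel (a) argues that when the fitness gain appears only once all four mutations are present, the mutations rise together. A minimal deterministic sketch of that argument follows: haploid selection acts on haplotypes, only the complete haplotype ABCD carries a fitness advantage `s`, and the partial haplotypes are neutral. The starting frequencies, `s`, and the number of generations are hypothetical choices; this illustrates the principle and is not the authors' model.

```python
from typing import Dict, List

SITES = "ABCD"


def mutation_frequencies(haplotypes: Dict[str, float]) -> Dict[str, float]:
    """Marginal frequency of each mutation = total frequency of haplotypes carrying it."""
    return {site: sum(f for hap, f in haplotypes.items() if site in hap) for site in SITES}


def simulate_epistatic_sweep(generations: int = 60, s: float = 0.5) -> List[Dict[str, float]]:
    """Deterministic haploid selection; only the complete haplotype 'ABCD' is fitter."""
    # Hypothetical start: mutations acquired stepwise, each intermediate rare and neutral.
    # The empty string is the ancestral haplotype carrying none of A-D.
    freqs = {"": 0.996, "A": 0.001, "AB": 0.001, "ABC": 0.001, "ABCD": 0.001}
    fitness = {hap: (1.0 + s if hap == "ABCD" else 1.0) for hap in freqs}
    history = [mutation_frequencies(freqs)]
    for _ in range(generations):
        mean_fitness = sum(freqs[h] * fitness[h] for h in freqs)
        # Standard selection update: p' = p * w / mean(w).
        freqs = {h: freqs[h] * fitness[h] / mean_fitness for h in freqs}
        history.append(mutation_frequencies(freqs))
    return history


if __name__ == "__main__":
    history = simulate_epistatic_sweep()
    print("start:", history[0])
    print("end:  ", history[-1])
```

At the start, A is more common than D because intermediate haplotypes carry it; once ABCD sweeps, the four marginal frequencies converge, which is the overlap the caption describes.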
Fig. 3. The evolution of the Delta strain in India. The rises (and falls) of five distinct strains during the evolution of Delta are shown in different colors. The light color indicates the non-Delta strains that eventually disappear. The other four colors represent the pre-Delta strains (bearing E, E + M, and E + M + L mutations) and the latest Delta strain bearing E + M + L + R mutations. Note that each pre-Delta or Delta strain must start from a single haplotype bearing all its characteristic mutations; hence, its initial increase in frequency must be very substantial. At each time point indicated, the portrayed strains add up to 100%. The size of the entire viral population increases with time, but the depicted total corresponds only roughly to this trend. The sets of E, M, L, and R mutations are depicted in figure 2, and the composition of each group is given in the figure, e.g., 10A2S for ten nonsynonymous and two synonymous mutations in the group; NC denotes noncoding mutations.
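Because the strains in Fig. 3 are nested (E, E + M, E + M + L, E + M + L + R), each sequence can be assigned to the deepest stage whose mutation groups it carries in full, and the counts at each time point normalized so the strains sum to 100%. The sketch below assumes that nesting; the group contents (`e1`, `m1`, and so on) are hypothetical placeholders, not the actual E, M, L, and R sites, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical placeholder sites; the real E/M/L/R sets are shown in the figures.
GROUPS: List[Tuple[str, Set[str]]] = [
    ("E", {"e1", "e2"}),
    ("M", {"m1", "m2"}),
    ("L", {"l1"}),
    ("R", {"r1"}),
]


def classify(mutations: Set[str]) -> str:
    """Return the deepest nested stage carried in full, or 'non-Delta'."""
    stages: List[str] = []
    for name, sites in GROUPS:
        if not sites <= mutations:  # group incomplete: stop at the previous stage
            break
        stages.append(name)
    return " + ".join(stages) if stages else "non-Delta"


def strain_frequencies(samples: Iterable[Set[str]]) -> Dict[str, float]:
    """Relative frequency of each strain class at one time point (sums to 1)."""
    counts = Counter(classify(s) for s in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {strain: n / total for strain, n in counts.items()}
```

Under this nesting assumption, a sequence carrying the E and L groups but not M is counted as E.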
Fig. 4. The beginning of haplotype assembly and the distribution of mutations among individuals. The figure attempts to show how the Delta haplotype is first assembled within an individual. (a) From data collected in India between September 2, 2020 and October 1, 2020, 28 variants are identified in 772 sequences. All haplotypes and their occurrences are given. Note that one single haplotype (#2461258) carries nearly all the mutations of the Delta strain (including most of the singletons at that time). (b) A model of the emergence of a new haplotype (ABCD), from intrahost diversity to interhost polymorphism. In this model, mutations accumulate gradually within hosts, which creates the impression that the haplotype ABCD appears suddenly between individuals.
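Panel (a) tabulates haplotypes among the sampled sequences and singles out the one haplotype that carries nearly all Delta mutations. The sketch below shows that bookkeeping on a toy input, assuming each sequence is represented as the set of variants it carries relative to a reference: identical sets are pooled into one haplotype, and singletons are variants seen in only one sequence. The representation and function names are hypothetical, and the variant names `v1` to `v4` are placeholders.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple


def tabulate_haplotypes(samples: Iterable[Iterable[str]]) -> Counter:
    """Count identical haplotypes (sets of variants carried by one sequence)."""
    return Counter(frozenset(s) for s in samples)


def singletons(haplotypes: Counter) -> Set[str]:
    """Variants observed in exactly one sequence."""
    per_variant: Counter = Counter()
    for hap, n in haplotypes.items():
        for variant in hap:
            per_variant[variant] += n
    return {v for v, n in per_variant.items() if n == 1}


def best_carrier(haplotypes: Counter, targets: Set[str]) -> Tuple[FrozenSet[str], int]:
    """Haplotype sharing the most variants with `targets`, and that number."""
    hap = max(haplotypes, key=lambda h: len(h & targets))
    return hap, len(hap & targets)


if __name__ == "__main__":
    samples = [{"v1"}, {"v1"}, {"v2", "v3"}, {"v1", "v2", "v3", "v4"}, set()]
    haps = tabulate_haplotypes(samples)
    print(singletons(haps))                                   # {'v4'}
    print(best_carrier(haps, {"v1", "v2", "v3", "v4"})[1])    # 4
```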
Fig. 5. The number of single-nucleotide variants (SNVs) accrued in SARS-CoV-2 genomes over a 550-day period. The number of SNVs in each strain relative to the reference genome (left y-axis) is plotted against collection date (x-axis). The SNVs are from the coding regions (a), nonsynonymous sites (b), and synonymous sites (c), respectively. Hundreds or thousands of strains were collected for each date; the left y-axis shows their average and 95% quantiles (shaded). The right y-axis shows the cumulative number of confirmed COVID-19 cases worldwide (downloaded from the WHO website).
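The shaded band in each panel summarizes, per collection date, the distribution of SNV counts across the strains sampled that day. A minimal sketch of such a per-date summary follows. It assumes that "95% quantiles" means the band between the 2.5th and 97.5th percentiles, and it uses linear interpolation between order statistics (the same rule as NumPy's default `percentile`). The function names and the input layout (date string mapped to a list of per-strain SNV counts) are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def percentile(sorted_vals: List[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    if not sorted_vals:
        raise ValueError("no values")
    k = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def summarize_by_date(snvs: Dict[str, List[int]]) -> Dict[str, Tuple[float, float, float]]:
    """Map each collection date to (mean, 2.5th percentile, 97.5th percentile)."""
    summary = {}
    for date, counts in snvs.items():
        vals = sorted(float(c) for c in counts)
        mean = sum(vals) / len(vals)
        summary[date] = (mean, percentile(vals, 2.5), percentile(vals, 97.5))
    return summary


if __name__ == "__main__":
    # Hypothetical input: SNV counts for 101 strains sampled on one date.
    print(summarize_by_date({"2021-01-01": list(range(101))}))
```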