Binbin Xi1, Yuhuan Meng2, Dawei Jiang1, Yunmeng Bai1, Zixi Chen1, Yimo Qu1, Shuhua Li1, Jinfen Wei1, Lizhen Huang1, Hongli Du1.
Abstract
The scale of SARS-CoV-2 infection and death is enormous, and further study of the molecular and evolutionary characteristics of the virus will help us better understand and respond to SARS-CoV-2 outbreaks. The present study analyzed the epidemic and evolutionary characteristics of haplotype subtypes and regions based on 1.8 million high-quality SARS-CoV-2 genome sequences. The estimated ratio of the rates of non-synonymous to synonymous changes (Ka/Ks) in North America and the United States was always greater than 1.0, while the Ka/Ks in other continents and countries showed a sharp decline, then a slow increase to 1.0, and finally a dramatic increase over time. H1 (B.1), which has the highest substitution rate, has been the dominant haplotype subtype since March 2020 and has evolved into multiple haplotype subtypes with lower substitution rates. Several evolutionary characteristics of early SARS-CoV-2 indicate that early genomic data are missing: H3 was the early haplotype subtype that existed for the shortest time, H1 and H1-5 (B.1.1) became globally prevalent within a month of being detected, and many highly divergent genome sequences were already present in early February 2020. SARS-CoV-2 experienced dynamic selection from December 2019 to August 2021 and has been under strong positive selection since May 2021. Its transmissibility and capacity for immune escape may increase greatly over time, which will bring greater challenges to the control of the pandemic.
Keywords: Ka/Ks; SARS-CoV-2; epidemic trends; evolution; haplotype subtypes; substitution rate
Year: 2022 PMID: 35336862 PMCID: PMC8954678 DOI: 10.3390/v14030454
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. The estimated substitution rate of SARS-CoV-2. The estimated substitution rates for different continents, for countries with more than 10,000 genome sequences, and for China are plotted. Turkey is not plotted because its Pearson correlation coefficient was small (r = 0.20).
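The caption above uses a Pearson correlation coefficient to decide whether a regional estimate is reliable enough to plot. This record does not describe how the rate was estimated, so the following is only a minimal sketch of one common approach, assumed here for illustration: regress the number of mutations relative to the reference genome on the sampling date, then convert the slope to substitutions per site per year. The function name `regress_rate` and the toy data are hypothetical; the genome length is that of the reference sequence NC_045512.2.

```python
from math import sqrt

GENOME_LENGTH = 29903  # nucleotides in the reference genome NC_045512.2


def regress_rate(days, mutations, genome_length=GENOME_LENGTH):
    """Least-squares fit of mutation count against sampling day.

    Returns (substitutions per site per year, Pearson r).
    Assumed illustration only; not the estimation method of the study.
    """
    n = len(days)
    if n < 2 or n != len(mutations):
        raise ValueError("need two or more paired observations")
    mean_x = sum(days) / n
    mean_y = sum(mutations) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    syy = sum((y - mean_y) ** 2 for y in mutations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, mutations))
    if sxx == 0 or syy == 0:
        raise ValueError("days and mutations must each vary")
    slope = sxy / sxx              # mutations per genome per day
    r = sxy / sqrt(sxx * syy)      # Pearson correlation coefficient
    rate = slope * 365 / genome_length
    return rate, r


# Hypothetical toy data: days since the first sample, mutations vs. reference.
rate, r = regress_rate([0, 100, 200, 300], [0, 2, 4, 6])
print(f"rate = {rate:.3e} substitutions/site/year, r = {r:.2f}")
```

With a fit like this, a small r (such as the 0.20 reported for Turkey) means sampling date explains little of the variation in mutation count, so the slope is a poor estimate of the rate.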
Figure 2. Changes in the estimated ratio of the rates of non-synonymous to synonymous changes (Ka/Ks) of SARS-CoV-2. The Ka/Ks of SARS-CoV-2 between December 2019 and March 2020 was calculated as a whole because there were too few genome sequences in the early stage. (A) Changes in Ka/Ks in different continents. (B) Changes in Ka/Ks in different countries or regions with more than 10,000 genome sequences, and in China.
Figure 3. Changes in the proportions of different haplotype subtypes over time. (A) Proportion changes of the four haplotype subtypes defined by the nine specific mutation sites. (B) Proportion changes of the eight haplotype subtypes defined by the 16 specific mutation sites. (C) Proportion changes of the seven haplotype subtypes defined by the 30 specific mutation sites. (D) Proportion changes of the eight haplotype subtypes defined by the 33 specific mutation sites.
Figure 4. The Ka/Ks and substitution rate of the H1 haplotype subtype. (A) Changes in the Ka/Ks of the H1 haplotype subtype in different continents. (B) Changes in the Ka/Ks of the H1 haplotype subtype in different countries or regions with more than 10,000 genome sequences, and in China. (C) The estimated substitution rate of the H1 haplotype subtype in different continents and countries or regions. The estimated substitution rate of the H1 haplotype subtype in Turkey is not plotted because its Pearson correlation coefficient was small (r = 0.17).
Figure 5. Early haplotype subtype distribution in the U.S. and the phylogenetic tree of early genome sequences in Washington State. (A) Genome sequences with a known location are plotted on the map at the corresponding U.S. states. (B) The reference sequence (NC_045512.2) and the genome sequence (MN985325.1) from the first reported U.S. case are marked in red.
Figure 6. The evolutionary divergence among global early genome sequences. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (salmon color). The whiskers extend from the box by 1.5× the interquartile range (IQR). Flier points (silver color) are those past the end of the whiskers. (A) The pairwise distance between early genome sequences. (B) The pairwise distance between early genome sequences and the reference sequence (NC_045512.2).
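The box-plot conventions in this caption (quartiles, whiskers at 1.5× IQR, flier points beyond them) can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the distance measure (a simple proportion of differing sites between aligned sequences) is an assumption, since this record does not say which distance model the study used, and the sequences and the names `p_distance` and `box_stats` are hypothetical.

```python
from itertools import combinations
from statistics import quantiles


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence is not A/C/G/T (gaps, N) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def box_stats(values, k=1.5):
    """Quartiles, whisker limits at k * IQR, and the points beyond them."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    fliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3,
            "low": low, "high": high, "fliers": fliers}


# Hypothetical toy alignment, not real SARS-CoV-2 data.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAT", "TCGAACGAAT"]
distances = [p_distance(a, b) for a, b in combinations(seqs, 2)]
print(box_stats(distances))
```

Panel (B) of the figure corresponds to calling `p_distance` with the reference sequence as one argument instead of iterating over all pairs.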
The Ka/Ks of some major haplotype subtypes.
| Haplotype Subtype | Pango Lineage | WHO Label | Ka/Ks * |
|---|---|---|---|
| H1 | B.1 | NA | 1.310 ± 0.757 |
| H1-5 | B.1.1 | NA | 0.931 ± 0.236 |
| H1-2 | B.1.177 | NA | 0.370 ± 0.097 |
| H1-4 | B.1.1.7 | Alpha | 0.947 ± 0.180 |
| H1-6-1 | B.1.617.2 | Delta | 2.055 ± 0.644 |
| Omicron | BA.1 | Omicron | 2.206 ± 0.436 |
* The estimated ratio of the rates of non-synonymous to synonymous changes.
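By the usual convention, Ka/Ks above 1 is read as positive selection, below 1 as purifying selection, and near 1 as approximately neutral evolution. The sketch below applies that convention to the values in the table. Note the assumptions: this record does not define the number after the ± sign, so treating it as a spread around the mean, and comparing that range with 1, is a crude reading for illustration and not a statistical test. The function name `classify` is hypothetical.

```python
# Values copied from the table above: (mean Ka/Ks, number after the ± sign).
KA_KS = {
    "H1 (B.1)": (1.310, 0.757),
    "H1-5 (B.1.1)": (0.931, 0.236),
    "H1-2 (B.1.177)": (0.370, 0.097),
    "H1-4 (B.1.1.7, Alpha)": (0.947, 0.180),
    "H1-6-1 (B.1.617.2, Delta)": (2.055, 0.644),
    "Omicron (BA.1)": (2.206, 0.436),
}


def classify(mean, spread):
    """Compare the range mean ± spread with the neutral value 1.0.

    Assumption: the spread is treated as a rough uncertainty range.
    """
    if mean - spread > 1.0:
        return "positive"
    if mean + spread < 1.0:
        return "purifying"
    return "overlaps 1"


for name, (mean, spread) in KA_KS.items():
    print(f"{name:28s} {mean:.3f} ± {spread:.3f}  {classify(mean, spread)}")
```

Under this reading, only Delta and Omicron sit clearly above 1, and only H1-2 (B.1.177) sits clearly below it; the ranges for H1, H1-5, and H1-4 all include 1.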