Liang Cheng1,2, Xudong Han2, Zijun Zhu2, Changlu Qi2, Ping Wang2, Xue Zhang1,3.
Abstract
Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the COVID-19 pandemic has spread rapidly worldwide. Because few virus strains were available, early studies observed few of the key mutations that matter for the evolutionary trends of the virus genome. Here, we downloaded sequence data for 1809 SARS-CoV-2 strains from GISAID before April 2020 to identify mutations and the functional alterations they cause. In total, we identified 1017 nonsynonymous and 512 synonymous mutations by alignment to the reference genome NC_045512, none of which were observed in the receptor-binding domain (RBD) of the spike protein. On average, each strain acquired about 1.75 new mutations per month. The current mutations may have little impact on antibodies. Although the whole genome shows purifying selection, ORF3a, ORF8 and ORF10 were under positive selection. Only the 36 mutations that occurred in 1% or more of the virus strains were further analyzed to reveal linkage disequilibrium (LD) variants and dominant mutations. As a result, we observed five dominant mutations: three nonsynonymous mutations (C28144T, C14408T and A23403G) and two synonymous mutations (T8782C and C3037T). These five mutations occurred in almost all strains in April 2020. We also observed two potential dominant nonsynonymous mutations, C1059T and G25563T, which occurred in most of the strains in April 2020. Further functional analysis shows that these mutations largely decreased protein stability, which could lead to a significant reduction of virus virulence. In addition, the A23403G mutation increases the spike-ACE2 interaction, which in turn enhances infectivity. Together, these findings indicate that SARS-CoV-2 is evolving toward enhanced infectivity and reduced virulence.
Keywords: SARS-CoV-2; dominant mutation; evolutionary trend; interaction; virus virulence
Year: 2021 PMID: 33580783 PMCID: PMC7953981 DOI: 10.1093/bib/bbab042
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
The distribution of SARS-CoV-2 strains
| District | January | February | March | April |
|---|---|---|---|---|
| America | 10 | 73 | 290 | 328 |
| Europe | 12 | 47 | 433 | 158 |
| Asia | 144 | 149 | 153 | 12 |
In total, 1809 SARS-CoV-2 strains were downloaded from GISAID.
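As a quick consistency check, the monthly counts in the table can be tallied by region and by month. The sketch below is illustrative only: the dictionary layout and variable names are assumptions, and the numbers are copied from the table above.

```python
# Monthly strain counts copied from the distribution table (January to April 2020).
counts = {
    "America": [10, 73, 290, 328],
    "Europe": [12, 47, 433, 158],
    "Asia": [144, 149, 153, 12],
}
months = ["January", "February", "March", "April"]

# Total number of strains per region (row sums).
per_region = {region: sum(row) for region, row in counts.items()}

# Total number of strains per month (column sums).
per_month = {
    month: sum(row[i] for row in counts.values())
    for i, month in enumerate(months)
}

# Grand total across all regions and months.
total = sum(per_region.values())

print(per_region)
print(per_month)
print(total)
```

The grand total matches the 1809 strains stated in the table note.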
Figure 1. The workflow of our analysis of SARS-CoV-2 strains.
Figure 2. The distribution of mutations in 1809 SARS-CoV-2 strains. (A) The number of derived strains with individual mutations among the 1809 SARS-CoV-2 strains. (B) The average number of cumulative mutations by month. (C) The average number of cumulative mutations in America by month. (D) The average number of cumulative mutations in Asia by month. (E) The average number of cumulative mutations in Europe by month.
Figure 3. The distribution of mutations in each ORF. (A) Significance score of the number of mutation locations in each ORF. (B) Significance score of the number of synonymous mutation locations in each ORF. (C) Significance score of the number of nonsynonymous mutation locations in each ORF. (D) Mutation rate in each ORF. (E) Synonymous substitution rate in each ORF. (F) Nonsynonymous substitution rate in each ORF.
Figure 4. Linkage and tendency of the 36 mutations that occurred in 1% or more of the virus strains. (A) Scatter diagram of linkage disequilibrium between the 36 SNPs. The horizontal and vertical axes represent r² and LOD of pairwise SNPs, respectively. (B) The ratio of the 36 mutations by month. (C) The ratio of the 36 mutations in America by month. (D) The ratio of the 36 mutations in Asia by month. (E) The ratio of the 36 mutations in Europe by month.
The significant LD variants
| Location 1 | Location 2 | LOD | r² |
|---|---|---|---|
| 379 | 2244 | 129.1 | 1 |
| 17 747 | 17 858 | 447.04 | 1 |
| 28 881 | 28 882 | 635.9 | 1 |
| 28 881 | 28 883 | 635.9 | 1 |
| 28 882 | 28 883 | 635.9 | 1 |
| 3037 | 23 403 | 1029.39 | 0.993 |
| 3037 | 14 408 | 1019.04 | 0.988 |
| 14 408 | 23 403 | 1005.18 | 0.981 |
| 8782 | 28 144 | 645.08 | 0.975 |
| 1397 | 28 688 | 128.82 | 0.967 |
| 17 747 | 18 060 | 427.89 | 0.965 |
| 17 858 | 18 060 | 427.89 | 0.965 |
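
To make the linkage statistic concrete, the following minimal sketch computes the squared correlation r² between two biallelic sites from per-genome allele calls. It uses the textbook definition r² = D² / (p_A (1 − p_A) p_B (1 − p_B)), which is an assumption about how the values in the table were obtained; the function name `ld_r2` and the toy data are hypothetical, and the LOD score is not computed here.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    `haplotypes` is a list of (a, b) pairs, one per genome, where a and b
    are 0 (reference allele) or 1 (mutant allele) at site 1 and site 2.
    Viral genomes are haploid, so each genome is treated as one haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # mutant frequency at site 1
    p_b = sum(b for _, b in haplotypes) / n          # mutant frequency at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy data (not from the paper): two mutations that always co-occur.
perfect = [(1, 1)] * 40 + [(0, 0)] * 60
# Same data with one genome carrying only the first mutation.
near_perfect = [(1, 1)] * 40 + [(1, 0)] * 1 + [(0, 0)] * 59
# Two mutations distributed independently of each other.
independent = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

print(ld_r2(perfect))
print(ld_r2(near_perfect))
print(ld_r2(independent))
```

Pairs that always co-occur give an r² of 1, as in the first rows of the table, while a single discordant genome is enough to pull the value slightly below 1.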
Substitution rate of dominant mutations in each month
| Location | RaTG13 sequence | Reference sequence | Mutation sequence | ORF | Mutation type | Substitution rate in January | Substitution rate in February | Substitution rate in March | Substitution rate in April |
|---|---|---|---|---|---|---|---|---|---|
| 3037 | T | C | T | ORF1a | Synonymous | 0.006 | 0.16 | 0.69 | 0.93 |
| 8782 | T | C | T | ORF1a | Synonymous | 0.38 | 0.26 | 0.16 | 0.02 |
| 14 408 | C | C | T | ORF1b | Nonsynonymous | 0 | 0.16 | 0.69 | 0.93 |
| 23 403 | A | A | G | S | Nonsynonymous | 0.006 | 0.16 | 0.69 | 0.93 |
| 28 144 | C | T | C | ORF8 | Nonsynonymous | 0.38 | 0.26 | 0.15 | 0.02 |
| 1059 | C | C | T | ORF1a | Nonsynonymous | 0 | 0.022 | 0.33 | 0.46 |
| 25 563 | A | G | T | ORF3a | Nonsynonymous | 0 | 0.03 | 0.38 | 0.58 |
Prediction results of A23403G binding affinity using PPA-Pred
| Nucleotide | ΔG (kcal/mol) | Kd (M) |
|---|---|---|
| A | −14.36 | 2.96e–11 |
| G | −14.37 | 2.90e–11 |
ΔG is the dissociation free energy and Kd is the dissociation constant.
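The two columns of the table can be related through the standard thermodynamic identity ΔG = RT ln Kd. The sketch below is a minimal check under stated assumptions: a temperature of 298 K and the gas constant in kcal/(mol·K), neither of which is given in the table, and hypothetical function names. It is not a description of how PPA-Pred itself computes these values.

```python
import math

R_KCAL = 1.9872e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.0     # assumed temperature; not stated in the table


def delta_g_from_kd(kd_molar, temperature=T_KELVIN):
    """Free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_delta_g(delta_g, temperature=T_KELVIN):
    """Dissociation constant (mol/L) from a free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Kd values copied from the table for the A and G alleles at position 23403.
for base, kd in [("A", 2.96e-11), ("G", 2.90e-11)]:
    print(base, round(delta_g_from_kd(kd), 2))
```

Under these assumptions the computed free energies agree with the tabulated ΔG values to within rounding. The two Kd values differ by about 2%.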
Figure 5. Associations between SARS-CoV-2 strains from different continents. (A) Hierarchical clustering of virus genomes from different continents. (B) Hierarchical clustering of virus genomes based on nonsynonymous mutations. (C) Venn diagram of the mutation sites of the five regional groups of SARS-CoV-2 strains. (D) Maximum likelihood phylogenetic tree of the specific mutations of the five regional groups of SARS-CoV-2 strains.
The distribution of SARS-CoV-2 strains on different clusters
| District | January | February | March | April |
|---|---|---|---|---|
| America1 | 0 | 37 | 106 | 6 |
| America2 | 0 | 8 | 331 | 288 |
| Asia | 165 | 188 | 163 | 30 |
| Europe1 | 1 | 15 | 138 | 138 |
| Europe2 | 0 | 21 | 138 | 94 |