Shihang Wang1,2, Xuanyu Xu1,2, Cai Wei1,2, Sicong Li1,2, Jingying Zhao2,3, Yin Zheng1,2, Xiaoyu Liu1,2, Xiaomin Zeng4, Wenliang Yuan5, Sihua Peng1,2.
Abstract
SARS-CoV-2 is a beta coronavirus discovered at the end of 2019 that is highly pathogenic and poses a serious threat to human health. In this paper, 1875 SARS-CoV-2 whole genome sequences sampled from the United States, together with the sequences coding for the spike protein (S gene), were used for bioinformatics analysis of the molecular evolutionary characteristics of the genome and the spike protein. The MCMC method was used to estimate the evolution rate of the whole genome sequence and the nucleotide mutation rate of the S gene. The results showed that the nucleotide mutation rate of the whole genome was 6.677 × 10⁻⁴ substitutions per site per year and that of the S gene was 8.066 × 10⁻⁴ substitutions per site per year, a medium level compared with other RNA viruses. Our findings confirmed the scientific hypothesis that the rate of evolution of the virus gradually decreases over time. We also found 13 statistically significant positive selection sites in the SARS-CoV-2 genome. In addition, the results showed 101 nonsynonymous mutation sites in the amino acid sequence of the S protein, including seven putative harmful mutation sites. This paper has preliminarily clarified the evolutionary characteristics of SARS-CoV-2 in the United States, providing a scientific basis for future surveillance and prevention of virus variants.
Keywords: SARS-CoV-2; bioinformatics; molecular evolution; spike protein
Year: 2021 PMID: 34506640 PMCID: PMC8662038 DOI: 10.1002/jmv.27331
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
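To give the per-site rates in the abstract a more tangible scale, the sketch below converts them into expected substitutions per year across the whole genome and across the S gene. The sequence lengths are assumptions taken from the Wuhan-Hu-1 reference genome (NC_045512.2), not values stated in the abstract, and the function name is purely illustrative.

```python
# Rates reported in the abstract (substitutions per site per year).
GENOME_RATE = 6.677e-4
S_GENE_RATE = 8.066e-4

# Assumed lengths in nucleotides, from the Wuhan-Hu-1 reference genome
# (NC_045512.2); the abstract does not state them.
GENOME_LENGTH_NT = 29_903
S_GENE_LENGTH_NT = 3_822


def expected_substitutions_per_year(rate_per_site: float, length_nt: int) -> float:
    """Scale a per-site yearly rate to a whole sequence of the given length."""
    return rate_per_site * length_nt


genome_subs = expected_substitutions_per_year(GENOME_RATE, GENOME_LENGTH_NT)
s_gene_subs = expected_substitutions_per_year(S_GENE_RATE, S_GENE_LENGTH_NT)

print(f"Whole genome: about {genome_subs:.1f} substitutions per year")
print(f"S gene:       about {s_gene_subs:.1f} substitutions per year")
```

Under these assumed lengths the rates correspond to roughly 20 substitutions per year over the genome and roughly 3 over the S gene.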
Figure 1. Distribution of mutations in the S protein. All mutations in the S protein are nonsynonymous, with position AA614 having the highest frequency.
Figure 2. Bayesian maximum clade credibility tree.
Results of branch‐site model for SARS‐CoV‐2
| Model | Ln L | Parameter estimation | Site type 0 | Site type 1 | Site type 2a | Site type 2b | Model comparison | LRT | Positive selection sites |
|---|---|---|---|---|---|---|---|---|---|
| Model A | −157,747.79203 | Site ratio f | 0.41494 | 0.35320 | 0.12525 | 0.10661 | Model A vs. Model A null | 0.00000 | 498V*, 1039M*, 1125V*, 1183E*, 1388N*, 1592R*, 1968P*, 2001S*, 2020S*, 2169S*, 2233S*, 2257S*, 2360S* |
| | | Background branch ω0 | 0.31558 | 1.00000 | 0.31558 | 1.00000 | | | |
| | | Foreground branch ω1 | 0.31558 | 1.00000 | 65.38486 | 65.38486 | | | |
| Model A null | −157,831.26325 | 1 | | | | | | | |
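The likelihood ratio test behind the Model A vs. Model A null comparison can be reproduced from the two log-likelihoods in the table. The sketch below is a minimal illustration, assuming the test statistic is compared against a chi-square distribution with one degree of freedom (a common convention for this comparison; the source does not state which reference distribution was used). The function names are illustrative.

```python
import math

# Log-likelihoods reported in the table.
LNL_MODEL_A = -157747.79203
LNL_MODEL_A_NULL = -157831.26325


def lrt_statistic(lnl_alt: float, lnl_null: float) -> float:
    """Likelihood ratio test statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


stat = lrt_statistic(LNL_MODEL_A, LNL_MODEL_A_NULL)
p_value = chi2_sf_df1(stat)  # df = 1 is an assumption, see the note above

print(f"2 * delta lnL = {stat:.5f}")
print(f"p value (df = 1) = {p_value:.3g}")
```

The statistic comes out near 166.94, so the p value is far below any conventional threshold, which is consistent with the 0.00000 shown in the LRT column.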
Seven harmful mutations detected in the S protein
| Mutation | Score | Prediction (critical = −2.5) |
|---|---|---|
| P589S | −3.966 | Deleterious |
| T716I | −3.293 | Deleterious |
| D936Y | −2.602 | Deleterious |
| S939F | −3.094 | Deleterious |
| P1162S | −2.722 | Deleterious |
| C1236F | −4.061 | Deleterious |
| C1250F | −5.057 | Deleterious |
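The prediction column follows from comparing each score with the −2.5 critical value. The sketch below applies that rule to the seven scores in the table; it assumes that a score at or below the cutoff counts as deleterious, and the "Neutral" label for the other case is a placeholder, since the table only shows deleterious calls.

```python
CUTOFF = -2.5  # critical value given in the table header

# Scores copied from the table.
scores = {
    "P589S": -3.966,
    "T716I": -3.293,
    "D936Y": -2.602,
    "S939F": -3.094,
    "P1162S": -2.722,
    "C1236F": -4.061,
    "C1250F": -5.057,
}


def classify(score: float, cutoff: float = CUTOFF) -> str:
    """Label a score; 'at or below cutoff = Deleterious' is an assumption."""
    return "Deleterious" if score <= cutoff else "Neutral"


predictions = {mutation: classify(score) for mutation, score in scores.items()}

# List the mutations from most to least negative score.
for mutation, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{mutation}\t{score:.3f}\t{predictions[mutation]}")
```

All seven scores fall below the cutoff, with C1250F the most negative and D936Y the closest to the threshold.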
The nucleotide mutation rate (substitutions per site per year) of different RNA viruses
| Group | Family | Virus | Mutation rate | Reference |
|---|---|---|---|---|
| ss(+)RNA | Coronaviridae | SARS‐CoV‐2 | 8 × 10⁻⁴ | |
| ss(+)RNA | Coronaviridae | SARS | 3.01 × 10⁻³ | |
| ss(+)RNA | Coronaviridae | MERS‐CoV | 1.12 × 10⁻³ | |
| ss(+)RNA | Coronaviridae | HCoV‐OC43 | 1.06 × 10⁻⁴ | |
| ss(+)RNA | Coronaviridae | HCoV‐229E | 3.28 × 10⁻⁴ | |
| ss(+)RNA | Coronaviridae | Avian coronavirus | 2.40 × 10⁻⁴ | |
| ss(+)RNA | Coronaviridae | Bovine coronavirus | 5.37 × 10⁻⁴ | |
| ss(−)RNA | Filoviridae | EBOV | 1.23 × 10⁻³ | |
| ss(+)RNA | Picornaviridae | Hepatitis A virus | 9.76 × 10⁻⁴ | |
| ss(+)RNA | Flaviviridae | Hepatitis C virus | 1.39 × 10⁻³ | |
| ss(−)RNA | Orthomyxoviridae | Influenza A virus | 3.15 × 10⁻³ | |
| ss(+)RNA | Flaviviridae | Dengue virus | 6.50 × 10⁻⁴ | |
| ss(+)RNA | Picornaviridae | Human enterovirus A | 5.53 × 10⁻³ | |
| ss(+)RNA | Picornaviridae | Human enterovirus B | 5.27 × 10⁻³ | |
| ss(+)RNA | Picornaviridae | Poliovirus 1 | 1.17 × 10⁻² | |
| ss(−)RNA | Paramyxoviridae | Measles virus | 6.02 × 10⁻⁴ | |
| ss(−)RNA | Rhabdoviridae | Rabies virus | 3.32 × 10⁻⁴ | |
| dsRNA | Reoviridae | Human rotavirus A | 1.87 × 10⁻³ | |
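One way to check the "medium level" description is to rank the SARS‐CoV‐2 rate among the other entries in the table. The sketch below does only that, using the values as listed; the variable names are illustrative, and a simple rank is just one possible reading of "medium", not necessarily the comparison the authors made.

```python
# Mutation rates from the table (substitutions per site per year).
rates = {
    "SARS-CoV-2": 8e-4,
    "SARS": 3.01e-3,
    "MERS-CoV": 1.12e-3,
    "HCoV-OC43": 1.06e-4,
    "HCoV-229E": 3.28e-4,
    "Avian coronavirus": 2.40e-4,
    "Bovine coronavirus": 5.37e-4,
    "EBOV": 1.23e-3,
    "Hepatitis A virus": 9.76e-4,
    "Hepatitis C virus": 1.39e-3,
    "Influenza A virus": 3.15e-3,
    "Dengue virus": 6.50e-4,
    "Human enterovirus A": 5.53e-3,
    "Human enterovirus B": 5.27e-3,
    "Poliovirus 1": 1.17e-2,
    "Measles virus": 6.02e-4,
    "Rabies virus": 3.32e-4,
    "Human rotavirus A": 1.87e-3,
}

# Order viruses from slowest to fastest and locate SARS-CoV-2.
ordered = sorted(rates, key=rates.get)
rank = ordered.index("SARS-CoV-2") + 1  # 1-based rank, slowest first

slower = [name for name in ordered if rates[name] < rates["SARS-CoV-2"]]
faster = [name for name in ordered if rates[name] > rates["SARS-CoV-2"]]

print(f"SARS-CoV-2 ranks {rank} of {len(ordered)} (slowest first)")
print(f"{len(slower)} viruses are slower, {len(faster)} are faster")
```

With these values SARS‐CoV‐2 sits 8th of 18 when ordered from slowest to fastest, with 7 listed viruses below it and 10 above.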