Liang Huang, Kui Liu, Ke Ma, Yuan Tian, Yu Qin, Haiyin Sun, Wencheng Ding, Lingli Gui, Peng Wu.
Abstract
As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to spread globally at a worrisome speed, identifying amino acid variations in the virus could help to characterize it. Here, we studied 489 SARS-CoV-2 genomes from 32 countries, obtained from the Nextstrain database, and performed phylogenetic tree analysis by clade, country, and genotype of the surface spike glycoprotein (S protein) at site 614. We found that virus strains from mainland China were mostly distributed in Clade B and in an undefined clade in the phylogenetic tree, with very few found in Clade A. In contrast, Clades A2 (one case) and A2a (112 cases) predominantly contained strains from European regions. Moreover, the infected population in Clades A2 and A2a differed significantly in age from that of mainland China (P = 0.0071, mean age 40.24 vs. 46.66), although no such difference existed between the US and mainland China. Further analysis demonstrated that the variation of the S protein at site 614 (QHD43416.1: p.614D>G) was a characteristic of strains in Clades A2 and A2a. Importantly, this variation was predicted to have neutral or benign effects on the function of the S protein. In addition, global quality estimates and 3D protein structures tended to differ between the two S proteins. In summary, we identified different genomic epidemiology among SARS-CoV-2 strains in different clades, especially in an amino acid variation of the S protein at site 614, revealing potential viral genome divergence in SARS-CoV-2 strains.
Keywords: ACE2; COVID-19; Phylogenetic tree; SARS-CoV-2; Surface spike glycoprotein
Year: 2020 PMID: 32837981 PMCID: PMC7264919 DOI: 10.1016/j.gendis.2020.05.006
Source DB: PubMed Journal: Genes Dis ISSN: 2352-3042
Figure 1. Different SARS-CoV-2 clades among countries in the phylogenetic tree. (A) Phylogenetic tree of 489 SARS-CoV-2 genomes from Nextstrain, with cases colored by country. Branch labels are clades. (B) Clade distribution of the 489 SARS-CoV-2 genomes on the world map from Nextstrain, colored by clade.
Epidemiology characteristics of countries in Clades A2 and A2a.
| Country | Cases in A2/A2a | Mean age | Female number | Male number | Total cases | Percent in A2/A2a |
|---|---|---|---|---|---|---|
| Netherlands | 65 | na | na | na | 107 | 60.74% |
| Switzerland | 13 | 33.31 | 2 | 11 | 14 | 92.86% |
| United Kingdom | 13 | 46.27 | 4 | 7 | 33 | 39.39% |
| Ireland | 4 | 28.5 | 2 | 2 | 5 | 80.00% |
| Finland | 3 | 41 | 1 | 2 | 6 | 50.00% |
| Germany | 3 | na | na | 1 | 15 | 20.00% |
| Italy | 2 | 38 | 1 | 1 | 4 | 50.00% |
| Portugal | 2 | 46.5 | 0 | 2 | 2 | 100.0% |
| Brazil | 1 | 61 | 0 | 1 | 2 | 50.00% |
| Spain | 1 | 56 | 0 | 1 | 2 | 50.00% |
| Luxembourg | 1 | na | na | na | 1 | 100.0% |
| Mexico | 1 | 35 | 0 | 1 | 1 | 100.0% |
| Nigeria | 1 | na | 0 | 1 | 1 | 100.0% |
| Czech Republic | 1 | 44 | 0 | 1 | 1 | 100.0% |
| Chile | 1 | 40 | 1 | na | 4 | 25.00% |
| Taiwan | 1 | 66 | 1 | 0 | 7 | 14.29% |
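The last column of the table above can be rechecked from the two count columns. The following is a minimal Python sketch and not part of the original study: the names `rows`, `percent_in_clade`, `total_clade_cases`, and `mismatches` are illustrative assumptions, and the half-unit tolerance on the second decimal is an assumed rule for what counts as agreement. The counts and percentages are copied from the table.

```python
# (country, cases in A2/A2a, total cases, percent as printed in the table)
rows = [
    ("Netherlands", 65, 107, 60.74),
    ("Switzerland", 13, 14, 92.86),
    ("United Kingdom", 13, 33, 39.39),
    ("Ireland", 4, 5, 80.00),
    ("Finland", 3, 6, 50.00),
    ("Germany", 3, 15, 20.00),
    ("Italy", 2, 4, 50.00),
    ("Portugal", 2, 2, 100.0),
    ("Brazil", 1, 2, 50.00),
    ("Spain", 1, 2, 50.00),
    ("Luxembourg", 1, 1, 100.0),
    ("Mexico", 1, 1, 100.0),
    ("Nigeria", 1, 1, 100.0),
    ("Czech Republic", 1, 1, 100.0),
    ("Chile", 1, 4, 25.00),
    ("Taiwan", 1, 7, 14.29),
]


def percent_in_clade(clade_cases: int, total_cases: int) -> float:
    """Share of a country's sequenced cases that fall in A2/A2a, in percent."""
    return 100.0 * clade_cases / total_cases


# Sum of the "Cases in A2/A2a" column across all listed countries.
total_clade_cases = sum(clade for _, clade, _, _ in rows)

# Assumed rule: a printed value agrees if it lies within half a unit
# of the second decimal place of the recomputed percentage.
mismatches = [
    country
    for country, clade, total, printed in rows
    if abs(percent_in_clade(clade, total) - printed) > 0.005
]

print("Total cases in A2/A2a:", total_clade_cases)
print("Rows whose printed percent differs from the recomputed one:", mismatches)
```

The column total is 113, which matches the one A2 case plus 112 A2a cases reported in the abstract. The only row that does not reproduce under ordinary rounding is the Netherlands, where 65/107 gives 60.75% against the printed 60.74%.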
Epidemiology characteristics of China, Clade A2a, UK, Switzerland, USA.
| Characteristics | China | A2a | UK | Switzerland | USA |
|---|---|---|---|---|---|
| Age | | | | | |
| 1-20 | 8 | 2 | 0 | 0 | 0 |
| 21-60 | 63 | 36 | 27 | 13 | 15 |
| >60 | 27 | 2 | 2 | 1 | 2 |
| Mean age | 46.66 | 40.24 | 46.17 | 33.29 | 49.06 |
| P value | 0.0974 | 0.1324 | |||
| Sex | | | | | |
| Female | 40 | 12 | 13 | 3 | 11 |
| Male | 69 | 30 | 16 | 11 | 8 |
| P value | 0.3463 | 0.4239 | 0.2594 | 0.0816 | |
The epidemiological characteristics of Clade A2a, the UK, Switzerland, and the USA were each compared with those of China. P values were calculated with the Mann–Whitney U test or the Kruskal–Wallis test. Bold P values indicate P < 0.05. na: not applicable (used here for missing data).
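To illustrate the kind of two-group comparison named in the table note, here is a minimal pure-Python sketch of the Mann–Whitney U statistic with a two-sided P value from the normal approximation. It is not the authors' analysis. The per-case ages are not given here, so `group_a` and `group_b` are hypothetical values, and the function names are assumptions. The approximation applies no tie or continuity correction, so a statistics package may report slightly different P values.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, remembering which group each value came from.
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values occupy 1-based ranks i+1 .. j; give each their average.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x


def two_sided_p_normal(u, n1, n2):
    """Two-sided P value from the normal approximation (no tie/continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical ages for two groups; these are NOT data from the study.
group_a = [25, 31, 38, 42, 47, 52]
group_b = [33, 45, 51, 58, 63, 70, 74]

u_a, u_b = mann_whitney_u(group_a, group_b)
p_value = two_sided_p_normal(u_a, len(group_a), len(group_b))
print(f"U_a={u_a}, U_b={u_b}, approximate two-sided P={p_value:.4f}")
```

The two U values always sum to the product of the group sizes, which is a quick sanity check on any implementation. For groups as small as those in the table, an exact test would usually be preferred over the normal approximation.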
Figure 2. Amino acid variation of the S protein at site 614 in Clade A2 and A2a SARS-CoV-2 strains. (A) Radial phylogenetic tree of 489 SARS-CoV-2 genomes from Nextstrain, with cases colored by the amino acid of the S protein at site 614. Green: aspartic acid (D); yellow: glycine (G). Branch labels are clades. (B) Rectangular phylogenetic tree of 489 SARS-CoV-2 genomes from Nextstrain, with cases colored by the amino acid of the S protein at site 614. Green: aspartic acid (D); yellow: glycine (G). Branch labels are the amino acid variations of SARS-CoV-2 proteins. (C) Diversity of the S protein of SARS-CoV-2.
Figure 3. Variation in the S protein at site 614 does not affect protein function. (A) Prediction results for the variation of the S protein at site 614. (B) Partial multiple sequence alignment results, showing the amino acids surrounding the variation position (614).
Figure 4. Protein modeling estimates of QHD43416.1 and QHD43416.1: p.614D>G. (A) Amino acids of QHD43416.1 and QHD43416.1: p.614D>G surrounding site 614. (B) Protein modeling estimate results for QHD43416.1 and QHD43416.1: p.614D>G from the SWISS-MODEL server.
Figure 5. Three-dimensional (3D) protein structure models of QHD43416.1 and QHD43416.1: p.614D>G. (A) 3D protein structures of QHD43416.1 and QHD43416.1: p.614D>G generated by the SWISS-MODEL server. (B) Zoomed-in region around site 614 of QHD43416.1 and QHD43416.1: p.614D>G generated by the SWISS-MODEL server.