Sandra Isabel, Lucía Graña-Miraglia, Jahir M Gutierrez, Cedoljub Bundalovic-Torma, Helen E Groves, Marc R Isabel, AliReza Eshaghi, Samir N Patel, Jonathan B Gubbay, Tomi Poutanen, David S Guttman, Susan M Poutanen.
Abstract
The COVID-19 pandemic, caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was declared on March 11, 2020 by the World Health Organization. As of the 31st of May, 2020, there have been more than 6 million COVID-19 cases diagnosed worldwide and over 370,000 deaths, according to Johns Hopkins. Thousands of SARS-CoV-2 strains have been sequenced to date, providing a valuable opportunity to investigate the evolution of the virus on a global scale. We performed a phylogenetic analysis of over 1,225 SARS-CoV-2 genomes spanning from late December 2019 to mid-March 2020. We identified a missense mutation, D614G, in the spike protein of SARS-CoV-2, which has emerged as a predominant clade in Europe (954 of 1,449 (66%) sequences) and is spreading worldwide (1,237 of 2,795 (44%) sequences). Molecular dating analysis estimated the emergence of this clade around mid-to-late January (10-25 January) 2020. We also applied structural bioinformatics to assess the potential impact of D614G on the virulence and epidemiology of SARS-CoV-2. In silico analyses of the spike protein structure suggest that the mutation is most likely neutral to protein function as it relates to its interaction with the human ACE2 receptor. The lack of available clinical metadata prevented us from investigating the association between viral clade and disease severity phenotype. Future work that can leverage clinical outcome data with both viral and human genomic diversity is needed to monitor the pandemic.
Year: 2020 PMID: 32820179 PMCID: PMC7441380 DOI: 10.1038/s41598-020-70827-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Global Distribution of SARS-CoV-2 Genome Sequences Possessing the Spike Protein D614G Mutation. The G mutation as a percentage of total sequences (% G) is represented with color shades as detailed in the legend, using data available as of (A) 17 March and (B) 30 March 2020. Hatched lines were added when fewer than 10 sequences were available for a country. The maps were built with the geographic information system QGIS (v2.18.21, https://qgis.org).
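The % G value and the hatching rule described in this caption can be illustrated with a minimal Python sketch. The function names `percent_g` and `is_hatched` are hypothetical and not taken from the paper; the counts are the European and worldwide totals quoted in the abstract, and the threshold of 10 sequences comes from the caption.

```python
def percent_g(g_count: int, total: int) -> float:
    """Return the share of sequences carrying G at spike position 614, in percent."""
    if total <= 0:
        raise ValueError("total must be a positive number of sequences")
    return 100.0 * g_count / total


def is_hatched(total: int, min_sequences: int = 10) -> bool:
    """Flag a region whose sample is too small (fewer than min_sequences)."""
    return total < min_sequences


# (G-variant sequences, total sequences) as quoted in the abstract.
counts = {
    "Europe": (954, 1449),
    "Worldwide": (1237, 2795),
}

for region, (g_count, total) in counts.items():
    flag = " (hatched)" if is_hatched(total) else ""
    print(f"{region}: {percent_g(g_count, total):.1f}% G{flag}")
```

Rounded to whole numbers, these values agree with the 66% and 44% reported in the abstract.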
Figure 2. Estimated Molecular Dating of Evolutionary History of 442 Representative Global SARS-CoV-2 Sequences (Late-December 2019–Mid-March 2020) and the Emergence of the D614G Clade. Maximum clade credibility (MCC) tree with dated branches estimated by Bayesian Evolutionary Analysis Sampling Trees (BEAST). Node colors indicate continents of isolation; the x-axis indicates dates by year and days in decimal notation; D614G clade sequences are highlighted in a yellow box.
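To read an axis in decimal-year notation, a date can be converted to and from a fractional year. The sketch below is an illustration only: it assumes the fraction is the elapsed share of the calendar year (366 days in 2020), which may differ slightly from the convention used to draw the figure, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def date_to_decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year (assumed convention: elapsed days / days in year)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def decimal_year_to_date(value: float) -> date:
    """Convert a decimal year back to the nearest calendar date."""
    year = int(value)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    # Round to the nearest whole day to absorb floating-point error.
    offset = round((value - year) * days_in_year)
    return start + timedelta(days=offset)


# Estimated emergence window of the D614G clade quoted in the abstract.
for d in (date(2020, 1, 10), date(2020, 1, 25)):
    print(d.isoformat(), f"{date_to_decimal_year(d):.4f}")
```

Under this convention, the 10-25 January 2020 window corresponds roughly to the interval 2020.02 to 2020.07 on a decimal-year axis.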
Figure 3. Structural analysis of SARS-CoV-2 spike protein around position 614. (A) Location and distribution of SARS-CoV-2 viral proteins. The full trimeric form of the spike protein results from a complex of three identical spike monomers (right panel). (B) Three-dimensional depiction of a spike protein monomer. The receptor-binding domain is colored purple and the location of the aspartate residue in position 614 is highlighted in green. (C) Inter-atomic contacts between aspartate 614 (green) in a reference spike monomer (blue) and four residues (pink) in its adjacent spike protein monomer chain (white). These four contacts are destabilizing and create a hydrophilic-hydrophobic repelling effect that is lost upon replacement of aspartate by glycine in the D614G mutation (see Table 1). (D) Spatial distribution of the aspartate 614 residue (green) and an adjacent glycosylated asparagine residue in position 616 (orange). The two residues point in opposite directions and thus are unlikely to share a meaningful interaction. Panel (A) was drawn using Affinity Designer (v1.8) (https://affinity.serif.com/en-gb/designer/). The trimeric and monomeric structures of the spike protein were generated using Illustrate[19,41] (https://ccsb.scripps.edu/illustrate/) by rendering a protein structure from the Protein Data Bank with ID 6vsb[19] (https://www.rcsb.org/structure/6vsb). Panels (B–D) were generated using UCSF Chimera (v1.14) (https://www.cgl.ucsf.edu/chimera/) with the monomeric protein structure rendered in Chimera [19].
Table 1. Inter-chain contacts lost upon D614G mutation between adjacent chains in the SARS-CoV-2 spike protein.
| Residue in non-reference adjacent chain | Distance (Å) | Contact surface area (Å²) |
|---|---|---|
| Lys 854 | 5.2 | 10.0 |
| Thr 859 | 2.7 | 28.8 |
| Val 860 | 4.5 | 5.6 |
| Leu 861 | 5.6 | 1.0 |
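
As a quick way to summarize the four contacts listed above, the following Python sketch totals the contact surface area and picks out the closest contact. It only restates the table's values; the variable names are illustrative, and the summed area is a simple arithmetic summary rather than a quantity reported by the authors.

```python
# (residue in adjacent chain, distance in Å, contact surface area in Å^2),
# copied from the table of inter-chain contacts lost upon D614G.
contacts = [
    ("Lys 854", 5.2, 10.0),
    ("Thr 859", 2.7, 28.8),
    ("Val 860", 4.5, 5.6),
    ("Leu 861", 5.6, 1.0),
]

# Sum of the tabulated contact surface areas.
total_area = sum(area for _, _, area in contacts)

# Contact with the shortest inter-atomic distance.
closest = min(contacts, key=lambda row: row[1])

# Contact with the largest surface area.
largest = max(contacts, key=lambda row: row[2])

print(f"Total contact surface area: {total_area:.1f} Å^2")
print(f"Closest contact: {closest[0]} at {closest[1]} Å")
print(f"Largest contact: {largest[0]} with {largest[2]} Å^2")
```

Thr 859 is both the closest contact and the one with the largest contact surface area in the table.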
Figure 4. SARS-CoV-2 PCR cycle threshold (Ct) values of different clinical samples plotted according to variant D (black dots) and G (white squares) at position 614 in the spike protein. Dots represent individual Ct values; horizontal lines represent the mean and standard deviation.