Matías J Pereson, Diego M Flichman, Alfredo P Martínez, Patricia Baré, Gabriel H Garcia, Federico A Di Lello.
Abstract
The spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has become the main target for antiviral and vaccine development. Despite its relevance, information about its evolutionary traces is scarce. The aim of this study was to investigate the diversification patterns of the spike for each clade of SARS-CoV-2 through different approaches. Two thousand one hundred sequences representing the seven clades of SARS-CoV-2 were included. Patterns of genetic diversification and the nucleotide evolutionary rate were estimated for the spike genomic region. The haplotype networks showed a star shape, in which multiple haplotypes with few nucleotide differences diverge from a common ancestor. Four hundred seventy-nine different haplotypes were defined in the seven analyzed clades. The main haplotype, named Hap-1, was the most frequent for clades G (54%), GH (54%), and GR (56%), and a different haplotype (named Hap-252) was the most frequent for clades L (63.3%), O (39.7%), S (51.7%), and V (70%). The evolutionary rate for the spike protein was estimated as 1.08 × 10⁻³ nucleotide substitutions/site/year. Moreover, the nucleotide evolutionary rate after nine months of the pandemic was similar for each clade. In conclusion, the present evolutionary analysis is relevant because the spike protein of SARS-CoV-2 is the target for most therapeutic candidates; in addition, changes in this protein could have consequences for viral transmission, response to antivirals, and efficacy of vaccines. Moreover, the evolutionary characterization of clades improves knowledge of SARS-CoV-2 and deserves to be assessed in more detail, as re-infection by different phylogenetic clades has been reported.
Keywords: SARS-CoV-2; clades; evolution; spike protein
Year: 2021 PMID: 33512021 PMCID: PMC8013443 DOI: 10.1002/jmv.26834
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
Number of SARS‐CoV‐2 sequences from the GISAID database in September 2020, by month and clade as per the selection criteria (temporal structure)
| Clade | Dec. | Jan. | Feb. | Mar. | Apr. | May | Jun. | Jul. | Aug. | Sep. | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| G | 0 | 8 (2) | 20 (3) | 55 (7) | 52 (7) | 47 (7) | 39 (6) | 39 (6) | 20 (6) | 20 (6) | 300 |
| GH | 0 | 0 | 18 (3) | 53 (7) | 50 (7) | 44 (7) | 40 (7) | 40 (7) | 35 (6) | 20 (6) | 300 |
| GR | 0 | 0 | 35 (3) | 45 (7) | 50 (7) | 40 (7) | 43 (7) | 35 (7) | 32 (7) | 20 (6) | 300 |
| L | 17 (8) | 43 (5) | 53 (5) | 65 (6) | 55 (5) | 49 (4) | 14 (4) | 4 (2) | 0 | 0 | 300 |
| O | 0 | 35 (2) | 40 (4) | 55 (6) | 46 (6) | 42 (5) | 40 (5) | 24 (5) | 14 (5) | 4 (4) | 300 |
| S | 1 (1) | 50 (5) | 50 (5) | 70 (6) | 68 (6) | 31 (5) | 25 (5) | 4 (4) | 1 (1) | 0 | 300 |
| V | 0 | 4 (2) | 44 (4) | 101 (6) | 97 (6) | 33 (5) | 18 (4) | 2 (2) | 1 (1) | 0 | 300 |
| Total | 18 (9) | 140 (16) | 260 (27) | 444 (45) | 418 (44) | 286 (40) | 219 (38) | 148 (33) | 103 (26) | 64 (22) | 2100 (300) |
The number of sequences selected for the general data set (N = 300) for each month and clade is shown in parentheses.
Figure 1. Median‐joining haplotype networks. The seven clades of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) described to date are compared for both the entire spike and the receptor‐binding domain (RBD) coding regions. The diameters of the spheres are proportional to the frequency of haplotypes. The main haplogroups are indicated.
Summary of the haplotype and nucleotide diversity indices for the entire spike and the receptor‐binding domain coding regions for each clade of SARS‐CoV‐2
| Clade | S | H | Hd | π |
|---|---|---|---|---|
| Spike | | | | |
| G | 100 | 86 | 0.704 ± 0.030 | 0.00037 ± 0.00003 |
| GH | 102 | 89 | 0.704 ± 0.030 | 0.00038 ± 0.00003 |
| GR | 112 | 89 | 0.683 ± 0.031 | 0.00038 ± 0.00003 |
| L | 87 | 76 | 0.598 ± 0.035 | 0.00023 ± 0.00002 |
| O | 81 | 68 | 0.793 ± 0.019 | 0.00040 ± 0.00002 |
| S | 72 | 60 | 0.716 ± 0.027 | 0.00031 ± 0.00002 |
| V | 56 | 53 | 0.507 ± 0.036 | 0.00018 ± 0.00002 |
| General | 134 | 107 | 0.857 ± 0.015 | 0.00052 ± 0.00003 |
| RBD | | | | |
| G | 15 | 15 | 0.183 ± 0.030 | 0.00028 ± 0.00005 |
| GH | 17 | 19 | 0.196 ± 0.031 | 0.00032 ± 0.00006 |
| GR | 23 | 23 | 0.281 ± 0.034 | 0.00041 ± 0.00006 |
| L | 15 | 14 | 0.104 ± 0.024 | 0.00016 ± 0.00004 |
| O | 9 | 9 | 0.193 ± 0.030 | 0.00027 ± 0.00004 |
| S | 3 | 4 | 0.027 ± 0.013 | 0.00003 ± 0.00002 |
| V | 3 | 4 | 0.020 ± 0.011 | 0.00003 ± 0.00001 |
| General | 17 | 17 | 0.166 ± 0.029 | 0.00026 ± 0.00005 |
Note: S, number of variable sites; H, number of haplotypes; Hd, haplotype diversity; π, nucleotide diversity (per site).
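The indices defined in the note above follow standard population-genetic definitions. As a minimal sketch, assuming Nei's estimator for haplotype diversity and the mean number of pairwise differences per site for π (the source does not state which software or estimator produced the table), they could be computed from an alignment as follows. The function names and the toy alignment are hypothetical, and the standard deviations reported in the table are not reproduced.

```python
from collections import Counter
from itertools import combinations


def variable_sites(seqs):
    """S: number of alignment columns with more than one distinct character."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def haplotype_count(seqs):
    """H: number of distinct sequences (haplotypes)."""
    return len(set(seqs))


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum(p_i^2)), with p_i the haplotype frequencies."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_p2 = sum((count / n) ** 2 for count in Counter(seqs).values())
    return n / (n - 1) * (1.0 - sum_p2)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences, divided by alignment length."""
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment (not data from the study).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(variable_sites(toy), haplotype_count(toy))
print(round(haplotype_diversity(toy), 4), round(nucleotide_diversity(toy), 4))
```

This sketch treats every character mismatch as a difference, so gaps and ambiguous bases would need to be filtered or masked before applying it to real alignments.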
Frequency of haplotypes with amino acid changes in the spike for each clade of SARS‐CoV‐2
| Clade/haplotype (aa change with respect to Hap‐1) | G | GH | O | GR | Clade/haplotype (aa change with respect to Hap‐252) | L | O | S | V |
|---|---|---|---|---|---|---|---|---|---|
| Hap‐1 | 162 (54) | 162 (54) | 25 (8.3) | 168 (56) | Hap‐252 | 190 (63.3) | 119 (39.7) | 154 (51.3) | 210 (70) |
| Hap‐7 (S477N) | 5 (1.7) | 10 (3.4) | Hap‐254 (H49Y) | 3 (1) | |||||
| Hap‐34 (N439K) | 4 (1.3) | Hap‐256 (D1084Y) | 3 (1) | ||||||
| Hap‐67 (P1263L) | 4 (1.3) | Hap‐282 (NC) | 6 (2) | 62 (20.7) | |||||
| Hap‐86 (L18F, A222V) | 20 (6.8) | Hap‐291 (L5F) | 10 (3.3) | ||||||
| Hap‐90 (A522S, E780C) | 5 (1.7) | Hap‐320 (A575S) | 8 (2.6) | ||||||
| Hap‐91 (E780C) | 6 (2) | Hap‐324 (A1087S) | 3 (1) | ||||||
| Hap‐105 (D936Y) | 10 (3.4) | Hap‐367 (L8V) | 17 (5.7) | ||||||
| Hap‐137 (E583D) | 3 (1) | Hap‐382 (V367F) | 5 (1.7) | ||||||
| Hap‐171 (Q675R) | 3 (1) | Hap‐384 (D614A) | 3 (1) | ||||||
| Hap‐187 (S12F) | 3 (1) | Hap‐415 (A829T) | 5 (1.7) | ||||||
| Hap‐226 (T478I) | 8 (2.6) | Hap‐437 (A846S) | |||||||
| Total | 300 (100) | 300 (100) | 300 (100) | Total | 300 (100) | 300 (100) | 300 (100) | 300 (100) | |
Note: aa, amino acid; N, number; Hap, haplotype; NC, no amino acid changes.
Hap‐1: S12, L18, R21, A222, N439, S477, T478, A522, E583, G614, Q675, E780, D936, V1068, and P1263.
Hap‐252: L5, L8, H49, V367, A575, D614, A829, A846, D1084, and A1087.
Mean rates of the spike‐coding region (nt = 3822) for each clade of SARS‐CoV‐2
| Clade | N | Model | Mean rate (substitutions/site/year) | 95% HPD interval |
|---|---|---|---|---|
| G | 300 | TIM2+f | 1.47 × 10⁻³ | 1.05 × 10⁻³ – 1.95 × 10⁻³ |
| GH | 300 | TIM2+f + I | 1.42 × 10⁻³ | 9.67 × 10⁻⁴ – 1.94 × 10⁻³ |
| GR | 300 | TIM2+f + I | 1.69 × 10⁻³ | 1.11 × 10⁻³ – 2.30 × 10⁻³ |
| L | 300 | TIM2+f | 1.11 × 10⁻³ | 5.90 × 10⁻⁴ – 1.61 × 10⁻³ |
| O | 300 | TIM2u+f | 1.06 × 10⁻³ | 7.20 × 10⁻⁴ – 1.50 × 10⁻³ |
| S | 300 | TN + F | 1.33 × 10⁻³ | 8.41 × 10⁻⁴ – 1.83 × 10⁻³ |
| V | 300 | HKY + F | 1.15 × 10⁻³ | 6.51 × 10⁻⁴ – 1.64 × 10⁻³ |
| General | 300 | GTR + F + I | 1.08 × 10⁻³ | 7.94 × 10⁻⁴ – 1.41 × 10⁻³ |
Note: N, number of sequences.
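To make the per-site rates in the table above more tangible, they can be converted into an expected number of substitutions across the whole spike-coding region. The snippet below is a small worked calculation, assuming only that the rate is in substitutions/site/year (as stated in the abstract) and that the region is 3822 nt long; the function name is hypothetical.

```python
SPIKE_LENGTH_NT = 3822  # length of the spike-coding region given in the table caption


def expected_substitutions(rate_per_site_per_year, n_sites=SPIKE_LENGTH_NT, years=1.0):
    """Expected substitutions over n_sites during the given number of years."""
    return rate_per_site_per_year * n_sites * years


# General data set: mean rate and 95% HPD bounds taken from the table.
mean_per_year = expected_substitutions(1.08e-3)
low_per_year = expected_substitutions(7.94e-4)
high_per_year = expected_substitutions(1.41e-3)

print(f"{mean_per_year:.2f} substitutions/year "
      f"(95% HPD {low_per_year:.2f} to {high_per_year:.2f})")
```

Under these assumptions, the general estimate corresponds to roughly four substitutions per year along a lineage across the spike-coding region.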
Figure 2. Test of temporal structure. Comparison of the evolutionary rates estimated for the original data set versus the date‐randomized ones. This analysis was performed for the spike‐coding region (3822 nt) of each clade. s.s.y, substitutions/site/year.
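The date-randomization idea behind this comparison can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' pipeline: it shuffles sampling dates among sequences to build a randomized data set, and applies one commonly used decision rule (assumed here, not stated in the source) that the rate interval from the real data should not overlap the interval from any randomized replicate. The rate estimation itself would be run in external phylogenetic software; all function names and the randomized intervals are hypothetical.

```python
import random


def randomize_dates(tip_dates, seed=None):
    """Return a copy of {sequence_name: date} with dates shuffled among sequences."""
    rng = random.Random(seed)
    names = list(tip_dates)
    dates = [tip_dates[name] for name in names]
    rng.shuffle(dates)
    return dict(zip(names, dates))


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_temporal_signal(observed_hpd, randomized_hpds):
    """Assumed rule: the observed interval overlaps none of the randomized ones."""
    return not any(intervals_overlap(observed_hpd, r) for r in randomized_hpds)


# Observed 95% HPD for the general data set, taken from the rates table.
observed = (7.94e-4, 1.41e-3)
# Made-up intervals standing in for date-randomized replicates.
hypothetical_randomized = [(1.0e-5, 2.0e-4), (5.0e-6, 3.0e-4)]
print(has_temporal_signal(observed, hypothetical_randomized))
```

Shuffling keeps the set of sampling dates intact while breaking their association with the sequences, which is what lets the randomized estimates serve as a null expectation.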