Mohammad Alkhatib1, Valentina Svicher1, Romina Salpini1, Francesca Alessandra Ambrosio2, Maria Concetta Bellocchi1, Luca Carioti1, Lorenzo Piermatteo1, Rossana Scutari1, Giosuè Costa2,3, Anna Artese2,3, Stefano Alcaro2,3, Robert Shafer4, Francesca Ceccherini-Silberstein1.
Abstract
Since the beginning of the coronavirus disease 2019 (COVID-19) pandemic, its causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been undergoing genetic diversification, leading to the emergence of new variants. Nevertheless, a clear definition of the genetic signatures underlying the circulating variants is still missing. Here, we provide a comprehensive insight into the mutational profiles characterizing each SARS-CoV-2 variant, focusing on spike mutations known to modulate viral infectivity and/or antigenicity. We focused on variants and on specific relevant mutations reported by the GISAID, Nextstrain, Outbreak.info, Pango, and Stanford database websites that were associated with any clinical/diagnostic impact, according to published manuscripts. Furthermore, 1,223,338 full-length high-quality SARS-CoV-2 genome sequences were retrieved from GISAID and used to accurately define the specific mutational patterns in each variant. Finally, mutations were mapped on the three-dimensional structure of the SARS-CoV-2 spike protein to assess their localization in the different spike domains. Overall, this review sheds light on, and assists in defining, the genetic signatures characterizing the currently circulating variants and their clinical relevance.
IMPORTANCE Since the emergence of SARS-CoV-2, several recurrent mutations, particularly in the spike protein, arose during human-to-human transmission or spillover events between humans and animals, generating distinct worrisome variants of concern (VOCs) or of interest (VOIs), designated as such due to their clinical and diagnostic impacts. Characterizing these variants and their related mutations is important in tracking SARS-CoV-2 evolution and understanding the efficacy of vaccines and therapeutics based on monoclonal antibodies, convalescent-phase sera, and direct antivirals. Our study provides a comprehensive survey of the mutational profiles characterizing the important SARS-CoV-2 variants, focusing on spike mutations and highlighting other protein mutations.
Keywords: COVID-19; SARS-CoV-2; emerging variants; mutations; pandemic; variants
Year: 2021 PMID: 34787497 PMCID: PMC8597642 DOI: 10.1128/Spectrum.01096-21
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1. Mutations underlying the currently circulating variants in the spike glycoprotein. Only mutated positions are reported. The different domains of the spike glycoproteins are depicted. The consensus sequence for each variant was defined as nonsynonymous substitutions or deletions that occurred in >75% of sequences within that lineage. Each mutation (such as E484K) is indicated by a first letter that is the symbol for the reference amino acid of NC_045512.2 (e.g., E), a number for the amino acid position in the wild-type protein (e.g., 484), and a second letter representing the amino acid actually found in the sequence analyzed (e.g., K). The nomenclatures of the VOCs and some of the VOIs were those reported by WHO and Pango, while the rest of the VOIs and other variants were reported by Pango. Mutations in black refer to the mutations reported by Nextstrain, Outbreak.info, Pango lineages, and Stanford database websites, while mutations in gray are those that we identified by analyzing entire high-quality viral genome sequences from GISAID (n = 1,223,383).
a The mutations L452R, E484K, and S494L are rarely present in this variant, with rates of 0.05%, 0.3%, and 0.3%, respectively. In addition to the deletion at position 144, a deletion at position 145 is also observed, with a low prevalence of about 0.02%.
b This VOC was previously characterized additionally by the presence of L18F, which currently is only in about 38% of sequences, and 2 sublineages have evolved recently (B.1.351.2 and B.1.351.3) that have L18F at prevalences of about 94% and 93%, respectively.
c The mutation P681H is rarely present in this variant, with a prevalence of 1.3%.
d The mutations V70F, A222V, W258L, and K417N are detected in this variant with prevalences of about 0.3%, 12.1%, 0.2%, and 0.3%, respectively. Recently, this variant has evolved into 3 sublineages (AY.1, AY.2, and AY.3) that have acquired some additional mutations, as follows: AY.1 (also called Delta plus) presents W258L and K417N and AY.2 presents A222V and K417N, while AY.3 does not present specific Spike mutations.
e The mutations S13I and W152C are only present in the B.1.429 variant.
f The mutations T19I, G142D, and H1101D are detected in this variant with prevalences of 54.5%, 71.3%, and 30.8%, respectively.
g The mutation Q52R is detected in this variant with a prevalence of 71.6%.
h The mutations L452R, S477N, and E484K cooccur rarely in this variant, while they have sole prevalences of about 25.7%, 15.1%, and 54.0%, respectively. Recently, the B.1.526 variant has evolved into 2 sublineages (B.1.526.1 and B.1.526.2) that appear to have several more unique mutations. B.1.526.1 presents the mutations D80G, Y144Δ, F157S, L452R, D614G, T859N, and D950H, while B.1.526.2 presents L5F, T95I, D253G, S477N, D614G, and Q957R.
i A large deletion of 7 amino acids between residues 247 and 253 is detected in 63.6% of sequences of this variant.
j The mutation F565L is detected in this variant with a prevalence of about 6.9%.
k An insertion is present at 145/146N in all sequences.
l The mutation G142D is detected in this variant with a prevalence of 43.8%.
m The mutations A262S and P272L can be detected with prevalences of 7.5% and 6.1%, respectively.
n The deletion at positions 69 and 70 is detected in about 71% of sequences.
o A 3-amino-acid insertion at 214TDR is present at a prevalence of 71.3% in this variant.
p The mutations S98F, G769V, and K854N are detected with prevalences of 2.9%, 32.5%, and 8.9%, respectively.
q The mutations R102I, E484K, and P812S are detected in this variant with prevalences of 50.5%, 6.0%, and 5.3%, respectively.
r The mutation Q677H is present with a prevalence of about 28.1%.
s A large deletion of 9 amino acids at residues 136 to 144 and an insertion of 4 amino acids at 679GIAL are present in all sequences.
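The substitution notation described in the legend (reference amino acid, position, observed amino acid) can be split mechanically. The following is a minimal illustrative sketch, not part of the study's pipeline; the names `Substitution` and `parse_substitution` are hypothetical, and the pattern deliberately handles only simple one-letter substitutions such as E484K, not deletions or insertions such as Y144Δ or 214TDR.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One amino acid substitution in the legend's notation (hypothetical helper type)."""
    reference: str  # amino acid in the NC_045512.2 reference, e.g. "E"
    position: int   # 1-based position in the reference protein, e.g. 484
    variant: str    # amino acid found in the analyzed sequence, e.g. "K"


# One uppercase letter, one or more digits, one uppercase letter.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'E484K' into its three parts.

    Raises ValueError for anything that is not a simple substitution
    (for example deletions or insertions, which use other notations).
    """
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ref, pos, var = match.groups()
    return Substitution(ref, int(pos), var)


example = parse_substitution("E484K")
print(example.reference, example.position, example.variant)
```

Labels that fall outside this simple pattern raise an error rather than being guessed at, so deletions and insertions would need their own handling.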
FIG 2. Three-dimensional representation of the SARS-CoV-2 spike protein reporting residues characterizing the 4 variants of concern (VOCs). The protein is shown as a gray cartoon. The Alpha B.1.1.7, Beta B.1.351, Gamma P.1, and Delta B.1.617.2 VOCs are represented as magenta, blue, cyan, and forest-green spheres, respectively. The mutated residues shared by all 4, by 3, and by 2 VOCs are reported as red, salmon, and chocolate spheres, respectively.
Mutations present in the currently circulating variants and their functional characterization
| Mutation(s) or deletion(s) characterizing SARS-CoV-2 variants | Variant(s) | Location | Increase in infectivity | Increase in transmissibility | Increase in disease severity | Escape from single or multiple antibodies | Escape from convalescent sera | Escape from vaccine | Escape from diagnostic assay detection |
|---|---|---|---|---|---|---|---|---|---|
| Mutations | |||||||||
| S13I | Epsilon | NTD | NA | NA | NA | Yes | NA | NA | NA |
| L18F | Gamma, A.27 | NTD | NA | NA | NA | Yes | NA | NA | NA |
| T20N | Gamma | NTD | NA | NA | NA | Yes | NA | NA | NA |
| D80A, D80G | Beta, AV.1 | NTD | NA | NA | NA | Yes | NA | NA | NA |
| W152C, W152L, W152R | Epsilon, R.1, C.36.3 | NTD | NA | NA | NA | Yes | NA | NA | NA |
| D215G | Beta, B.1.616, AT.1 | NTD | NA | NA | NA | Yes | NA | NA | NA |
| A222V | B.1.177 | NTD | No | Yes | No | No | No | No | No |
| D253G | Iota | NTD | NA | NA | NA | Yes | NA | NA | NA |
| V367F | A.23.1 | RBD | Yes | NA | NA | Yes | NA | NA | NA |
| K417N, K417T | Beta, Gamma | RBD | No | No | No | Yes | Yes | Yes | No |
| N439K | B.1.258, B.1.466.2, AV.1 | RBD | Yes | No | No | Yes | Yes | No | No |
| L452R, L452Q | Delta, Epsilon, Kappa, B.1.617.3, C.16, A.27, C.36.3, Lambda | RBD | Yes | No | No | Yes | Yes | No | No |
| Y453F | B.1.1.298 | RBD | Yes | No | No | Yes | No | No | No |
| S477N | B.1.160, B.1.620 | RBD | Yes | Yes | No | Yes | Yes | Yes | No |
| T478K | Delta, B.1.1.519 | RBD | NA | NA | NA | Yes | NA | NA | NA |
| V483A | B.1.616 | RBD | NA | NA | NA | Yes | NA | NA | NA |
| E484K, E484Q | Beta, Gamma, Eta, Zeta, Theta, B.1.621, B.1.620, B.1.1.318, AT.1, R.1, AV.1, Kappa, B.1.617.3 | RBD | No | No | No | Yes | Yes | Yes | No |
| N501Y | Alpha, Beta, Gamma, Theta, B.1.621, A.27, A.28 | RBD | Yes | Yes | No | Yes | Yes | Yes | No |
| D614G | All variants except A.23.1, A.27, and A.28 | S1/S2 | Yes | Yes | No | No | No | No | No |
| Q677H | Eta, C.36.3 | S1/S2 | NA | NA | No | No | No | No | No |
| P681H, P681R | Alpha, Theta, B.1.621, B.1.620, B.1.1.519, B.1.1.318, AV.1, Kappa, Delta, B.1.617.3, A.23.1 | S1/S2 | Yes | NA | No | No | No | No | No |
| Deletions | |||||||||
| Del H69-V70 | Alpha, Eta, B.1.620, B.1.1.298, A.28, C.36.3 | NTD | Yes | No | No | Yes | No | No | Yes |
| Del L141-G142-V143 | Theta | NTD | No | No | No | Yes | No | No | No |
| Del Y144 | Alpha, Eta, B.1.620, B.1.616, B.1.1.318, AV.1 | NTD | No | No | No | Yes | No | No | No |
| Del L242-A243-L244 | Beta, Theta, B.1.620 | NTD | No | No | No | Yes | No | No | No |
The nomenclatures of the variants are those reported by the Pango, Outbreak.info, and Stanford database websites.
NTD, N-terminal domain (amino acids [aa] 13 to 305); RBD, receptor binding domain (aa 319 to 541); S1/S2, the junction between subunits S1 and S2 (aa 542 to 690).
The table reports only the mutations that have been shown to have an impact on viral infectivity, transmissibility, or immunogenicity in published studies found on PubMed or preprints on bioRxiv or medRxiv. NA, data are not available.
Infectivity was evaluated in pseudotyped viruses and/or by structural analysis.
Transmissibility was evaluated by molecular epidemiology-based studies and/or in vivo studies.
Disease severity was evaluated by analyzing clinical outcomes in terms of long-lasting infections and/or hospitalization period.
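The domain boundaries given in the table footnotes (NTD, aa 13 to 305; RBD, aa 319 to 541; S1/S2, aa 542 to 690) are enough to assign a spike position to its "Location" column value. The sketch below is only an illustration under that assumption; `SPIKE_DOMAINS` and `spike_domain` are hypothetical names, and positions outside the three listed ranges are returned as `None` because the footnotes do not name those regions.

```python
from typing import Optional

# Inclusive amino acid ranges, taken from the table footnotes.
SPIKE_DOMAINS = (
    ("NTD", 13, 305),     # N-terminal domain
    ("RBD", 319, 541),    # receptor binding domain
    ("S1/S2", 542, 690),  # junction between subunits S1 and S2
)


def spike_domain(position: int) -> Optional[str]:
    """Return the domain label for a spike position, or None if it lies
    outside the three ranges listed in the footnotes."""
    for name, start, end in SPIKE_DOMAINS:
        if start <= position <= end:
            return name
    return None


for pos in (18, 484, 614, 950):
    print(pos, spike_domain(pos))
```

With these ranges, position 484 falls in the RBD and position 614 in the S1/S2 junction, in agreement with the table rows for E484K and D614G.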
FIG 3. Mutations underlying the currently circulating variants in the SARS-CoV-2 proteins. Only mutated positions are reported. (A) The different structural and regulatory proteins are depicted. (B) The nonstructural proteins are depicted. The consensus sequence for each variant was defined as nonsynonymous substitutions or deletions that occurred in >75% of sequences within that lineage. Each mutation (such as P323L in the viral polymerase) is indicated by a first letter that is the symbol for the reference amino acid of NC_045512.2 (e.g., P), a number for the amino acid position in the wild-type protein (e.g., 323), and a second letter representing the amino acid actually found in the sequence analyzed (e.g., L). The nomenclatures of the VOCs and some of the VOIs were those reported by WHO and Pango, while the rest of the VOIs and other variants were reported by Pango. Mutations in black refer to the mutations reported by the Nextstrain, Outbreak.info, Pango lineages, and Stanford database websites, while mutations in gray are those that we identified by analyzing entire high-quality viral genome sequences from GISAID (n = 1,223,383).
a A large deletion of 9 amino acids between residues 23 and 31 of ORF6 is detected in all sequences of this variant.
b The mutations G238C in the nucleocapsid protein and G172C in the protein encoded by ORF3a are detected with prevalences of about 30.3% and 30.5%, respectively.
c The mutations S2Y and R203K in the nucleocapsid protein are present with prevalences of about 12.1% and 13.4%, respectively.
d The helicase mutation T588I is detected with a prevalence of about 21.9%.
e The mutation E195D in NS6 is present with a prevalence of about 25.7%.
f The mutation T51I in NS10 is present with a prevalence of about 56.1%.
g The mutations L741F in PL-pro and T599I in Hel are present with prevalences of 6.2% and 2.1%, respectively.
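The consensus rule stated in the figure legends (a mutation belongs to a variant's consensus when it occurs in >75% of the sequences of that lineage) can be illustrated with a short sketch. This is not the authors' code: the function name `consensus_mutations`, the representation of each sequence as a set of mutation labels, and the toy four-sequence lineage are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def consensus_mutations(sequences: Iterable[Set[str]],
                        threshold: float = 0.75) -> List[str]:
    """Return the mutations found in strictly more than `threshold`
    of the sequences of one lineage, sorted alphabetically.

    Each sequence is represented as the set of mutation labels it carries.
    """
    seqs = list(sequences)
    if not seqs:
        return []
    # Count in how many sequences each mutation appears.
    counts = Counter(mutation for seq in seqs for mutation in seq)
    total = len(seqs)
    return sorted(m for m, c in counts.items() if c / total > threshold)


# Toy lineage (invented data): P681H is in 3 of 4 sequences, i.e. exactly 75%,
# so it does not pass the strict ">75%" rule.
toy_lineage = [
    {"N501Y", "D614G", "P681H"},
    {"N501Y", "D614G", "P681H"},
    {"N501Y", "D614G", "P681H"},
    {"N501Y", "D614G"},
]
print(consensus_mutations(toy_lineage))
```

Mutations below the threshold are the ones the legends report separately with their individual prevalences.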
Nomenclatures of variants
| SARS-CoV-2 variant | Origin of identification | No. of SARS-CoV-2 sequences |
|---|---|---|
| Variants of concern (VOCs) | ||
| 20I/501Y.V1 | English | 790,848 |
| 20H/501Y.V2 | South African | 16,530 |
| 20J/501Y.V3 | Brazilian | 34,722 |
| B.1.617.2 | Indian | 103,539 |
| Variants of interest (VOIs) | ||
| CAL.20C | USA | 32,715 |
| B.1.617.1 | Indian | 3,435 |
| B.1.525 | English/Nigerian | 2,312 |
| B.1.526 | USA | 28,057 |
| C.37 | Peru | 220 |
| P.2 | Brazilian | 3,733 |
| P.3 | Philippines | 126 |
| B.1.621 | Colombian | 540 |
| B.1.620 | Cameroonian | 106 |
| B.1.616 | French | 38 |
| B.1.617.3 | Indian | 121 |
| Other variants | ||
| 20E.EU1 | Spanish | 132,246 |
| B.1.1.298 | Danish | 957 |
| 20A/S:439K | Scottish | 10,822 |
| 20A/S:98F | Belgian | 10,963 |
| 20A.EU2 | Portuguese | 20,727 |
| B.1.214.2 | Belgian | 631 |
| C.16 | Portuguese | 585 |
| B.1.1.519 | Mexican | 16,475 |
| B.1.466.2 | Indonesian | 799 |
| A.23.1 | Ugandan | 936 |
| A.27 | German | 256 |
| A.28 | French | 296 |
| B.1.1.318 | English | 535 |
| C.36.3 | Egyptian | 943 |
| AT.1 | Russian | 113 |
| R.1 | Japanese | 8,918 |
| AV.1 | English | 139 |
Number of SARS-CoV-2 sequences retrieved from GISAID (until 7 July 2021) and analyzed in the study.
Considered a VOC or VOI by the WHO/CDC/ECDC.
The nomenclature was reported by the Nextstrain website.
The nomenclature was reported from the Stanford database, according to the Pango lineages.
The nomenclature was reported from the Pango website, according to the Pango lineages.
The nomenclature was reported from the Outbreak.info website, according to the Pango lineages.
The nomenclature was reported from the GISAID website.
The nomenclature was reported by the World Health Organization (WHO).
Detected for the first time in the United Kingdom and currently named the Nigerian variant.