Paloma Troyano-Hernáez, Roberto Reinosa, África Holguín.
Abstract
Monitoring SARS-CoV-2's genetic diversity and emerging mutations in this ongoing pandemic is crucial to understanding its evolution and ensuring the performance of COVID-19 diagnostic tests, vaccines, and therapies. Spain has been one of the main epicenters of COVID-19, reaching the highest number of cases and deaths per 100,000 population in Europe at the beginning of the pandemic. This study investigates the epidemiology of SARS-CoV-2 in Spain and its 18 Autonomous Communities across the six epidemic waves established from February 2020 to January 2022. We report the circulating SARS-CoV-2 variants in each epidemic wave and Spanish region and analyze the mutation frequency, amino acid (aa) conservation, and most frequent aa changes in each structural, non-structural, and accessory viral protein among the Spanish sequences deposited in the GISAID database during the study period. The overall SARS-CoV-2 mutation frequency was 1.24 × 10⁻⁵. The aa conservation was >99% in the three types of protein, with non-structural proteins being the most conserved. Accessory proteins had more variable positions, while structural proteins presented more aa changes per sequence. Six main lineages spread successfully in Spain from 2020 to 2022. The presented data provide insight into SARS-CoV-2 circulation and genetic variability in Spain during the first two years of the pandemic.
Keywords: SARS-CoV-2; Spain; accessory proteins; epidemic waves; lineages; mutation frequency; non-structural proteins; structural proteins; variability
Year: 2022 PMID: 35742840 PMCID: PMC9223475 DOI: 10.3390/ijms23126394
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Proposed molecular functions of the twenty-six SARS-CoV-2 proteins.
| Protein | Proposed Molecular Function |
|---|---|
| Structural proteins | |
| Spike (S) | Class I fusion protein that mediates attachment to the host cell receptor angiotensin-converting enzyme 2 (ACE2) through the receptor-binding domain (RBD) and fusion of viral and cellular membranes |
| Envelope (E) | Viral assembly and release through interaction with M protein |
| Membrane (M) | Virion shape, participates in E assembly and N attachment to the viral genome, interacts with S |
| Nucleocapsid (N) | Nucleocapsid protein, binding to RNA genome, participates in transcription and replication, interaction with M during viral assembly |
| Non-structural proteins | |
| nsp1 | Leader protein, suppresses host gene expression by ribosome association, mediates RNA replication |
| nsp2 | Related to the disruption of intracellular host signaling in SARS-CoV infections |
| nsp3 | Papain-like protease |
| nsp4 | Implicated in forming the membrane structures induced upon CoV infection, with which the replication-transcription complex (RTC) is thought to be associated |
| nsp5 | Chymotrypsin-like protease (3CLpro) (main protease), polyprotein processing |
| nsp6 | Induces autophagosomes and limits autophagosome expansion |
| nsp7 | Processivity cofactor for RdRp |
| nsp8 | Processivity cofactor for RdRp |
| nsp9 | Single-strand nucleic acid-binding protein |
| nsp10 | Increases nsp14 exoribonuclease and nsp16 2′-O-methyltransferase activities |
| nsp11 | Unknown |
| nsp12 | RNA-dependent RNA polymerase (RdRp), replication and transcription of the viral RNA genome |
| nsp13 | Superfamily 1 helicase with a zinc-binding domain involved in the RTC; participates in capping |
| nsp14 | Proofreading exoribonuclease and N7 guanine-methyltransferase activity involved in viral mRNA cap synthesis |
| nsp15 | Uridylate-specific endoribonuclease activity |
| nsp16 | 2′-O-methyltransferase: 2′-O-ribose methylation of the mRNA 5′-cap structure |
| Accessory proteins | |
| 3a | Type I IFN inhibition |
| 6 | Type I IFN inhibition |
| 7a | Type I IFN inhibition |
| 7b | Unknown |
| 8 | Type I IFN inhibition |
| 10 | Its expression and protein-coding status are controversial |
Polymorphisms, transitions and transversions ratio, and mutation frequency detected in Spanish SARS-CoV-2 sequences during the first two years of the pandemic among the 26 viral proteins.
| Locus | Number of Sequences | Location | Length (bp) | Number of Polymorphisms | Ts:Tv | Mean Mutation Frequency |
|---|---|---|---|---|---|---|
| nsp1 | 86,080 | 266–805 | 540 | 621 | 1:0.64 | 1.34 × 10⁻⁵ |
| nsp2 | 85,659 | 806–2719 | 1914 | 2446 | 1:0.87 | 1.49 × 10⁻⁵ |
| nsp3 | 83,819 | 2720–8554 | 5835 | 7310 | 1:0.98 | 1.49 × 10⁻⁵ |
| nsp4 | 84,434 | 8555–10,054 | 1500 | 1130 | 1:0.49 | 8.92 × 10⁻⁶ |
| nsp5 | 85,208 | 10,055–10,972 | 918 | 605 | 1:0.44 | 7.73 × 10⁻⁶ |
| nsp6 | 85,511 | 10,973–11,842 | 870 | 777 | 1:0.73 | 1.04 × 10⁻⁵ |
| nsp7 | 86,668 | 11,843–12,091 | 249 | 257 | 1:0.78 | 1.19 × 10⁻⁵ |
| nsp8 | 86,849 | 12,092–12,685 | 594 | 405 | 1:0.43 | 7.85 × 10⁻⁶ |
| nsp9 | 86,713 | 12,686–13,024 | 339 | 262 | 1:0.45 | 8.91 × 10⁻⁶ |
| nsp10 | 84,592 | 13,025–13,441 | 417 | 290 | 1:0.51 | 8.22 × 10⁻⁶ |
| nsp11 | 84,593 | 13,442–13,480 | 39 | 39 | 1:1.29 | 1.18 × 10⁻⁵ |
| nsp12 | 84,069 | 13,442–16,236 | 2796 | 2934 | 1:1 | 1.25 × 10⁻⁵ |
| nsp13 | 85,477 | 16,237–18,039 | 1803 | 1212 | 1:0.49 | 7.86 × 10⁻⁶ |
| nsp14 | 84,666 | 18,040–19,620 | 1581 | 1210 | 1:0.50 | 9.04 × 10⁻⁶ |
| nsp15 | 85,788 | 19,621–20,658 | 1038 | 1021 | 1:0.78 | 1.15 × 10⁻⁵ |
| nsp16 | 85,050 | 20,659–21,552 | 894 | 651 | 1:0.65 | 8.56 × 10⁻⁶ |
| gene S | 83,928 | 21,563–25,384 | 3819 | 5486 | 1:1.28 | 1.71 × 10⁻⁵ |
| ORF3a | 86,034 | 25,393–26,220 | 825 | 1055 | 1:0.90 | 1.49 × 10⁻⁵ |
| gene E | 85,937 | 26,245–26,472 | 225 | 234 | 1:0.92 | 1.21 × 10⁻⁵ |
| gene M | 85,720 | 26,523–27,191 | 666 | 522 | 1:0.65 | 9.14 × 10⁻⁶ |
| ORF6 | 85,701 | 27,202–27,387 | 183 | 194 | 1:0.81 | 1.24 × 10⁻⁵ |
| ORF7a | 82,217 | 27,394–27,759 | 363 | 621 | 1:1.16 | 2.08 × 10⁻⁵ |
| ORF7b | 82,083 | 27,756–27,887 | 129 | 133 | 1:0.82 | 1.26 × 10⁻⁵ |
| ORF8 | 84,992 | 27,894–28,259 | 363 | 513 | 1:0.92 | 1.66 × 10⁻⁵ |
| gene N | 70,124 | 28,274–29,533 | 1257 | 2277 | 1:1.49 | 2.58 × 10⁻⁵ |
| ORF10 | 82,312 | 29,558–29,674 | 114 | 129 | 1:0.55 | 1.37 × 10⁻⁵ |
| Complete Genome | | | | 32,334 | 1:0.90 | 1.24 × 10⁻⁵ |
| Non-structural proteins | | | | 21,170 | 1:0.78 | 1.05 × 10⁻⁵ |
| Structural proteins | | | | 8519 | 1:2.26 | 1.60 × 10⁻⁵ |
| Accessory proteins | | | | 2645 | 1:0.93 | 1.52 × 10⁻⁵ |
Gene locations follow the SARS-CoV-2 reference sequence (NCBI NC_045512.2). bp: base pair; Ts: transition; Tv: transversion; S: Spike; E: Envelope; M: Membrane; N: Nucleocapsid; nsp: non-structural protein.
Figure 1. Spanish SARS-CoV-2 mutation frequency and rate of conserved aa positions per viral protein, sorted from highest to lowest. (a) Mutation frequency. X axis: mutation frequency [$M_f = P_i / (L_n \times N_s)$]; Y axis: SARS-CoV-2 loci. (b) Percentage of conserved amino acid positions. X axis: percentage of completely conserved aa sites; Y axis: SARS-CoV-2 proteins. (c) Number of total deletions in each SARS-CoV-2 protein. X axis: number of deletions detected; Y axis: SARS-CoV-2 proteins. Color code: in green: non-structural proteins, light green: ORF1ab (nsp1 to 11), dark green: ORF1b (nsp12 to 16); in blue: accessory proteins (3a to 10); in red: structural proteins. E: Envelope; M: Membrane; N: Nucleocapsid; S: Spike; nsp: non-structural protein.
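As a rough check, the mutation frequency formula in the Figure 1 legend can be applied to the polymorphism counts, locus lengths, and sequence counts from the table above. The sketch below assumes that P is the number of polymorphisms, L the locus length in nucleotides, and N the number of sequences; the function name is illustrative and this is not the authors' code. Under that reading, it reproduces the tabulated values for nsp1, gene S, and gene N.

```python
def mutation_frequency(polymorphisms: int, length_nt: int, n_sequences: int) -> float:
    """Mf = P / (L x N): polymorphisms per nucleotide site per sequence (assumed reading)."""
    return polymorphisms / (length_nt * n_sequences)


# (polymorphisms, length in bp, number of sequences), copied from the table rows
rows = {
    "nsp1": (621, 540, 86_080),
    "gene S": (5486, 3819, 83_928),
    "gene N": (2277, 1257, 70_124),
}

results = {locus: mutation_frequency(*values) for locus, values in rows.items()}

for locus, mf in results.items():
    print(f"{locus}: {mf:.2e}")
# nsp1: 1.34e-05
# gene S: 1.71e-05
# gene N: 2.58e-05
```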
Number of aa changes, deletions, stop codons, percentage of variable aa positions, and conservation across Spanish SARS-CoV-2 sequences in each of the 26 viral proteins.
| Protein | Number of Sequences | Length (aa) | Number of Changes | Mean Changes per Sequence * | Variable aa Positions (%) | aa Conservation (%) |
|---|---|---|---|---|---|---|
| nsp1 | 86,080 | 180 | 438 (404; 32; 2) | 0.10 | 92.78 | 99.95 |
| nsp2 | 85,659 | 638 | 1614 (1545; 48; 21) | 0.41 | 93.73 | 99.94 |
| nsp3 | 83,819 | 1945 | 4921 (4364; 423; 134) | 3.22 | 91.36 | 99.83 |
| nsp4 | 84,434 | 500 | 671 (661; 4; 6) | 1.15 | 72.80 | 99.77 |
| nsp5 | 85,208 | 306 | 334 (322; 9; 3) | 0.16 | 64.71 | 99.95 |
| nsp6 | 85,511 | 290 | 514 (471; 35; 8) | 1.79 | 81.03 | 99.38 |
| nsp7 | 86,668 | 83 | 144 (129; 7; 8) | 0.02 | 90.36 | 99.98 |
| nsp8 | 86,849 | 198 | 237 (236; 0; 1) | 0.03 | 74.75 | 99.98 |
| nsp9 | 86,713 | 113 | 146 (139; 3; 4) | 0.03 | 72.57 | 99.97 |
| nsp10 | 84,592 | 139 | 154 (152; 1; 1) | 0.02 | 64.03 | 99.98 |
| nsp11 | 84,593 | 13 | 20 (20; 0; 0) | 0.00 | 76.92 | 99.97 |
| nsp12 | 84,069 | 932 | 1832 (1526; 207; 99) | 1.88 | 87.34 | 99.80 |
| nsp13 | 85,477 | 601 | 648 (638; 1; 9) | 0.69 | 63.73 | 99.89 |
| nsp14 | 84,666 | 527 | 734 (704; 19; 11) | 0.72 | 71.16 | 99.86 |
| nsp15 | 85,788 | 346 | 659 (600; 39; 20) | 0.09 | 85.26 | 99.98 |
| nsp16 | 85,050 | 298 | 385 (371; 8; 6) | 0.07 | 72.15 | 99.98 |
| S | 83,928 | 1273 | 3838 (3318; 397; 123) | 10.80 | 91.52 | 99.13 |
| ORF3a | 86,034 | 275 | 811 (736; 59; 16) | 0.87 | 95.64 | 99.68 |
| E | 85,937 | 75 | 150 (133; 10; 7) | 0.12 | 85.33 | 99.84 |
| M | 85,720 | 222 | 288 (281; 2; 5) | 0.78 | 69.82 | 99.64 |
| ORF6 | 85,701 | 61 | 138 (123; 6; 9) | 0.02 | 95.08 | 99.97 |
| ORF7a | 82,217 | 121 | 499 (408; 62; 29) | 1.08 | 99.17 | 99.10 |
| ORF7b | 82,083 | 43 | 105 (92; 8; 5) | 0.45 | 97.67 | 98.96 |
| ORF8 | 84,992 | 121 | 396 (338; 27; 31) | 1.60 | 100.00 | 98.67 |
| N | 70,124 | 419 | 1661 (1459; 170; 32) | 3.79 | 99.28 | 99.09 |
| ORF10 | 82,312 | 38 | 96 (91; 2; 3) | 0.09 | 97.37 | 99.77 |
| Complete genome | | 9757 | 21,433 (19,261; 1579; 593) | 1.15 | 84.06 | 99.69 |
| Non-structural proteins | | 7109 | 13,451 (12,282; 836; 333) | 1.25 | 79.19 | 99.84 |
| Structural proteins | | 1989 | 5937 (5191; 579; 167) | 3.87 | 86.49 | 99.42 |
| Accessory proteins | | 659 | 2045 (1788; 164; 93) | 0.68 | 97.49 | 99.36 |
Conserved positions included all protein residues without any aa change, stop codon, or deletion; aa: amino acid; del: deletions; %: percentage; nsp: non-structural protein. * including aa changes and deletions.
Figure 2. SARS-CoV-2 structural proteins’ Wu–Kabat variability coefficient plot and main protein regions. Y-axis: variability coefficient. X-axis: amino acid position and main protein domains. (a) Spike protein; RBD: receptor-binding domain; RBM: receptor-binding motif; red triangles: cleavage sites S1/S2 and S2′; purple boxes: fusion peptides 1 and 2. (b) Nucleocapsid protein; NTD: N-terminal domain; CTD: C-terminal domain; orange box: SR-rich linker. (c) Membrane protein; red boxes: transmembrane domains. (d) Envelope protein; red box: transmembrane domain; orange box: PBM (PDZ-binding motif).
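The Wu–Kabat variability coefficient plotted in Figure 2 is conventionally defined, for each alignment position, as V = N × k / n, where N is the number of sequences, k the number of distinct amino acids observed at that position, and n the count of the most frequent amino acid. A minimal Python sketch follows, assuming an already aligned set of equal-length protein sequences; the toy alignment and function names are hypothetical, and how the authors handled gaps or ambiguous residues is not stated here.

```python
from collections import Counter


def wu_kabat(column: str) -> float:
    """Wu-Kabat coefficient for one alignment column: N * k / n."""
    counts = Counter(column)
    n_sequences = len(column)                    # N: sequences covering this position
    n_distinct = len(counts)                     # k: distinct residues observed
    n_most_common = counts.most_common(1)[0][1]  # n: count of the most frequent residue
    return n_sequences * n_distinct / n_most_common


def variability_profile(aligned: list[str]) -> list[float]:
    """Coefficient at every position of an equal-length alignment."""
    return [wu_kabat("".join(col)) for col in zip(*aligned)]


# Hypothetical toy alignment: four sequences, three positions
toy_alignment = ["MKT", "MKS", "MRT", "MKT"]
profile = variability_profile(toy_alignment)
print(profile)
```

A fully conserved position scores 1.0 (here the first column), and the score grows as more residues share the position more evenly.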
Figure 3. Amino acid changes and deletions present in ≥10% of the Spanish SARS-CoV-2 sequences. Color code: in green: non-structural proteins, light green: ORF1ab nsp1 to 11, dark green: ORF1b nsp12 to 16; in red: structural proteins; in blue: accessory proteins 3a to 10. E: Envelope; M: Membrane; N: Nucleocapsid; nsp: non-structural protein; del: deletion.
Figure 4. Frequency difference (Δ) of the 57 amino acid changes and deletions present in ≥10% of the Spanish SARS-CoV-2 sequences over the six waves of the Spanish epidemic curve. Under the “Protein changes” heading: protein and aa change present in ≥10% of the Spanish sequences; E: Envelope, M: Membrane, N: Nucleocapsid, S: Spike, nsp: non-structural protein; colored bars: frequency of the aa change in each study period. In green: non-structural proteins, light green: ORF1ab nsp1 to 11, dark green: ORF1b nsp12 to 16; in red: structural proteins; in blue: accessory proteins 3a to 10. Period 1: epiweeks 2020.9 to 2020.25. Period 2: epiweeks 2020.26 to 2020.49. Period 3: epiweeks 2020.50 to 2021.10. Period 4: epiweeks 2021.11 to 2021.24. Period 5: epiweeks 2021.25 to 2021.41. Period 6: epiweeks 2021.42 to 2022.4. Δ: frequency difference between periods. Positive Δ values indicate an increase in the aa change frequency, negative Δ values indicate a decrease, and Δ values close to zero indicate no or minimal frequency change.
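The Δ values in Figure 4 can be read as the change in the frequency of an aa change from one period to the next. A small sketch, assuming Δ is the later period's frequency minus the earlier one (consistent with positive values meaning an increase); the frequencies below are made-up placeholders, not data from the study.

```python
def frequency_deltas(freqs: list[float]) -> list[float]:
    """Difference between each period's frequency and the previous period's."""
    return [later - earlier for earlier, later in zip(freqs, freqs[1:])]


# Hypothetical frequencies of a single aa change across Periods 1 to 6
example_freqs = [0.0, 0.25, 0.75, 0.5, 0.0, 0.0]
deltas = frequency_deltas(example_freqs)
print(deltas)  # five values: one per transition between consecutive periods
```

Six periods yield five Δ values; a change that rises and is later displaced shows positive values followed by negative ones.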
Figure 5. Number of aa changes and deletions different from those reported in Figure 3 and present in ≥10% of SARS-CoV-2 sequences from the Spanish Autonomous Communities. Autonomous Communities: 1–17 in the map; Autonomous Cities (Ceuta and Melilla): number 18 in the map. nsp: non-structural protein; S: Spike protein; E: Envelope protein; M: Membrane protein; N: Nucleocapsid protein.
Figure 6. Epidemic curve and main SARS-CoV-2 lineages circulating in Spain per study period. The bold black line represents the epidemic curve with the number of SARS-CoV-2 cases per epidemiological week according to the official data available from the Spanish National Epidemiological Surveillance Network (RENAVE, https://cnecovid.isciii.es/covid19 (accessed on 7 April 2022)). Study period dates according to Table 2.
Figure 7. Main SARS-CoV-2 lineages in the Spanish Autonomous Communities (AC) with more than 10 sequences available in GISAID for each study period. (a) Period 1. (b) Period 2. (c) Period 3. (d) Period 4. (e) Period 5. (f) Period 6. In color: AC with more than 10 sequences available in GISAID for each period. 1: Andalusia, 2: Aragon, 3: Asturias, 4: Balearic Islands, 5: Basque Country, 6: Canary Islands, 7: Cantabria, 8: Castile La Mancha, 9: Castile and Leon, 10: Catalonia, 11: Extremadura, 12: Galicia, 13: La Rioja, 14: Madrid, 15: Murcia, 16: Navarre, 17: Valencian Community, 18: Ceuta and Melilla. Study period dates according to Table 2. B.1.1.7 (Alpha variant), B.1.351 (Beta variant), P.1 (Gamma variant), B.1.617.2/AY (Delta variant). *: clusters of the Delta variant.
Study periods included in this study and relevant events.
| Periods | Epiweeks | Dates | Relevant Events |
|---|---|---|---|
| Period 1 | 09.2020 to 25.2020 | 24 February 2020 to 20 June 2020 | First Spanish COVID-19 wave. |
| 1.1 | 09.2020 to 11.2020 | 24 February 2020 to 14 March 2020 | From the beginning of the pandemic until the national lockdown (15 March 2020). |
| 1.2 | 12.2020 to 18.2020 | 15 March 2020 to 02 May 2020 | From the national lockdown until the beginning of the national deconfinement plan. |
| 1.3 | 19.2020 to 25.2020 | 03 May 2020 to 20 June 2020 | End of the first epidemic wave. |
| Period 2 | 26.2020 to 49.2020 | 21 June 2020 to 05 December 2020 | Second COVID-19 Spanish wave. |
| 2.1 | 26.2020 to 40.2020 | 21 June 2020 to 03 October 2020 | First peak of incidence after summer 2020, with a rise in the Rt* in early July. |
| 2.2 | 41.2020 to 49.2020 | 04 October 2020 to 05 December 2020 | Second peak of incidence before 2020 winter with another rise in the Rt in mid-October. |
| Period 3 | 50.2020 to 10.2021 | 06 December 2020 to 13 March 2021 | Third Spanish epidemic wave. Introduction of B.1.1.7 or Alpha variant. Start of the COVID-19 vaccination campaign. |
| Period 4 | 11.2021 to 24.2021 | 14 March 2021 to 19 June 2021 | Fourth Spanish epidemic wave. Alpha became the main circulating variant in Spain. Introduction of Delta variant during the last half of the period. End of the third state of emergency in May. |
| Period 5 | 25.2021 to 41.2021 | 20 June 2021 to 16 October 2021 | Fifth Spanish epidemic wave. Delta became the main circulating variant in Spain. |
| Period 6 | 42.2021 to 04.2022 | 17 October 2021 to 29 January 2022 | Sixth Spanish epidemic wave. Introduction of the Omicron variant, which quickly became the main circulating variant in Spain. |
* Rt: effective reproductive number.
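The epiweek boundaries in the table can be approximated programmatically. The sketch below assumes Sunday-to-Saturday epidemiological weeks in which week 1 is the week containing 4 January (the CDC MMWR convention); this is an inference from the tabulated dates, not a convention stated in the text, and the function names are illustrative. Under that assumption it reproduces most of the dates above, although the first date (24 February 2020) falls one day after the computed start of week 9.

```python
from datetime import date, timedelta


def epiweek_start(year: int, week: int) -> date:
    """First day (Sunday) of an epiweek, assuming MMWR-style numbering."""
    jan4 = date(year, 1, 4)
    # date.weekday(): Monday=0 ... Sunday=6, so this is the number of days since Sunday
    days_since_sunday = (jan4.weekday() + 1) % 7
    week1_start = jan4 - timedelta(days=days_since_sunday)
    return week1_start + timedelta(weeks=week - 1)


def epiweek_end(year: int, week: int) -> date:
    """Last day (Saturday) of the same epiweek."""
    return epiweek_start(year, week) + timedelta(days=6)


# Period 6 in the table: epiweeks 42.2021 to 04.2022
period6 = (epiweek_start(2021, 42), epiweek_end(2022, 4))
print(period6[0].isoformat(), "to", period6[1].isoformat())
```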