Jose Arturo Molina-Mora, Estela Cordero-Laurent, Adriana Godínez, Melany Calderón-Osorno, Hebleen Brenes, Claudio Soto-Garita, Cristian Pérez-Corrales, Jan Felix Drexler, Andres Moreira-Soto, Eugenia Corrales-Aguilar, Francisco Duarte-Martínez.
Abstract
Genome sequencing is a key strategy in the surveillance of SARS-CoV-2, the virus responsible for the COVID-19 pandemic. Latin America is the hardest-hit region of the world, accumulating almost 20% of COVID-19 cases worldwide. In Costa Rica, almost 170,000 cases were reported from the first detected case on March 6th to December 31st. We analyzed the genomic variability of SARS-CoV-2 during the pandemic in Costa Rica using 185 sequences: 52 from the first months of the pandemic and 133 from the current wave. Three GISAID clades (G, GH, and GR) and three PANGOLIN lineages (B.1, B.1.1, and B.1.291) were predominant, suggesting multiple re-introductions from other regions. Whole-genome variant calling identified 283 distinct nucleotide variants, whose frequencies followed a power-law distribution: 190 variants were each found in a single sequence, and only 16 were found in >5% of sequences. These mutations were distributed throughout the whole genome. The prevalence of the worldwide variants D614G in the Spike (98.9% in Costa Rica) and ORF8 L84S (1.1%) is similar to what is found elsewhere. Interestingly, the frequency of the Spike mutation T1117I has increased during the current pandemic wave, which began in May 2020 in Costa Rica, reaching 29.2% detection in the full genome analyses in November 2020. This variant has been observed in less than 1% of the GISAID-reported sequences worldwide in 2020. Structural modeling of the Spike protein with the T1117I mutation suggests a potential effect on the viral oligomerization needed for cell infection, but no differences from other genomes in transmissibility, severity, or vaccine effectiveness are predicted.
In conclusion, genome analyses of SARS-CoV-2 sequences over the course of the COVID-19 pandemic in Costa Rica suggest the introduction of lineages from other countries and detect mutations in line with other studies, while highlighting the local increase in detection of the Spike-T1117I variant. The genomic features of this virus need to be monitored and studied in further analyses as part of the surveillance program during the pandemic.
Keywords: COVID-19; Costa Rica; Genomic surveillance; Pandemic; SARS-CoV-2
Year: 2021 PMID: 33905892 PMCID: PMC8065237 DOI: 10.1016/j.meegid.2021.104872
Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 3.342
Fig. 1. Dynamic, geographic and temporal distribution of SARS-CoV-2 genomes from Costa Rican cases. (A) An exponential increase in COVID-19 cases has been reported in Costa Rica since March 2020, with a similar profile for reported deaths (B). Samples for sequencing were obtained from the whole country, mainly from the Central Valley, the most populated region of the country (C). The image in (C) was obtained from the Microreact tool (https://microreact.org/project/r7tcnUYgWMRJ5Fdssvv7VZ).
Demographic information and basic results of the genome analysis of SARS-CoV-2 from Costa Rican cases.
| Parameter | Counts | |
|---|---|---|
| Demographic data | ||
| Mortality (known cases): 10 | ||
| Genome sequencing | ||
| Total variants for all genomes | 2120, including repeated mutations among genomes | |
| Total distinct variants | 283 variants, regardless of frequency among genomes | |
| Variability | Reference genome size: 29903 bp | |
| Classification of variants | ||
| ORF1ab: 180 (63.6%) | Missense: 146 (51.6%) | |
| In only 1 genome: 190 (67.1%) | ||
| Genome sequence groups: clades and lineages | ||
| G: 46 (24.9%) | A.1: 2 (1.1%) | |
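
The percentages in the table follow directly from the reported counts. As a minimal sketch, they can be re-derived in Python from the 283 distinct variants and the 185 genomes. The helper name `pct` and the dictionary labels are illustrative, and the mean number of variants per genome is a figure derived here that the source does not state.

```python
# Counts are taken from the table and abstract; names are illustrative only.
TOTAL_GENOMES = 185          # sequences analyzed
TOTAL_VARIANT_CALLS = 2120   # all variants, repeats among genomes included
DISTINCT_VARIANTS = 283      # distinct variants, regardless of frequency


def pct(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


summary = {
    "ORF1ab share of distinct variants": pct(180, DISTINCT_VARIANTS),
    "Missense share of distinct variants": pct(146, DISTINCT_VARIANTS),
    "Distinct variants seen in only 1 genome": pct(190, DISTINCT_VARIANTS),
    "Clade G share of genomes": pct(46, TOTAL_GENOMES),
    "Lineage A.1 share of genomes": pct(2, TOTAL_GENOMES),
}

# Derived here, not reported in the source: average variant calls per genome.
mean_variants_per_genome = TOTAL_VARIANT_CALLS / TOTAL_GENOMES

for label, value in summary.items():
    print(f"{label}: {value}%")
print(f"Mean variants per genome: {mean_variants_per_genome:.1f}")
```

Note that the variant-class percentages use the 283 distinct variants as denominator, whereas the clade and lineage percentages use the 185 genomes.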
Fig. 2. Variant calling analysis of SARS-CoV-2 genomes from Costa Rican cases of COVID-19. (A) Presence/absence of 283 different variants among 185 genomes. A few variants are widely distributed among genomes (*F = Frequency), and many variants are uniquely present in a single genome. (B) Distribution and cumulative percentage of variant frequency among genomes. Most variants are low-frequency mutations and only 16 variants are present in at least 5% (9) of genomes. The most frequent variants (Spike D614G and ORF1ab P4715L) are present in 183 (98.9%) genomes (arrow). The 16 variants are distributed along the SARS-CoV-2 genome, as shown in (C).
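
The presence/absence analysis described in this caption can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the `gene:change` labels, and the four toy genomes are hypothetical, and the `>=` comparison against the threshold is an assumption about how the 5% cutoff was applied.

```python
from collections import Counter


def variant_frequencies(genome_variants):
    """Count in how many genomes each variant is present."""
    counts = Counter()
    for variants in genome_variants.values():
        counts.update(variants)  # each genome holds a set, so one count each
    return counts


def frequent_variants(counts, n_genomes, min_fraction=0.05):
    """Variants present in at least min_fraction of the genomes."""
    return sorted(v for v, n in counts.items() if n / n_genomes >= min_fraction)


def singletons(counts):
    """Variants present in exactly one genome."""
    return sorted(v for v, n in counts.items() if n == 1)


# Hypothetical toy data (not the study's genomes): genome -> set of variants.
toy = {
    "genome_1": {"S:D614G", "ORF1ab:P4715L", "N:R203K"},
    "genome_2": {"S:D614G", "ORF1ab:P4715L", "S:T1117I"},
    "genome_3": {"S:D614G", "ORF1ab:P4715L", "ORF3a:G251S"},
    "genome_4": {"S:D614G", "ORF1ab:P4715L", "S:T1117I"},
}

counts = variant_frequencies(toy)
# With only four toy genomes a 5% cutoff keeps everything, so use 50% here.
common = frequent_variants(counts, n_genomes=len(toy), min_fraction=0.5)
unique = singletons(counts)
print(common)
print(unique)
```

Applied to the real matrix of 283 variants by 185 genomes, the same counting would separate the 16 frequent variants from the 190 variants found in a single genome.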
Variants of SARS-CoV-2 genomes observed in more than 5% (9) of genomes from Costa Rican cases of COVID-19.
| Position genome | Ref | Alt | Gene | Position – cDNA | Position – protein | Type | Frequency (N) | Frequency (%) |
|---|---|---|---|---|---|---|---|---|
| 14408 | C | T | ORF1ab – RdRp | c.14144C>T | p.Pro4715Leu (P4715L) | missense | 183 | 98.9 |
| 23403 | A | G | Spike – Surface glycoprotein | c.1841A>G | p.Asp614Gly (D614G) | missense | 183 | 98.9 |
| 28881 | G | A | N – Nucleocapsid phosphoprotein | c.608G>A | p.Arg203Lys (R203K) | missense | 77 | 41.6 |
| 28882 | G | A | N – Nucleocapsid phosphoprotein | c.609G>A | p.Arg203Arg (R203R) | synonymous | 77 | 41.6 |
| 28883 | G | C | N – Nucleocapsid phosphoprotein | c.610G>C | p.Gly204Arg (G204R) | missense | 77 | 41.6 |
| 1059 | C | T | ORF1ab – NSP2 | c.794C>T | p.Thr265Ile (T265I) | missense | 70 | 37.8 |
| 10360 | G | A | ORF1ab – 3C-like proteinase | c.10095G>A | p.Lys3365Lys (K3365K) | synonymous | 66 | 35.7 |
| 28706 | C | T | N – Nucleocapsid phosphoprotein | c.433C>T | p.His145Tyr (H145Y) | missense | 54 | 29.2 |
| 29144 | C | T | N – Nucleocapsid phosphoprotein | c.871C>T | p.Leu291Leu (L291L) | synonymous | 54 | 29.2 |
| 8572 | G | T | ORF1ab – NSP4 | c.8307G>T | p.Trp2769Cys (W2769C) | missense | 52 | 28.1 |
| 10340 | C | T | ORF1ab – 3C-like proteinase | c.10075C>T | p.Pro3359Ser (P3359S) | missense | 52 | 28.1 |
| 2716 | C | T | ORF1ab – NSP2 | c.2451C>T | p.Gly817Gly (G817G) | synonymous | 44 | 23.8 |
| 17010 | C | T | ORF1ab – helicase | c.16746C>T | p.Ile5582Ile (I5582I) | synonymous | 44 | 23.8 |
| 7482 | C | T | ORF1ab – NSP3 | c.7217C>T | p.Ser2406Leu (S2406L) | missense | 43 | 23.2 |
| 26143 | G | A | ORF3a | c.751G>A | p.Gly251Ser (G251S) | missense | 12 | 6.5 |
The variant T1117I in the spike is marked in bold.
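
The "Position – protein" column gives each change in both three-letter form (p.Asp614Gly) and short form (D614G), and the "Type" column follows from whether the two residues differ. A minimal sketch of that conversion is shown below. The function names are hypothetical, and the code assumes simple substitutions among the 20 standard amino acids; it does not handle stop codons, indels, or frameshifts.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple protein substitutions such as "p.Asp614Gly".
PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(notation):
    """Split 'p.Asp614Gly' into ('Asp', 614, 'Gly')."""
    match = PROTEIN_SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"Not a simple protein substitution: {notation!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def short_form(notation):
    """Convert 'p.Asp614Gly' to 'D614G'."""
    ref, position, alt = parse_substitution(notation)
    return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"


def variant_type(notation):
    """Classify as 'synonymous' (same residue) or 'missense' (different)."""
    ref, _, alt = parse_substitution(notation)
    return "synonymous" if ref == alt else "missense"


# Examples taken from rows of the table above.
for notation in ("p.Asp614Gly", "p.Arg203Arg", "p.Trp2769Cys"):
    print(notation, short_form(notation), variant_type(notation))
```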
Fig. 3. Frequency of T1117I in the spike over time in Costa Rica and the world. A notable increase in the mutation has been reported in Costa Rica since May 2020, in contrast with its worldwide prevalence, which has remained relatively constant and low (A-B). The map in (B) was obtained from the GISAID database.
Fig. 4. Structural modeling of the spike protein of SARS-CoV-2. Variant D614G is present in 98.9% of the genomes in this study and is also predominant worldwide (>90%, GISAID). D614G could affect the interaction with the host, as well as the immune response (vaccines), but its real effects remain unclear. The variant T1117I is very scarcely reported worldwide (0.08%, GISAID), but its frequency in Costa Rica is 29.2%. The possible effect of this variant on the function of the spike is unknown.
Fig. 5. Phylogenetic tree of SARS-CoV-2 genomes circulating in Costa Rica. Three GISAID clades (G, GH and GR) and three Pangolin lineages (B.1, B.1.1, and B.1.291) are dominant in Costa Rican cases. Deaths are similarly distributed across clades. Variant T1117I in the spike is present in 54 genomes, which belong to a separate monophyletic cluster (dark blue). Other variants are shown in different colors.