Niki Vassilaki, Konstantinos Papadimitriou, Anastasios Ioannidis, Nikos C Papandreou, Raphaela S Milona, Vassiliki A Iconomidou, Stylianos Chatzipanagiotou.
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel virus that belongs to the Coronaviridae family, emerged in December 2019, causing the COVID-19 pandemic in March 2020. Unlike previous SARS and Middle East respiratory syndrome (MERS) outbreaks, this virus has a higher transmissibility rate, albeit a lower case fatality rate, which results in accumulation of a significant number of mutations and a faster evolution rate. Genomic studies on the mutation rate of the virus, as well as the identification of mutations that prevail and their impact on disease severity, are of great importance for pandemic surveillance and vaccine and drug development. Here, we aim to identify mutations on the SARS-CoV-2 viral genome and their effect on the proteins they are located in, in Greek patients infected in the first wave of the pandemic. To this end, we perform SARS-CoV-2 amplicon-based NGS sequencing on nasopharyngeal swab samples from Greek patients and bioinformatic analysis of the results. Although SARS-CoV-2 is considered genetically stable, we discover a variety of mutations on the viral genome. In detail, 18 mutations are detected in total on 10 SARS-CoV-2 isolates. The mutations are located on ORF1ab, S protein, M protein, ORF3a and ORF7a. Sixteen are also detected in patients from other regions around the world, and two are identified for the first time in the present study. Most of them result in amino acid substitutions. These substitutions are analyzed using computational tools, and the results indicate minor or major impact on the proteins' structural stability, which could probably affect viral transmissibility and pathogenesis. The correlation of these variations with the viral load levels is examined, and their implications for disease severity and the biology of the virus are discussed.
Keywords: M protein; ORF1ab; ORF3a; ORF7a; S protein; SARS-CoV-2; mutations; protein structure and stability
Year: 2022 PMID: 35889149 PMCID: PMC9322066 DOI: 10.3390/microorganisms10071430
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Mutations detected on SARS-CoV-2 genomes.
| Sample Name | Accession Number | Age | Gender # | Viral Load | Lineage | Amino Acid (Nucleotide) |
|---|---|---|---|---|---|---|
| 1851-S45 | SRR19213734 | 23 | M | 3.89 | A | S_D614G (A23403G) |
| 4405-S34 | SRR19215536 | 42 | M | 1.70 | A | S_D614G (A23403G), |
| 2384-S29 | SRR19215599 | 45 | F | 4.13 | A | S_D614G (A23403G) |
| 3125-S32 | SRR19215604 | 40 | M | 2.12 | B.39 | ORF1ab_H417R or Nsp2_H237R (A1515G), |
| 3396-S31 | SRR19215602 | 59 | M | 3.81 | B.40 | ORF1ab_I739V or Nsp2_I559V (A2480G), |
| 9096-S37 | SRR19215566 | 42 | M | −0.69 | A | ORF1ab_A1670V or Nsp3_A852V (C3064T), |
| 9097-S38 | SRR19215601 | 37 | F | −0.35 | B.40 | ORF1ab_I739V or Nsp2_I559V (A2480G), |
| 0524-S39 | SRR19215600 | 39 | F | −1.22 | A | ORF1ab_L3606F or Nsp6_L37F (G11083T), |
| 2098-S40 | SRR19215598 | 26 | M | −1.37 | B.39 | ORF1ab_H417R or Nsp2_H237R (A1515G), |
| 6642-S30 | SRR19215603 | 59 | M | 4.56 | B.40 | ORF1ab_I739V or Nsp2_I559V (A2480G), |
* The substitution of the termination (X) codon (TGA) of the ORF7a with a Leucine (L) codon (TTA) provokes an extension (ext) of the open reading frame by 5 codons (amino acids LLNFH). # F: Female and M: Male.
Figure 1. Location of mutations detected on SARS-CoV-2 genomes.
Presence of specific SARS-CoV-2 mutations observed in GISAID sequences, as tracked using the CoV-Glue web application.
| Mutations | ORF1ab | MutType | C1 | C2 | C3 | Ref-Codon | Mut-Codon | Count | Proportion |
|---|---|---|---|---|---|---|---|---|---|
| Nsp2_H237R | H417R | nonsyn | 1514 | 1515 | 1516 | cAt | cGt | 1219 | 0.000233 |
| Nsp2_I559V | I739V | nonsyn | 2480 | 2481 | 2482 | Att | Gtt | 3029 | 0.000579 |
| Nsp2_P585S | P765S | nonsyn | 2558 | 2559 | 2560 | Cca | Tca | 3215 | 0.000615 |
| Nsp3_A852V | A1670V | nonsyn | 5273 | 5274 | 5275 | gCa | gTa | 1428 | 0.000273 |
| Nsp4_H223H | H2986H | syn | 9221 | 9222 | 9223 | caC | caT | 7684 | 0.00147 |
| Nsp6_L37F | L3606F | nonsyn | 11,081 | 11,082 | 11,083 | ttG | ttT | 133,400 | 0.025514 |
| Nsp12_D269N | D4661N | nonsyn | 14,245 | 14,246 | 14,247 | Gat | Aat | 1456 | 0.000278 |
| Nsp12_Y455Y | Y4847Y | syn | 14,803 | 14,804 | 14,805 | taC | taT | 76,696 | 0.014669 |
| Nsp13_V356V | V5680V | syn | 17,302 | 17,303 | 17,304 | gtC | gtA | 136 | 0.000026 |
| Nsp13_V521V | V5845V | syn | 17,797 | 17,798 | 17,799 | gtA | gtG | 3079 | 0.000589 |
| Nsp15_S261L | S6713L | nonsyn | 20,401 | 20,402 | 20,403 | tCa | tTa | 5484 | 0.001049 |
| S_V341del | | del | 22,583 | 22,584 | 22,585 | gTT | g-- | 13 | 0.000002 |
| S_D614G | | nonsyn | 23,402 | 23,403 | 23,404 | gAt | gGt | 5,182,511 | 0.991216 |
| ORF3a_G251V | | nonsyn | 26,143 | 26,144 | 26,145 | gGt | gTt | 8069 | 0.001543 |
| M_F100F | | syn | 26,820 | 26,821 | 26,822 | ttC | ttT | 8300 | 0.001587 |
| ORF7a_X122L | | nonsyn | 27,757 | 27,758 | 27,759 | tGa | tTa | 2104 | 0.000402 |
C1, C2 and C3 are the genome coordinates of the three nucleotide positions of each codon; the substituted base is shown in upper case in the Ref-Codon and Mut-Codon columns. Count is the number of different viral genomes in the database in which each mutation was identified. Proportion is the fraction of all sequences submitted online to GISAID that carry the mutation.
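The relationship between the codon columns, the MutType column and the nucleotide notation used in the first table (for example A23403G) can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the function name `describe` and its return format are illustrative and are not part of the study's pipeline or of CoV-Glue.

```python
# Standard genetic code built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def describe(ref_codon, mut_codon, c1):
    """Return (nucleotide_change, mut_type) for one table row.

    ref_codon / mut_codon: the Ref-Codon and Mut-Codon cells (case is ignored).
    c1: genome coordinate of the first codon base (column C1), as an int.
    """
    if "-" in mut_codon:
        # Deleted bases are written as '-' in the table; no substitution to name.
        return None, "del"
    ref, mut = ref_codon.upper(), mut_codon.upper()
    diffs = [i for i in range(3) if ref[i] != mut[i]]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substituted base")
    i = diffs[0]
    change = f"{ref[i]}{c1 + i}{mut[i]}"  # e.g. A23403G
    same_residue = CODON_TABLE[ref] == CODON_TABLE[mut]
    return change, "syn" if same_residue else "nonsyn"


print(describe("gAt", "gGt", 23402))  # S_D614G row
print(describe("caC", "caT", 9221))   # Nsp4_H223H row
```

A stop codon is represented as `*` in this table, so the ORF7a_X122L row (tGa to tTa) is classified as non-synonymous, in line with the MutType column.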
Results of structure-based methods used for the analysis of non-synonymous mutations identified in the collected samples. Dynamut2: Values of ΔΔGStability (in kcal/mol) below 0.0 (<0.0) correspond to destabilizing mutations. SDM: Values of ΔΔGpred. (in kcal/mol) below 0.0 (<0.0) correspond to destabilizing mutations. MAESTROweb: Values of ΔΔGpred. below 0.0 indicate a stabilizing mutation; the values in parentheses are cpred., a confidence estimate between 0.0 (not reliable) and 1.0 (highly reliable).
| Protein | Mutation | Protein Structure | Dynamut2 | SDM (ΔΔGpred.) | MAESTROweb (ΔΔGpred.) |
|---|---|---|---|---|---|
| Nsp2 | H417R (H237R) # | 7MSW | −0.16 | +0.07 | −0.020 (0.902) |
| Nsp2 | I739V & P765S (I559V & P585S) | 7MSW | −0.28 | −2.13 & 0.46 | +0.035 (0.902) |
| Nsp3 | A1670V (A852V) | 7QCM | −0.58 | +1.16 | +0.010 (0.923) |
| Nsp12 | D4661N (D269N) | 7C2K | −0.21 | −0.11 | +0.095 (0.862) |
| Nsp15 | S6713L (S261L) | 7N06 | +0.28 | −0.3 | +0.006 (0.877) |
| ORF3a | G251V | D-I-Tasser model | −1.53 | +0.31 | +0.473 (0.845) |
| M protein | L54F | D-I-Tasser model | −0.78 | −1.31 | +2.230 (0.825) |
# The numbering of the mutated amino acids outside parentheses corresponds to the position in the ORF1ab polyprotein before cleavage, while the respective numbering in parentheses corresponds to each Nsp individually.
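Because the three tools use opposite sign conventions, the same numeric sign can mean different things in different columns of the table above. The sketch below encodes the conventions given in the table caption. The helper names are hypothetical, and two points are assumptions rather than statements from the caption: a value above zero is read as the opposite of the stated below-zero meaning, and a value of exactly zero is labeled "neutral".

```python
# What a NEGATIVE ddG means for each tool, as stated in the table caption.
NEGATIVE_MEANS = {
    "Dynamut2": "destabilizing",
    "SDM": "destabilizing",
    "MAESTROweb": "stabilizing",
}


def parse_ddg(cell):
    """Convert a table cell such as '−0.16' or '+0.07' to a float.

    The table uses the Unicode minus sign (U+2212), which float() rejects,
    so it is replaced with an ASCII hyphen-minus first.
    """
    return float(cell.replace("\u2212", "-"))


def interpret(tool, ddg):
    """Label a ddG value according to the sign convention of the given tool."""
    if ddg == 0:
        return "neutral"  # assumption: the caption does not cover exactly 0.0
    negative_label = NEGATIVE_MEANS[tool]
    positive_label = (
        "stabilizing" if negative_label == "destabilizing" else "destabilizing"
    )  # assumption: above zero is the opposite of the stated below-zero meaning
    return negative_label if ddg < 0 else positive_label


# ORF3a G251V row: Dynamut2 −1.53, SDM +0.31, MAESTROweb +0.473
for tool, cell in [("Dynamut2", "\u22121.53"), ("SDM", "+0.31"), ("MAESTROweb", "+0.473")]:
    print(tool, interpret(tool, parse_ddg(cell)))
```

Under these assumptions the tools do not always agree on a given substitution, which is consistent with the abstract describing the predicted impact as ranging from minor to major.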
Figure 2. A view of the region where residues D614 and G614 are located in the Spike glycoprotein after superposition of the trimeric wild-type structure, colored blue (PDB ID 6VSB, [99]), onto the D614G trimeric structure, colored green (PDB ID 7KRQ, [98]). Residues D614, G614, K854 and T859 are presented as stick models. The image was prepared with the molecular graphics software PyMOL (www.pymol.org, accessed on 1 May 2022).