Literature DB >> 34968862

Comparative mutational analysis of SARS-CoV-2 isolates from Pakistan and structural-functional implications using computational modelling and simulation approaches.

Abdullah Shah1, Saira Rehmat2, Iqra Aslam3, Muhmmad Suleman4, Farah Batool5, Abdul Aziz6, Farooq Rashid7, Muhmmad Asif Nawaz1, Syed Shujait Ali4, Muhammad Junaid8, Abbas Khan9, Dong-Qing Wei10.   

Abstract

SARS-CoV-2, an RNA virus, has been prone to high mutations since its first emergence in Wuhan, China, and throughout its spread. Its genome has been sequenced continuously by many countries, including Pakistan, but the results vary. Understanding its genomic patterns and connecting them with phenotypic features will help in devising therapeutic strategies. Thus, in this study, we explored the mutation landscape of 250 Pakistani isolates of SARS-CoV-2 genomes to check the genome diversity and examine the impact of these mutations on protein stability and viral pathogenesis in comparison with a reference sequence (Wuhan NC 045512.2). Our results revealed that structural proteins mainly exhibit more mutations than others in the Pakistani isolates; in particular, the nucleocapsid protein is highly mutated. In comparison, the spike protein is the most mutated protein globally. Furthermore, nsp12 was found to be the most mutated NSP in the Pakistani isolates and worldwide. Regarding accessory proteins, ORF3A is the most mutated in the Pakistani isolates, whereas ORF8 is highly mutated in world isolates. These mutations decrease the structural stability of their proteins and alter different biological pathways. Molecular docking, the dissociation constant (KD), and MM/GBSA analysis showed that mutations in the S protein alter its binding with ACE2. The spike protein mutations D614G-S943T-V622F (-75.17 kcal/mol), D614G-Q677H (-75.78 kcal/mol), and N74K-D614G (-73.84 kcal/mol) exhibit stronger binding energy than the wild type (-66.34 kcal/mol), thus increasing infectivity. Furthermore, the simulation results strongly corroborated the predicted protein servers. Our analysis findings also showed that E, M, ORF6, ORF7A, ORF7B, and ORF10 are the most stable coding genes; they may be suitable targets for vaccine and drug development.
Copyright © 2021. Published by Elsevier Ltd.

Entities:  

Keywords:  ACE2; COVID-19; Mutation; Phosphorylation serine/threonine kinases; SARS-CoV-2

Mesh:

Substances:

Year:  2021        PMID: 34968862      PMCID: PMC8709794          DOI: 10.1016/j.compbiomed.2021.105170

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   6.698


Introduction

In December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus appeared in Wuhan, China, instigated a global coronavirus pandemic (COVID-19) [1,2]. Till October 2021, 241,199,505 cases have been reported with about 4,910,487 deaths recorded (WHO, 2020, (https://covid19.who.int). The transmission of SARS-CoV-2 is very high, with common symptoms attributed to this disease includes, high fever, breathing complexities, dry cough, and acute respiratory distress [[3], [4], [5], [6], [7], [8]]. SARS-CoV-2 is a single-stranded, positive-sense RNA virus with a genome size of ∼29.8 kb which encodes 16 non-structural proteins (nsp1–16) to form the replicase machinery of the virus [5,9]. It also codes for four structural proteins, namely Spike (S), envelope (E), membrane (M), and nucleocapsid (N). Besides structural and non-structural protein, SARS_CoV-2 codes for seven accessory proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, and ORF10), which modulate host responses against viral infections to assist effective infection and pathogenesis [[10], [11], [12], [13], [14]]. Genome sequencing insights have shown nucleotide substitution rate as ∼1 × 10−3 per year for SARS-CoV-2 [15,16]. Variations in different proteins of SARS-CoV-2, particularly the spike glycoproteinlead to a drift in the antigenicity of vaccines and other therapeutics [17]. The mutation rate of RNA viruses, including, SARS-CoV-2 is exceptionally high, which in turn, increase the virulence and transmission of these viruses [16,[18], [19], [20]]. These mutations occur in almost all the proteins of SARS-CoV-2, thereby rendering this virus to escape the host immune system more efficiently and consequently increases the pathogenicity and transmissibility [[21], [22], [23], [24], [25], [26], [27], [28]]. These amino acid substitutions also alter the protein structures and consequently viral replication. The depletion of 9 amino acids in Orf6 caused the shifting of transmembrane localization that results in IFN resistance during antiviral therapy [29]. Mutations in the proteins affect the structure, function, and binding with the other proteins necessary for disease progression. Other mutations result in a new RNA polymerase variant that is reliant on RNA [30]. Some of the reported mutations also expedite the transmission of the virus i.e., D614G mutation in the spike protein has been reported to be associated with unusual increased in the transmission fitness of SARS-CoV-2 [23,[31], [32], [33], [34]]. The genome of the virus is unceasingly mutating since its outbreak in late 2019 and many variants are reported until now. The B.1.617 lineage was identified in India in October 2020 [24,35]. It then became a dominant strain in India and spread to other countries of the world as well. Furthermore, the B1.617.1, B.1.617.2, and B.1.617.3 subtypes of this lineage also harbours different mutations in the N-terminal domain (NTD) and the receptor-binding domain (RBD) of the spike protein [27,36]. Since the virus mutates often, it is hard to develop effective therapeutics which can hinder all the variants. To come up with a successful treatment approach or vaccination, it is necessary to have a greater understanding of both the virus and host genome [37]. An idea of the efficacy of the vaccines being researched may be extrapolated from sequencing the entire genome sequences of isolates collected from diverse locations across the world [38]. The race for a final magic drug still continues and according to the updated data provided by the COVID-19 vaccine and therapeutics tracker, 408 clinical trials of 138 vaccine candidates are under consideration and are in different clinical phases (https://covid19.trackvaccines.org/). Globally 6.65 billion doses of COVID-19 vaccines are administered among which 2.83 billion are fully vaccinated which counts for only 36% of the whole population. Understanding the mutational landscape of SARS-CoV-2 would provide a better view of the therapeutic interventions. For instance, Abbas et al., used computational approaches to study the impact of different mutations on the binding and infectivity [18,20,39,40]. Moreover, using these approaches many small molecules drugs are tested against different proteins of SARS-CoV-2 [[41], [42], [43], [44], [45], [46]]. Therefore, herein we used computational approaches to study the reported 250 genomes of SARS-CoV-2 isolated from Pakistan. In Pakistan, the first case of COVID-19 was reported on February 26, 2020. As of October 2021, about 1.27 million cases were reported, with total mortality of 28,280 were recorded in Pakistan. Hence, we used GSAID to extract the mutational spectrum in different proteins and their impact on the structure and function was estimated. Furthermore, protein-protein docking and molecular dynamics simulation approaches were used to grasp the impact of the most prevalent mutations on the binding and stability of the spike protein particulalry. Consequently, the current study provides distinguishable features for the discovery of novel therapeutics and Pakistan-specific anti-COVID-19 strategies.

Materials and methods

Collection of genome data for SARS-CoV-2

The genome sequence of 250 Pakistani isolates OF SARS-CoV-2 was downloaded from a global initiative on sharing all influenza data (GISAID) (https://www.gisaid.org/) [47] and NCBI [48]. Only coding sequences were analyzed for mutational analysis at the protein level in the present study.

Multiple sequence alignment

DNASTAR, Lasergene, and Clustal W tools align the 250 Pakistani SARS-CoV-2 isolates [49,50]. The previously described isolate of SARS-CoV-2 (Wuhan NC 045512.2) was utilized as a reference sequence for mutational analysis at the amino acid level.

Sequence-based mutational effects prediction

The I-mutant online tool was utilized to assess the structural stability change [51]. Mutpred2 was also utilized to examine the molecular consequences and functional impacts of mutations found [52].

Wild type structure retrieval and variants modelling

The wild-type structure of SARS-CoV-2 spike protein (6VSB) was obtained from the Protein Data Bank, and Chimera v 1.15 software was used for SARS-CoV-2 spike protein variant modeling [53,54]. The wild type structure 6VSB was used as a template that shares 97% identity with the sequence of the variant. The.FASTA format was used as input and BLAST was carried out to select the template based on sequence identity and coverage.

Protein-protein docking

HDOCK (http://hdock.phys.hust.edu.cn/) online server algorithm was used for protein-protein interaction of both wild type-ACE2 and variants-ACE2, to check the respective binding efficiencies [55]. HDOCK uses a hybrid algorithm that combines both template-based and free approaches to estimate the docking conformation and affinity.

Dissociation constant (KD) determination

Estimation of the strength of biomolecular association is an important parameter in biological processes. Herein, to estimate the strength of the wild type and mutants RBD with ACE2 PRODIGY (PROtein binDIng enerGY prediction)(https://wenmr.science.uu.nl/prodigy/) webserver was used [56].

Molecular dynamic simulation

MD simulation of the wild type and mutant complexes was completed by using the AMBER20 simulation package [57,58]. The default parameters used by Abbas et al., were employed to complete the 200ns simulation for each complex [59,60]. Post-simulation analyses such as RMSD and RMSF were calculated using CPPTRAJ and PTRAJ modules [61].

Binding free energy calculations

The binding energy between WT and mutant spike protein with the human receptor ACE2 was calculated using the MM/GBSA approach [62]. The spike mutants include N74K, D614G, V622F, Q677H, S943T, D115G with the human receptor ACE2 using the HAWKDOCK server [63]. The energy is calculated with the following equation: Each component of the total free energy was estimated using the following equation:

Results and discussion

Retrieved genome of SARS-CoV-2

A total of 250 SARS-CoV-2 isolates from Pakistan were obtained via the Global Initiative on GISAIDand the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/). Only coding sequences were examined for mutational analysis at the protein level (Table 1 ).
Table 1

Retrieved genomes of SARS-CoV-2 from Pakistan and their mutational landscape.

S. NoIsolatesORF1ABORF1ASpikeNMORF3AORF8ORF10
1MW447643.1Q63L, L3293F, P4075, SA4489V, P4716LQ63L P4075SD614GQ57H
2MW447622.1G82S, L2564F, P4716L, A4921VG82S L2564FD614G
3MW447639.1G82S, P4716LG82SD614GQ57H
4MT240479.1R207C, V378I, P2965L, L3606FR207C V378I P2965L L3606F
5MW447634.1K261 N, P4716L, T5451IK261 ND614GQ57H
6MW447614.1F275L, V2133A, P4716LD614GS194LQ57H
7MW242667.1K292E, V627F, T2018K, L3606FK292E V627F T2016K L3606FP13LV30L
8MT500122.1D448 N, N449, deletionD448 N N449delete
9MW447612.1G934C, P4716LD614GR209IQ57H
10MW031799.1S944L, T1246I, K1305 N, G3278S, P4716LS944LT1246I K1305 N G3278SD614GR203K
11MW031800.1S944L, T1246I, K1305 N, P4716LS944L T1246IK1305 N G3278SD614GR203K
12MW031801.1S944L, T1246I, K1305 N, P4716LS944L T1246I G3278SD614GR203K
13MW031802.1S944L, T1246I, K1305 N, P4716LS944L T1246IK1305 N G3278SN74K D614GR203K
14MW447609.1L1175I, P4716LD614GR203KQ57H
15MW447644.1T1754I, T2846I, P4716LT1754I T2846ID614GK21 N
16MW447645.1T2408I, P4716L, L6082FV2133A T2408ID614GS194LT379IQ57H
17MW447642.1A2618V, P4716LA2618VD614GR209IQ57H T223IW45L
18MW447611.1Q2702H, K3353R P4716L, V4979LQ2702H K3353RD614GS327LQ57H
19MT879619.1Q2702H, P4716L, A5561TQ2702H L3606FD614GS327LQ57HV30L
20MT262993.1(T3153Y, WFF3157, deleted) (L6418, YLDAY6424 N deleted)(T3153YWFF3157 deleted)
21MW447636.1P3447S, P4716LP3447SD614GR209IQ57H
22MW031803.1P4716LD614G
23MW400961.1P4716LD614GV622FS943TQ57H
24MW447638.1P4716L, V6600FD614G, Q677HS194LQ57H
25MW447640.1P4716L, V6600FA2956VD614GS194LH25Y
26MW447641.1P4716LD614GR209IQ57H
27MW447644.1P4716LR209I
28MW403500.1D614GS194LQ57H
29MW411960.1D614GR209I
30MW447610.1D614GQ57H
31MW447619.1D614GS194LQ57H
32MW447620.1D614GR209IQ57H
34MW447618.1S202 NL84S
35MW447621.1D614GD1153GR209IQ57H
Retrieved genomes of SARS-CoV-2 from Pakistan and their mutational landscape.

Worldwide mutational distribution of SARS-CoV-2 proteins

SARS-CoV-2 protein mutations were examined globally as part of a global initiative program for sharing all influenza data (GISAID). We observed mutations in most of the proteins of the SARS-CoV-2 virus. However, some proteins like envelop, ORF6, ORF7A, and ORF7B showed resistance to mutations. Additionally, specific proteins were more prone to mutations than others. Among the Structural proteins, spike protein was mutated the most, followed by nucleocapsid. Within spike, D614G was found to be mutated in 171 countries. Mutation R203K in nucleocapsid protein was observed in 151 countries (Fig. 1 A). Among the NSPs, the mutation P4715L in NSP12 was observed in 172 countries globally, followed by L3606F in NSP3, which occurred in 134 countries. The L1174I mutation in NSP3 was observed only in 2 countries (Fig. 1B). Among the accessory proteins, ORF8 was found to be the most mutated protein. Several mutations have been reported in the ORF8 protein, but L84S is a highly recurred mutation observed in 100 countries (Fig. 1A).
Fig. 1

Worldwide mutational distribution of SARS-CoV-2 proteins. (A) Among the Structural proteins, D614G mutation in spike protein was found the most reoccurring mutation in 171Countries while L84S in ORF8 was observed in 100 countries. (B) Among the Non-Structural proteins, the mutation P4715L in NSP12 was observed in 172 countries of the world, followed by L3606F in NSP3 which occurred in 134 countries while the L1174I mutation in NSP3 was observed only in 2 countries.

Worldwide mutational distribution of SARS-CoV-2 proteins. (A) Among the Structural proteins, D614G mutation in spike protein was found the most reoccurring mutation in 171Countries while L84S in ORF8 was observed in 100 countries. (B) Among the Non-Structural proteins, the mutation P4715L in NSP12 was observed in 172 countries of the world, followed by L3606F in NSP3 which occurred in 134 countries while the L1174I mutation in NSP3 was observed only in 2 countries.

Mutational spectrum of the structural proteins of Pakistani SARS-CoV-2 isolates

Mutational analysis of the structural proteins of 250 Pakistani isolates of SARS-COV2 harbors six different amino acid substitutions in spike protein, one in membrane protein, nine amino acids substitutions, and a large deletion in nucleocapsid phosphoprotein. At the same time, no mutation was observed in envelope protein. The spike protein of 29 out 50 samples of SARS-COV2 Pakistani isolates harbors a signature D614G mutation indicative of the widespread mutation in Pakistani isolates. The D614G mutation in spike protein has been reported to strengthen the folding stability of the spike protein. This mutation was first reported in China and is the second-largest mutation in the United States, and its ratio is increasing as time goes on. The D614G mutation of the spike protein where Aspartic acid (D) with a polar negative side charged amino acid is substituted with Glycine (G) with a nonpolar side chain amino acid. This D614G mutation of spike protein has reported a critical mutation that makes the SARS-CoV-2 more contagious and enhances its infectivity [64]. The other less common mutation that co-occurred with D614G was N74K, V622F, Q677H, S943T, and D1153G (Table 1). Surprisingly, in the spike protein, the D614G substitution was associated with two consecutive series of two substitutions R203K, and G204R of the nucleocapsid phosphoprotein in five isolates (MW031799.1, MW031800.1, MW031801.1, MW0831802.1, and MW031803.1). These two mutations in nucleocapsid phosphoprotein were previously reported in the United States [64]. The amino acid arginine (R) and Lysine (L) are both positively charged, and its substitution may not affect the function of the nucleocapsid phosphoprotein. In contrast, the Glycine (a nonpolar) substitution by arginine (R) may affect the hydrophilicity of the nucleocapsid phosphoprotein of SARS-CoV2. Furthermore, a more significant deletion was observed in the nucleocapsid phosphoprotein of isolate MT240479.1 (Table 1). While the envelop protein was 100% identical in all isolates compared with the reference sequence (Wuhan NC_045512.2).

Mutational spectrum of the accessories proteins of Pakistani SARS-CoV-2 isolates

Among accessories protein of SARS-CoV-2 of Pakistani isolates, three proteins (ORF3A, ORF8, and ORF10) showed different amino acid substitution while no deletion was observed in these proteins of the selected isolates (Table 1). ORF3a protein (274 amino acids) of SARS-CoV-2 is expressed abundantly in infected cells and plays an essential role in pathogenesis. In these SARS-COV2 isolates, a signature Q57H substitution in ORF3 protein was observed in 23 out of 50 samples which is the second frequent mutation in these Pakistani isolates. This substitution of Q57H was first reported in Singapore and occurred in 70% of US Covid-19 cases [64]. The prevalence of Q57H substitution in ORF3 protein was reported relatively high (62.7%) in South Africa [65]. The Q57H substitution in ORF3 co-occurs with two other amino acids mutations K21 N and T223I, in isolates MW447644.1, MW447642.1 respectively. The SARS-COV-2, ORF8 has been reported to reduce the translocation of IRF3 to the nucleus and consequently reduce expression of the interferon production and to down-regulate MHC-1 in cells [66,67]. Interestingly, in Pakistani SARS-COV-2 isolates, a new mutation in ORF8, W45L was observed in the present study, while 35 out of 50 isolates have the ORF8-L genotype. The substitution of Tryptophan (W) by lysine (L) may affect the hydrophilicity of the ORF8 of SARS-CoV-2 protein. In ORF10, single substitution V30L has been observed in two isolates MT 879619.1 and MW242667.1. While the other accessories protein ORF6, ORF7A, and ORF7B were found 100% identical to the reference sequence Wuhan NC_045512.2).

Mutational spectrum of non-structural proteins of Pakistani SARS-CoV-2 isolates

Mutational analysis of non-structural proteins of 50 Pakistani SARS-CoV-2 isolates harbored 75 substitution mutations and deletions compared to the reference sequence (NC_045512.2). Among substitution, two in nsp1 (Q63L, G82S), seven in nsp2 (R197C, K261 N, F275L, K292E, V378I, D448 N, V627F) and one deletion at N449, Nine amino acids substitutions in nsp3 (L1174I, T1246I, K1308 N, T1754I, T2016K, T2408F, L2564F, A2619V, Q2702H), two substitutions (T2848I, P2926L and five amino acid deletion 3153–3157) in nsp4, four in nsp5 (G3278S, L3393F, K3353R, P3447S), Two in nsp6 (L3606F, A3656V), one in nsp8 (P4075S), four in nsp12 (A4489V, P4715L, A4921V, V4979L), two in nsp13 (T5451I, A5561T), one substitution (L6082F), seven amino acid deletion (6418–6424) in nsp14, one in nsp15 (V6600F) while no mutation was observed in nsp7, nsp9, nsp10, and nsp16. Among the substation, two substations of nsp3 T1246I and K1308 N co-occurred with substitution G3278S and P4715L of nsp5 and nsp12, respectively in isolates MW031799.1, MW031800.1, MW031801.1, MW031802.1. In isolate MT500122 the substation D448 N has preceded a deletion of N449 amino acid in nsp2. Furthermore, two consecutive deletions of five amino acids FYWFF (3153–3157) and seven amino acids LYLDAYN (6418–6424) in nsp4 have been observed in the isolate MT262993 as compared to the reference sequence (Table 1).

Identification of hyper-variable genomic hotspot in SARS-CoV-2 genomes of Pakistani isolates

Interestingly, among all recurrent mutations in structural protein, the spike protein has one D614G (82%), nucleocapsid had three S194L (20%), R203K (14%), G204R (14%) R209I (25%) mutation hotspot (Fig. 2 ). The accessory protein the ORF3A has one mutation hotspot Q57H (63%). Among the non-structural proteins, nsp3 have four mutation hotspots S944L (11%), T1246I (11%), K1305 N (8%), NSP12 have one P4715L (57%), NSP15 have one V6600F (5%) mutation hotspots in Pakistani isolates as compared with the reference (Wuhan NC_045512.2) (Fig. 2).
Fig. 2

Schematic representation illustrating the distribution of recurrent mutations in structural (S, E, M, N), Accessories (ORF3A, ORF6, ORF7A, ORF7B, ORF8, ORF10), and nonstructural (nsp1 to nsp 16) proteins along with the SARS-CoV-2 genome. Recurrent mutations are represented by vertical lines.

Schematic representation illustrating the distribution of recurrent mutations in structural (S, E, M, N), Accessories (ORF3A, ORF6, ORF7A, ORF7B, ORF8, ORF10), and nonstructural (nsp1 to nsp 16) proteins along with the SARS-CoV-2 genome. Recurrent mutations are represented by vertical lines.

Mutational effects on structure stability

It has been determined that 27 mutations decrease structural stability, whereas six mutations enhance structural stability due to I-mutant server analysis of 96 missense mutations. These proteins' structural stability was expected to be lowered by most mutations in surface glycoprotein and nucleocapsid phosphoprotein, whereas specific changes in ORF1AB and ORF1A were projected to improve it (Table 2 ).
Table 2

Prediction of the mutational effects on the structural stability. A negative value indicates destabilizing effect.

IndexProteinAmino Acid ChangesSVM2 Prediction EffectDDG Value (kcal/mol)
1Surface GlycoproteinN74KDecrease−0.49
2Surface GlycoproteinD614GDecrease−0.93
3Surface GlycoproteinV622FDecrease−1.01
4Surface GlycoproteinQ677HDecrease−0.67
5Surface GlycoproteinS943TDecrease−0.24
6Surface GlycoproteinD1153Decrease−1.08
7Nucleocapsid phosphoproteinP13LDecrease−0.48
8Nucleocapsid phosphoproteinR203KDecrease−0.93
9Nucleocapsid phosphoproteinG204RDecrease−0.52
10Nucleocapsid phosphoproteinT205IDecrease−0.53
11ORF1AB PolyproteinR207CDecrease−0.60
12ORF1AB PolyproteinK292EDecrease−0.42
13ORF1AB PolyproteinV378IDecrease−0.23
14ORF1AB PolyproteinS944LDecrease0.63
15ORF1AB PolyproteinT1246IIncrease0.14
16ORF1AB PolyproteinK1305 NIncrease−0.10
17ORF1AB PolyproteinQ2702HDecrease−0.68
18ORF1AB PolyproteinG3278SDecrease−0.89
19ORF1AB PolyproteinL3606FDecrease−1.00
20ORF1AB PolyproteinA4489LDecrease−0.67
21ORF1AB PolyproteinP4715LDecrease−0.83
22ORF1A PolyproteinR207CDecrease−0.60
23ORF1A PolyproteinK292EDecrease−0.42
24ORF1A PolyproteinS318IIncrease0.36
25ORF1A PolyproteinD448 NDecrease−0.98
26ORF1A PolyproteinV627FDecrease−1.19
27ORF1A PolyproteinS944LIncrease0.63
28ORF1A PolyproteinT1246IIncrease0.14
29ORF1A PolyproteinK1305 NIncrease−0.10
30ORF1A PolyproteinP2965LDecrease−0.67
31ORF1A PolyproteinG3278SDecrease−0.89
32ORF1A PolyproteinL3606FDecrease−1.00
330RF3A PolyproteinQ57HDecrease−0.90
Prediction of the mutational effects on the structural stability. A negative value indicates destabilizing effect.

Molecular consequences of reported mutations in different pathways

Additionally, three mutations occurring in ORFIA were predicted to alter the molecular consequences, including Loss of Disulfide linkage, Loss of Catalytic site, Loss of Allosteric site, Gain of Proteolytic cleavage, and Gain of Sulfation. Similarly, one mutation each in Membrane Glycoprotein and ORF8 were predicted to alter transmembrane protein,altered DNA binding, as well as altered disordered interface respectively Table 3 .
Table 3

Prediction of effects of the mutation on different biological pathways.

Protein NameMutationEffects
Membrane GlycoproteinH125YAltered Transmembrane protein
Altered Ordered interface
ORF8W45LAltered Ordered interface
Altered Disordered interface
Altered DNA binding
Altered Transmembrane protein
ORF1AT2408IAltered Transmembrane protein
Gain of strand
Loss of Disulfide linkage at C2404
Gain of N-linked glycosylation at N2405
Loss of Catalytic site at C2404
Loss of GPI-anchor amidation at N2405
P2965LAltered Transmembrane protein
Altered Ordered interface
Loss of Relative solvent accessibility
Altered Metal binding
P3447SAltered Transmembrane protein
Altered Ordered interface
Loss of Allosteric Site at Y3445
Gain of Strand
Loss of Relative solvent accessibility
Gain of Proteolytic cleavage at D3450
Loss of Pyrrolidone carboxylic acid at Q3452
Gain of Sulfation at Y3445
ORF1ABL3606Loss of Helix
Gain of Loop
Altered Ordered interface
Gain of Sulfation at Y3607
Gain of Strand
Prediction of effects of the mutation on different biological pathways.

Modelling of the wild type and spike mutant structures

Continued mutations in SARS-CoV-2 Genome cause potential differences in its transmission, virulence, and pathogenesis compared to their wild counterparts. Recently reported mutations in the spike proteins of SARS-CoV-2 from South Africa (Lys417Asn, Glu484Lys, Asn501Tyr) (501Y.V2Variant) and Brazil (Lys417Asn, Glu484Lys, Asn501Tyr) had led these strains to evade the vaccines successfully [18]. Therefore, it was necessary to explore the mutations in the spike protein of Pakistani isolates. SARS-CoV-2 virus Spike proteins have an RBD domain, which aids in binding to ACE2 receptors on the host cell. This binding sets off a chain reaction of events that lead to fusing the host cell and viral membranes for cell penetration. SARS-spike CoV-2's RBD was submitted to comparative binding and biophysical study upon interaction with ACE2 because of its relevance in virulence and pathogenesis. RCSB was used to get the structure of the Spike and ACE2 proteins (PDB ID:6M0J). Docking the spike protein with the human ACE2 receptor, HDOCK online server was used. To verify whether the binding affinity of the RBD domain for the human ACE2 can be affected by the six mutations present in the spike protein of Pakistani isolates, we made a double mutant, which consists of the N74K-D614G, N74K-Q677H, D614G-V622F, D614G-S943F, D614G-V622F–S943F, and D614G-Q677H. Structural organization, domain separation, and binding interface of spike protein are shown in Fig. 3 a. The mutants were generated by Chimera and are given in Fig. 3b–e.
Fig. 3

Representation of spike glycoprotein structure (PDB ID:6M0J) and SARS-CoV-2 binding domain (a) Different domains of spike protein with different colors. The green color shows the RBD domain while the orange color shows the ACE2 receptor. The right-side figure shows the binding interface along with its stick representation of the key hydrogen interactions of ACE2 and the spike RBD. (b) wild type spike protein, (c) D614G, V622F, S943T (d) D614G, Q677H (e) N74K, D614G. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Representation of spike glycoprotein structure (PDB ID:6M0J) and SARS-CoV-2 binding domain (a) Different domains of spike protein with different colors. The green color shows the RBD domain while the orange color shows the ACE2 receptor. The right-side figure shows the binding interface along with its stick representation of the key hydrogen interactions of ACE2 and the spike RBD. (b) wild type spike protein, (c) D614G, V622F, S943T (d) D614G, Q677H (e) N74K, D614G. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Docking of the wild type and mutant spike structures

Keeping in mind the involvement of spike protein in a particular process, including attachment and pathogenesis, we investigated the critical analysis of both wild and mutant versions of the spike protein (Spike RBD) with ACE2 receptor. Structural study of protein interactions and their binding energy is an important initial step in biological process knowledge. Binding affinity, which defines whether a complex is formed under specific conditions, is essential to regulate molecular interactions, create new treatments (i.e. directing rational drug design) and anticipate the impact variation on the protein interfaces [68]. Prior to docking of Spike and ACE2 superimposition of the mutants was performed to check the structural variations caused by the mutations. The generated mutants were superimposed on wild-type spike protein, and the RMSD values were recorded (Fig. 4 a, b, c).
Fig. 4

(a, b, c) Superimposed structure of wild-type spike protein (green) with D614G-V622F–S943F (red), and D614G-Q677H (orange) and. N74K-D614 (blue). The RMSD of each superimposition was reported to be 3.1 Å, 3.5 Å, and 2.99 Å. (d) Docking complex of wild-type spike protein with ACE2. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

(a, b, c) Superimposed structure of wild-type spike protein (green) with D614G-V622F–S943F (red), and D614G-Q677H (orange) and. N74K-D614 (blue). The RMSD of each superimposition was reported to be 3.1 Å, 3.5 Å, and 2.99 Å. (d) Docking complex of wild-type spike protein with ACE2. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.) Therefore, we used HDOCK to perform the docking of ACE2 with the wild-type and double mutant spike protein (D614G-V622F–S943F D614G-Q677H and N74K-D614G) to explain how these mutations led to the higher infectivity of SARS-CoV-2 variants. HDOCK resulted in a docking score of −12.40 kcal/mol for the ACE2-wild type spike complex. PDBsum interaction study revealed that the interface is made up of 37 residues. 18 residues are supplied by ACE2 and 15 by spike-RBD, which are responsible for interface formation. One salt bridge, 11 hydrogen bonds, and 125 non-bonded interactions were found at the interface. The hydrogen bonds formed by the ACE2-spike (RBD) wild include Gln493-Glu35, Gln493-Lys31, Lys417-Asp30, Gly476-Glu23, Tyr505-Glu37, Gly502-Lys353, Gly496- Lys353, Gln498- Lys353, and double hydrogen bond between Thr500-Tyr41(Fig. 4d). One salt bridge was reported between Asp30 (ACE2) and Lys417 (spike RBD). The HDOCK docking score for D614G-V622F–S943F (ACE2-spike RBD) was reported as −13.70 kcal/mol. This distinction between wild-type and mutant was investigated further by examining the molecular interactions. The interaction analysis showed two salt bridges and an increased number of hydrogen bonds (12 hydrogen bonds) in mutant form. The enhanced interaction is due to substituted residue D614G, V622F, S943F in the spike protein complex (Fig. 5 a). The key residues Lys31-Glu484 and Asp30-Lys417 established the salt bridges. Among the hydrogen bonds Thr747-Asp609, Lys417-His195, Glu406-His195, Thr415-Arg195, Gln755-His493, Asn501-Gln86, Gln498-Gln86, Tyr489-Ser77, Tyr489-Lys74, and Arg457-Ser109 residues are involved. As previously stated, the D614G-V622F–S943F mutation is responsible for enhancing the dissemination and infectivity of the SARS-CoV-2 variant and increasing the binding affinity and infectivity. The HDOCK docking score for D614G, Q677H (ACE2-spike RBD), was −13.42 kcal/mol. The difference between the HDOCK score of wild-type and mutant is explained in terms of molecular interactions. Among the hydrogen bonds Gly476-Ser19, Gly476-Glu23, Ser477-Gln24, Tyr489-Tyr83, Tyr449-Asp38, Gln493-Glu35, Thr500-Tyr41, Gln489-Lys353, Gly502-Lys353, and Tyr505-Glu37 residues are involved (Fig. 5b). As a result, the D614G, Q677H double mutant enhanced binding efficiency, and this mutant, as well as other interface residues, are important hotspots for therapeutic development against SARS-CoV-2 variants.
Fig. 5

Docking complexes of D614G-V622F–S943F spike protein with ACE2. (b) Docking complexes of D614G-Q677H spike protein with ACE2.

Docking complexes of D614G-V622F–S943F spike protein with ACE2. (b) Docking complexes of D614G-Q677H spike protein with ACE2. The predicted score of HDOCK for the ACE2-N74K, D614G mutant spike protein was −12.50 which is comparable with the wild-type complex. According to a PDBsum study of the complex, the interface is formed by 37 residues, 18 of which are supplied by ACE2 and 15 by the spike-RBD. According to the interaction study, both structures create 1 salt bridge, 11 hydrogen bonds, and 125 non-bonded interactions. The hydrogen bonds formed by the ACE2- N74K, D614G spike (RBD) include Gln493-Glu35, Gln493-Lys31, Tyr453-His34, Lys417-Asp30, Thr500-Tyr41, Tyr449-Asp38, Tyr505-Glu37, Gly502-Lys353, Tyr449- Lys353, Gln498-Lys353, and double hydrogen bond between Thr500-Tyr41(Fig. 6 ). In this complex, most of the hydrogen binding pattern is similar to the wild-type complex.
Fig. 6

Docking complexes of ACE2-N74K, D614G spike protein with ACE2. The right panel shows the 2D interaction pattern generated with PDBsum.

Docking complexes of ACE2-N74K, D614G spike protein with ACE2. The right panel shows the 2D interaction pattern generated with PDBsum.

Estimation of KD

Estimation of the KD revealed that D614G-S943T-V622F (2.0 E−10) variant binds stronger than all. Moreover, the strength of the wild type and mutants was ranked as D614G-Q677H variant (1.0 E−09), wild type (1.9 E−09) while for the N74K-D614G variant the predicted KD value was 2.2 E−09. Conclusively this shows that the two mutants possess stronger affinity than the wild type which might be responsible for the increased cases in Pakistan. The docking score and the predicted KD values are given in Table 4 .
Table 4

The predicted docking scores and KD values for each complex.

ComplexΔG (kcal mol−1)KD (M) at 38.0 °C
Wild type−12.401.9E−09
D614G-S943T-V622F Variant−13.422.0E−10
D614G-Q677H Variant−13.701.0E−09
N74K-D614G Variant−12.502.2E−09
The predicted docking scores and KD values for each complex.

Dynamic stability and flexibility analysis

To understand the dynamic behavior of the wild type and these variants, MD simulation of each complex was performed. The stability of each complex was assessed as root mean square deviation (RMSD) and residual flexibility as root mean square fluctuation. The RMSD graphs shown in Fig. 7 A-C demonstrate that the wild type exhibit more stable dynamics than the variants. From Fig. 7A, it can be seen that the wild type complex attained the equilibrium at 25ns and reached stability at 2.0 Å. On the other hand, the D614G-S943T-V622F variant complex demonstrated unstable dynamics at different times interval thus corroborated with the servers’ predicted results. Unlike the D614G-S943T-V622F variant, the other two variants i.e., D614G-Q677H and N74K-D614G reported a significantly destabilizing effect throughout the simulation. From Fig. 7B and C, it can be seen that these two complexes demonstrated significant structural perturbation and hence reports uniform results with the aforementioned results predicted by different servers.
Fig. 7

Dynamic stability analysis of the wild type and mutant complexes. (A) RMSD of the wild type. (B) RMSD of D614G-S943T-V622F, (C) RMSD of D614G-Q677H and (D) RMSD N74K-D614G variant complex.

Dynamic stability analysis of the wild type and mutant complexes. (A) RMSD of the wild type. (B) RMSD of D614G-S943T-V622F, (C) RMSD of D614G-Q677H and (D) RMSD N74K-D614G variant complex. On the other hand, the RMSF results demonstrated a differential flexibility index for each complex thus showing the impact of these mutations on the structural-dynamic features. The RMSF results of each complex are shown in Fig. 8 .
Fig. 8

Residual flexibility of the wild type, D614G-S943T-V622F, D614G-Q677H, and RMSD N74K-D614G variant complexes.

Residual flexibility of the wild type, D614G-S943T-V622F, D614G-Q677H, and RMSD N74K-D614G variant complexes. To connect protein conformational changes with the binding, induced by mutations in the proteins, we used the MM/GBSA technique to calculate the binding energy of spike RBD to hACE2. The binding comparison revealed significant variation in the spike RBD binding between the wild type, D614G-V622F–S943F, and D614G-Q677H double mutants. As given in Table 5 , the total binding energy for the D614G-Q677H variant was −75.78 kcal/mol followed by the D614G-S943T-V622F variant −75.17 kcal/mol and −73.84 kcal/mol for the N74K-D614G variant. The total binding energy for the wild type was −66.34 kcal/mol. The vdW results reported a comparable result however the electrostatic energies are reported to be higher in the case of the variants which strongly corroborate with the previous papers which reported an increment in the electrostatic energy by variants. The notion of electrostatic energy has been a key factor in the enhanced binding [20,59,60,69]. In sum, the polar solvation energy supplies a non-favorable contribution by comparing with non-polar which reported a good contribution. The dynamic's environment's charge density seems to engage with itself infrequently, explaining the non-significant effect in binding. In brief, the solvation energy of variants adhering to ACE2 is unfavorable, whilst gas-phase energy is critical for the variants' strong binding to ACE2. Decisively this shows that the re-ranking of the complex using MM/GBSA validated that the mutants possess a stronger affinity towards ACE2 than the wild type thus enhancing the transmission and infectivity. The MM/GBSA results including vdW (van der Waal), electrostatic (ELE), GB (generalized Born), SA (Surface Area), and the total binding energy are given in Table 5.
Table 5

MM/GBSA analysis of the wild type and mutant complexes. All the results are calculated in kcal/mol.

ComplexVDWELEGBSATOTAL
Wild Type−108.39−659.94716.45−14.46−66.34
D614G-S943T-V622F Variant−114.64−697.17751.1−14.46−75.17
D614G-Q677H Variant−109.08−700.02747.87−14.55−75.78
N74K-D614G Variant−109.81−687.13737.56−14.46−73.84
MM/GBSA analysis of the wild type and mutant complexes. All the results are calculated in kcal/mol.

Conclusions

In summary, this study used genome-based analysis and molecular modeling approaches to determine a basis for the increased infection and transmission of SARS-CoV-2 in Pakistan. The data revealed that the D614G mutation particularly contributes to the enhanced infectivity. A significant bonding reprogramming was identified, and the structural dynamic features strongly corroborated the predicted results. However, such polymorphic sites should be further studied to explore the evolutionary paradigm and any emerging variants. This work provides a strong impetus for the development of novel drugs against the strains circulating in Pakistan and around the world.

Data availability

All the data is available on RCSB, UniProt, and any simulation data would be provided on reasonable demand. The accession numbers to access this data are given in the manuscript.

Ethical approval

Not Applicable.

Declaration of competing interest

Declared None.
  64 in total

1.  SARS-CoV-2 spike D614G change enhances replication and transmission.

Authors:  Bin Zhou; Tran Thi Nhu Thao; Donata Hoffmann; Adriano Taddeo; Nadine Ebert; Fabien Labroussaa; Anne Pohlmann; Jacqueline King; Silvio Steiner; Jenna N Kelly; Jasmine Portmann; Nico Joel Halwe; Lorenz Ulrich; Bettina Salome Trüeb; Xiaoyu Fan; Bernd Hoffmann; Li Wang; Lisa Thomann; Xudong Lin; Hanspeter Stalder; Berta Pozzi; Simone de Brot; Nannan Jiang; Dan Cui; Jaber Hossain; Malania M Wilson; Matthew W Keller; Thomas J Stark; John R Barnes; Ronald Dijkman; Joerg Jores; Charaf Benarafa; David E Wentworth; Volker Thiel; Martin Beer
Journal:  Nature       Date:  2021-02-26       Impact factor: 49.962

2.  The RCSB protein data bank: integrative view of protein, gene and 3D structural information.

Authors:  Peter W Rose; Andreas Prlić; Ali Altunkaya; Chunxiao Bi; Anthony R Bradley; Cole H Christie; Luigi Di Costanzo; Jose M Duarte; Shuchismita Dutta; Zukang Feng; Rachel Kramer Green; David S Goodsell; Brian Hudson; Tara Kalro; Robert Lowe; Ezra Peisach; Christopher Randle; Alexander S Rose; Chenghua Shao; Yi-Ping Tao; Yana Valasatava; Maria Voigt; John D Westbrook; Jesse Woo; Huangwang Yang; Jasmine Y Young; Christine Zardecki; Helen M Berman; Stephen K Burley
Journal:  Nucleic Acids Res       Date:  2016-10-27       Impact factor: 16.971

3.  Clinical Characteristics of Coronavirus Disease 2019 in China.

Authors:  Wei-Jie Guan; Zheng-Yi Ni; Yu Hu; Wen-Hua Liang; Chun-Quan Ou; Jian-Xing He; Lei Liu; Hong Shan; Chun-Liang Lei; David S C Hui; Bin Du; Lan-Juan Li; Guang Zeng; Kwok-Yung Yuen; Ru-Chong Chen; Chun-Li Tang; Tao Wang; Ping-Yan Chen; Jie Xiang; Shi-Yue Li; Jin-Lin Wang; Zi-Jing Liang; Yi-Xiang Peng; Li Wei; Yong Liu; Ya-Hua Hu; Peng Peng; Jian-Ming Wang; Ji-Yang Liu; Zhong Chen; Gang Li; Zhi-Jian Zheng; Shao-Qin Qiu; Jie Luo; Chang-Jiang Ye; Shao-Yong Zhu; Nan-Shan Zhong
Journal:  N Engl J Med       Date:  2020-02-28       Impact factor: 91.245

4.  In-silico evaluation of bioactive compounds from tea as potential SARS-CoV-2 nonstructural protein 16 inhibitors.

Authors:  Rahul Singh; Vijay Kumar Bhardwaj; Jatin Sharma; Rituraj Purohit; Sanjay Kumar
Journal:  J Tradit Complement Med       Date:  2021-06-03

5.  Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms.

Authors:  Amanat Ali; Ranjit Vijayan
Journal:  Sci Rep       Date:  2020-08-26       Impact factor: 4.379

6.  Structural basis of receptor recognition by SARS-CoV-2.

Authors:  Jian Shang; Gang Ye; Ke Shi; Yushun Wan; Chuming Luo; Hideki Aihara; Qibin Geng; Ashley Auerbach; Fang Li
Journal:  Nature       Date:  2020-03-30       Impact factor: 49.962

7.  Evaluation of acridinedione analogs as potential SARS-CoV-2 main protease inhibitors and their comparison with repurposed anti-viral drugs.

Authors:  Vijay Kumar Bhardwaj; Rahul Singh; Pralay Das; Rituraj Purohit
Journal:  Comput Biol Med       Date:  2020-11-12       Impact factor: 4.589

8.  Structural insights into the mechanism of RNA recognition by the N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein.

Authors:  Abbas Khan; Muhammad Tahir Khan; Shoaib Saleem; Muhammad Junaid; Arif Ali; Syed Shujait Ali; Mazhar Khan; Dong-Qing Wei
Journal:  Comput Struct Biotechnol J       Date:  2020-08-12       Impact factor: 7.271

View more
  4 in total

Review 1.  Molecular characteristics, immune evasion, and impact of SARS-CoV-2 variants.

Authors:  Cong Sun; Chu Xie; Guo-Long Bu; Lan-Yi Zhong; Mu-Sheng Zeng
Journal:  Signal Transduct Target Ther       Date:  2022-06-28

Review 2.  Roles and functions of SARS-CoV-2 proteins in host immune evasion.

Authors:  Farooq Rashid; Zhixun Xie; Muhammad Suleman; Abdullah Shah; Suliman Khan; Sisi Luo
Journal:  Front Immunol       Date:  2022-08-08       Impact factor: 8.786

3.  Digging for the discovery of SARS-CoV-2 nsp12 inhibitors: a pharmacophore-based and molecular dynamics simulation study.

Authors:  Fatemeh Sana Askari; Mohsen Ebrahimi; Jabbar Parhiz; Mina Hassanpour; Alireza Mohebbi; Abbas Mirshafiey
Journal:  Future Virol       Date:  2022-08-08       Impact factor: 3.015

4.  The First Geographic Identification by Country of Sustainable Mutations of SARS-COV2 Sequence Samples: Worldwide Natural Selection Trends.

Authors:  Mohammadamin Mahmanzar; Seyed Taleb Houseini; Karim Rahimian; Arsham Mikaeili Namini; Amir Gholamzad; Samaneh Tokhanbigli; Mahsa Mollapour Sisakht; Amin Farhadi; Donna Lee Kuehu; Youping Deng
Journal:  bioRxiv       Date:  2022-07-19
  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.