SARS-CoV-2 transmissibility is higher than that of other human coronaviruses; therefore, it poses a threat to the populated communities. We investigated mutations among envelope (E), membrane (M), and spike (S) proteins from different isolates of SARS-CoV-2 and plausible signaling influenced by mutated virus in a host. We procured updated protein sequences from the NCBI virus database. Mutations were analyzed in the retrieved sequences of the viral proteins through multiple sequence alignment. Additionally, the data was subjected to ScanPROSITE to analyse if the mutations generated a relevant sequence for host signaling. Unique mutations in E, M, and S proteins resulted in modification sites like PKC phosphorylation and N-myristoylation sites. Based on structural analysis, our study revealed that the D614G mutation in the S protein diminished the interaction with T859 and K854 of adjacent chains. Moreover, the S protein of SARS-CoV-2 consists of an Arg-Gly-Asp (RGD) tripeptide sequence, which could potentially interact with various members of integrin family receptors. RGD sequence in S protein might aid in the initial virus attachment. We speculated crucial host pathways which the mutated isolates of SARS-CoV-2 may alter like PKC, Src, and integrin mediated signaling pathways. PKC signaling is known to influence the caveosome/raft pathway which is critical for virus entry. Additionally, the myristoylated proteins might activate NF-κB, a master molecule of inflammation. Thus the mutations may contribute to the disease pathogenesis and distinct lung pathophysiological changes. Further the frequently occurring mutations in the protein can be studied for possible therapeutic interventions.
In December 2019, a series of pneumonia cases with clinical symptoms closely resembling a viral infection emerged in Wuhan city, Hubei province, China, with no clear indication of the cause [1, 2]. Shortly afterwards, whole genome sequencing of respiratory tract samples from the infected individuals indicated an unfamiliar beta-coronavirus, which was initially called the 2019 novel coronavirus (2019-nCoV) [1, 3]. On further phylogenetic investigation, the International Committee on Taxonomy of Viruses renamed the virus severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) [4]. Moreover, recently published data revealed that the human SARS-CoV-2 shared the highest nucleotide sequence similarity with the bat CoV RaTG13 [5]. The illness caused by SARS-CoV-2 is referred to as the coronavirus disease of 2019 (COVID-19) [6, 7]. Recognized as a global health emergency, COVID-19 was declared a pandemic on 11th March 2020 by the World Health Organisation (WHO). Since then, the virus has spread to more than 223 countries and territories [8]. As of 27th January, a total of 98,280,844 SARS-CoV-2 cases and about 2,112,759 deaths had been reported worldwide [8]. Researchers have since been working relentlessly to address the high transmissibility of the virus from one person to another [9]. Importantly, the SARS-CoV-2 proteins that help the virus infect the host and propagate inside host cells include the spike (S), membrane (M) and envelope (E) proteins [10]. Viral attachment is the first step towards establishing a successful infection in the host. Several in silico studies have demonstrated that SARS-CoV-2 utilises its S protein to bind to the angiotensin-converting enzyme 2 (ACE-2) receptor, and this finding was further supported by various in vitro and in vivo studies [11, 12]. Yet another in silico study revealed the key residues involved in establishing the stable interaction between the S and ACE-2 proteins [13].
This information could be useful in determining the binding affinity of the viral protein to different allelic variants of ACE-2 found in COVID-19 patients. Also, the establishment of a stable interaction between the viral and host proteins is as important as their binding. A computational study identified mutations in the S protein that increased the affinity of the virus towards human ACE-2, thus maintaining stable connections [14]. Furthermore, screening of drugs from the available databases can aid in identifying FDA-approved therapeutics. Several antivirals have already been investigated for the treatment of SARS-CoV-2 infection [15]. Computational docking studies have been conducted to target the 3C-like protease (3CLpro/Mpro). Drugs like remdesivir, zanamivir, saquinavir and indinavir are known to potentially target the protease (PDB ID: 6LU7) [16]. Further, a docking study concluded that chloroquine preferred binding to the S protein rather than the Mpro [16, 17]. An in-depth molecular docking and dynamic simulation study demonstrated that a protein inhibitor, ΔABP-D25Y, bound exclusively to the ACE-2 binding site of the S protein; thus, the inhibitor could be a potential blocker of S protein receptor binding domain (RBD) attachment [16, 17, 18]. Numerous investigations have elucidated similarities and differences among the genomes and proteins of different human CoVs using various bioinformatics tools [19]. These studies have helped in understanding the relatedness of the virus to other human CoVs. In the current study, we took advantage of the available database of SARS-CoV-2 protein sequences and focused on the analysis of mutations in these proteins in isolates from multiple geographical regions. Knowing the mutations in various proteins of SARS-CoV-2 may help explain the higher transmission rates of COVID-19 that led to the pandemic and might help in targeting the virus specifically.
Studies have also shown that the drugs used to target the virus are often not equally effective in all infected individuals [20]. The reason may be varying host conditions such as immune status, age, and gender; however, it may also be due to the variants of the virus that infect the individuals. Thus, variants of SARS-CoV-2 may influence the host differently. With knowledge of the mutations in the virus, therapeutics targeting a specific variant, or a combination of drugs effective against a broad range of viral isolates, may be developed. In addition, our study will help in evaluating the effects of essential mutations in SARS-CoV-2 on the host.
In addition, we investigated relevant mutations in the sequences of SARS-CoV-2 structural proteins. We observed that single amino acid substitutions generated unique sites such as myristoylation and protein kinase C (PKC) phosphorylation sites. Notably, some modifications in viral proteins are vital to the virus life cycle; for example, myristoylation and phosphorylation sites of the S and M proteins, respectively, enhance arenavirus and human immunodeficiency virus (HIV) entry into the host cell and further help in replication and in hijacking the host cell machinery [21, 22]. Phosphorylation of the membrane protein at PKC sites mediates budding, pathogenesis, overall assembly of viral particles, and viral envelope formation in Herpes simplex virus (HSV), HIV, and influenza virus [23, 24, 25]. Various recently published data have established mutations in the RBD of the SARS-CoV-2 S protein that contributed to the enhanced infectivity of the virus [26, 27]. Briefly, the S protein of SARS-CoV-2, a trimeric multifunctional molecule, has two subunits, S1 and S2: the S1 subunit helps to bind to the host receptor, while the S2 subunit provides entry into the host cells through the fusion process [26, 28, 29]. Moreover, the S1 subunit has two domains that recognize ACE-2 and ACE-1, and integrin molecules [30, 31, 32]. It is noteworthy that we have identified a conserved Arg-Gly-Asp (RGD) sequence in the S protein of SARS-CoV-2 capable of interacting with an array of integrin family receptors [33]. This could contribute to the enhanced ability of the virus to transmit from one person to another. Overall, mutations and modifications in protein sequences could explain the adaptability or infectivity pattern of the virus [34]. They will also help us understand the host response to the virus and thereby manage the infection in a much better way. The interactions may alter the host microenvironment and intracellular signaling, triggering the inflammatory pathways.
The alterations brought about by the virus may contribute to the COVID-19 associated disease pathogenesis.
Materials and methods
Sequence retrieval and multiple sequence alignment (MSA)
Amino acid sequences of three proteins, namely the E, M, and S proteins, from different isolates of SARS-CoV-2 were retrieved from the NCBI Virus database [35]. The sequences generated from 24th January 2020 to 17th July 2020 were utilized. Isolates with partial or missing amino acid sequences for the respective proteins were excluded. Overall, sequences from 9463, 6970, and 6476 isolates were considered for the E, M, and S proteins, respectively. Multiple sequence alignments (MSA) of the amino acid sequences of all three proteins were performed separately in MEGA X software using the ClustalW algorithm [36].
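Once the sequences are aligned, substitutions can be read off column by column against a reference sequence. The following is a minimal sketch of that step, not the MEGA X workflow used in this study; the function names and the toy sequences are hypothetical, and positions are numbered along the ungapped reference.

```python
from collections import Counter


def call_substitutions(reference, aligned):
    """Return substitutions such as 'D614G' for one aligned sequence.

    Both strings come from the same alignment, so they have equal length.
    '-' marks a gap. Positions are numbered along the ungapped reference.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0
    for ref_aa, iso_aa in zip(reference, aligned):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        position += 1
        if iso_aa in ("-", "X"):
            continue  # deletion or ambiguous residue, not a substitution
        if iso_aa != ref_aa:
            substitutions.append(f"{ref_aa}{position}{iso_aa}")
    return substitutions


def tally_substitutions(reference, isolates):
    """Count how many isolates carry each substitution."""
    counts = Counter()
    for aligned in isolates.values():
        counts.update(call_substitutions(reference, aligned))
    return counts


# Toy alignment (hypothetical sequences, not real SARS-CoV-2 data).
reference = "MKT-DLG"
isolates = {"iso1": "MKT-GLG", "iso2": "MRTAGLG", "iso3": "MKT-DLG"}
counts = tally_substitutions(reference, isolates)
for name, n in counts.most_common():
    print(name, n)
```

Numbering along the ungapped reference keeps substitution names stable even when an isolate carries an insertion, which is why gap columns in the reference are skipped before the position counter advances.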
Protein motif and domain/modification site analysis
Protein motifs and modification sites of the E, M, and S proteins were analyzed using the online public database tool ScanPROSITE [37, 38].
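The idea behind this analysis, checking whether a substitution creates a new modification-site motif, can be sketched with regular expressions. The example below is only an approximation under stated assumptions: the four PROSITE patterns are rewritten by hand as regular expressions, the helper names are hypothetical, and the 15-residue fragment is an assumed stand-in for the M protein N-terminus used purely for illustration. It does not reproduce ScanPROSITE itself, which should be used on the full sequences.

```python
import re

# PROSITE patterns rewritten by hand as regular expressions (an assumption of
# this sketch; ScanPROSITE applies the curated patterns directly).
PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",                 # N-{P}-[ST]-{P}
    "PKC phosphorylation": r"[ST].[RK]",                 # [ST]-x-[RK]
    "CK2 phosphorylation": r"[ST]..[DE]",                # [ST]-x(2)-[DE]
    "N-myristoylation": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan_sites(sequence):
    """Return a set of (motif name, start, end) tuples with 1-based positions."""
    sites = set()
    for name, pattern in PATTERNS.items():
        # A lookahead lets overlapping matches be reported as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start = match.start() + 1
            end = match.start() + len(match.group(1))
            sites.add((name, start, end))
    return sites


def apply_substitution(sequence, mutation):
    """Apply a substitution written like 'D3G' to a sequence."""
    ref_aa, position, new_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[position - 1] != ref_aa:
        raise ValueError(f"expected {ref_aa} at position {position}")
    return sequence[: position - 1] + new_aa + sequence[position:]


def gained_sites(sequence, mutation):
    """Sites present in the mutant but absent from the original sequence."""
    mutant = apply_substitution(sequence, mutation)
    return sorted(scan_sites(mutant) - scan_sites(sequence))


# Assumed illustrative fragment, not a retrieved database record.
fragment = "MADSNGTITVEELKK"
print(gained_sites(fragment, "D3G"))
print(gained_sites(fragment, "V10A"))
```

Comparing the site sets of the original and the substituted sequence, rather than scanning the mutant alone, isolates the sites that the substitution itself introduces.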
Structure retrieval and processing
The crystal structure of the SARS-CoV-2 S protein RBD bound to the ACE-2 receptor (PDB: 6M0J) and the structure of the trimeric S protein with a single RBD in the up conformation (PDB: 6VSB) were retrieved from the RCSB Protein Data Bank (PDB). The amino acid sequences of the RBD in the 6M0J and 6VSB structures were identical and were therefore used in the docking studies with ACE-2 or integrin molecules as required. The missing amino acid residues, domains, and disulfide bonds in the S protein structure were resolved as per previously reported protocols [39, 40]. The RBD from the 6M0J structure (ACE-2: RBD assembly) was separated and edited using Chimera 1.14 to create mutations in the protein structure. Three separate RBD structures, each carrying a single point mutation (S477N, V483A, or N501Y), were created. Similarly, the protein sequence of 6VSB was edited at position 614 to create the D614G mutation. Further, all the proteins were prepared with the protein preparation wizard in MAESTRO v10.6 [41]. During the protein preparation process, hydrogen atoms and bond orders were assigned. Formal charges were then added to hetero groups in the amino acid residues. The hydrogen-bonding network in the amino acid residues was optimized by selecting the histidine tautomers and predicting the ionization states. The optimized protein structure was then subjected to all-atom constrained energy minimization using MAESTRO v10.6 with the OPLS3 force field.
Molecular docking and interaction analysis
We performed the molecular docking of ACE-2 with wild type or mutated RBD structures using the HDOCK server [42, 43]. For ACE-2: RBD, a template-free, targeted docking was performed considering the residue ranges 24–42, 79–83, and 353–357 of ACE-2 and the residue ranges 446–456 and 486–505 of the RBD, which are predicted to participate in the interaction [44]. The docked structures with the lowest root mean square deviation (RMSD) and docking scores were considered for further study. RMSD is the root mean square of the distances between corresponding atoms of superimposed proteins (with and without bound ligand) and indicates whether the two conformers are intrinsically similar. Similarly, a template-free, targeted docking of the SARS-CoV-2 S protein (6VSB) was performed with various integrin molecules using the HDOCK server. The top 10 models proposed by the software with minimum RMSD and docking score values were analyzed for the interaction of the RGD tripeptide sequence (residues 403–405) in the S protein with the integrin molecules. The protein-protein interactions in the selected docked structures/models were inferred using interacting residue plots generated by the DIMPLOT tool of Ligplot + v.2.2 [45, 46].
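As an illustration of the RMSD used to rank docked models, the sketch below computes it for two coordinate sets that are assumed to be already superimposed and listed in the same atom order. The function name and the coordinates are hypothetical; HDOCK reports its own RMSD values, and this is not its implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists.

    Each argument is a list of (x, y, z) tuples. The structures are assumed
    to be superimposed already; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms: one atom unmoved, one displaced by 2.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(model_a, model_b)
print(round(value, 3))
```

In this toy case the plain mean displacement would be 1.0, whereas the RMSD is the square root of 2, which shows how squaring weights large displacements more heavily.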
Study of plausible host cell signaling pathway influenced by mutated virus proteins
The information regarding the host cell response for shortlisted protein modification sites obtained as a result of mutations in various virus proteins was acquired through existing literature [47, 48, 49]. The schematic representation and elucidation of possible signaling triggered due to mutated virus proteins were interpreted (Figure 9).
Figure 9
Possible signaling pathways utilized by SARS-CoV-2 for cellular hijacking. a) SARS-CoV-2 may bind to the angiotensin-converting enzyme-2 (ACE-2) receptor and enter the cell through the clathrin-mediated endocytosis (CME) pathway. b) Caveolae/raft-dependent endocytosis: SARS-CoV-2 may attach to the cholesterol-rich regions of the cell membrane, which may facilitate invagination of the cell membrane together with the viral complex. c) SARS-CoV-2 may also bind to integrins through the RGD tripeptide domain and activate the downstream signaling complexes. The activated complex may thus influence the RAS-related C3 botulinum toxin substrate-1 (RAC-1), mitogen-activated protein kinase kinase (MEK), and interferon regulatory transcription factor (IRF3). d) SARS-CoV-2 may also bind to ACE-2 receptors and activate the downstream Gα subunit of ACE-2 receptors and the transcription factor NF-κB, leading to transcription of various inflammatory genes.
Results
Multiple sequence alignment of SARS-CoV-2 E, M and S proteins revealed unique mutations
We compared three proteins (E, M, and S) of various SARS-CoV-2 isolates through MSA. In total, 42, 156, and 5449 isolates showed at least one mutation in the E, M, and S protein sequences, respectively. Each variation and its proportion are displayed in Figure 1 A-D. A significant mutation, observed in nine sequences of the E protein, was P71L. However, the substitution resulted in a similar type of amino acid, i.e., a nonpolar amino acid. Further, a polar to nonpolar amino acid mutation, S68F, was observed in five isolates of the E protein. The L73F mutation, a nonpolar to nonpolar amino acid substitution, occurred in four isolates. The remaining mutations, one or two per sequence, were seen in fewer than 20 isolates. In the M protein, the D209Y, T175M, and K15R mutations, all polar to polar amino acid substitutions, were seen in 20, 19, and 18 isolates, respectively. Additionally, mutations that resulted in similar kinds of amino acids were V70F (nonpolar to nonpolar) and H125Y (polar to polar), each found in more than ten isolates. Nonetheless, a mutation with a change from a polar to a nonpolar amino acid, D3G, was also observed in various isolates. Other modifications were present in fewer than ten isolates. Compared to the E and M proteins, many more isolates showed variations in the S protein sequences. The most prominent mutation, D614G, was observed in 4555 of the 6476 total isolates. The mutation resulted in the replacement of a hydrophilic and negatively charged D with a hydrophobic and neutral G. The second and third most prominent mutations, L5F and L54F, were substitutions of a nonpolar aliphatic amino acid with an aromatic one, each observed in 68 isolates. The S477N (polar to polar) and V483A (nonpolar to nonpolar) mutations were observed in 39 and 22 isolates, respectively. Interestingly, these mutations were located in the RBD of the S protein. All other mutations were found in fewer than 20 isolates.
In addition, we have shown the appearance of particular mutations in the E, M and S proteins over time, along with the location in which these isolates emerged (Figure 2). The exact date and month of identification of these SARS-CoV-2 mutants are given in Table S1.
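The proportions implied by the isolate counts reported above can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable and function names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the text: (isolates with at least one mutation, isolates analysed).
counts = {"E": (42, 9463), "M": (156, 6970), "S": (5449, 6476)}
mutated_share = {protein: percent(m, n) for protein, (m, n) in counts.items()}

# D614G was observed in 4555 S protein isolates.
d614g_of_mutated = percent(4555, 5449)  # share among S isolates with a mutation
d614g_of_all = percent(4555, 6476)      # share among all S isolates analysed

for protein, share in mutated_share.items():
    print(f"{protein}: {share:.1f}% of isolates carry at least one mutation")
print(f"D614G: {d614g_of_mutated:.1f}% of mutated, {d614g_of_all:.1f}% of all S isolates")
```

Keeping the two denominators separate matters here: D614G accounts for roughly 84% of the mutated S sequences but roughly 70% of all S sequences analysed.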
Figure 1
Pie diagram representing the proportions of mutations in a) E protein b) M protein c) S protein and d) RBD region of the S protein. For each protein, the total isolates with at least one mutation considered here are 42, 156 and 5449 for E, M, and S protein respectively.
Figure 2
Timeline of mutations in a) E protein b) M protein and c) S protein appearing for the first time, along with the country where each mutation was first reported. The top mutations observed in each of the mentioned proteins are considered in the timeline.
Mutations in E, M and S protein of SARS-CoV-2 resulted in unique sequences recognised by the host factors
Glutamic acid (E) to lysine (K) mutation forms a PKC phosphorylation site in E protein of SARS-CoV-2
We identified a single mutation, E8K, in the E protein of SARS-CoV-2 in isolate QKO24093.1 from the USA (Figure 3b). The mutation resulted in a PKC phosphorylation site unique to this sequence. Other sites observed in all the protein sequences under investigation included N-glycosylation sites at positions 48–51 and 66–69, and a PKC phosphorylation site at position 67–69 (Figure 3a).
Figure 3
Schematic representation of modification sites in the envelope (E) and membrane (M) protein sequences of SARS-CoV-2 generated due to amino acid substitution. a) In the wild type E protein, two N-glycosylation sites at positions 48–51 and 66–69 were present. b) The single amino acid substitution E8K in the E protein (QKO24093.1 isolate) resulted in a PKC site at position 6–9. c) The wild type M protein contained an N-glycosylation site at 5–8, two casein kinase II (CK II) phosphorylation sites at 9–12 and 212–215, two N-myristoylation sites at 79–84 and 126–131, and three PKC phosphorylation sites at positions 99–101, 172–174, and 184–186. d) In addition to the above-mentioned modification sites in the M protein, a single point mutation, D3G, in the QIZ16332.1 isolate resulted in an N-myristoylation site at position 3–8. e) Similarly, a single amino acid substitution, V10A, in the M protein sequence of the QLF97810.1 isolate resulted in an N-myristoylation site at position 6–11. f) Furthermore, a CK II phosphorylation site was observed at position 9–12 due to a single amino acid substitution, P132S, in the sequence of the QKG90089.1 isolate.
Unique mutations in M protein result in N-myristoylation sites
The D3G mutation was observed in 12 isolates of the M protein, resulting in an N-myristoylation site at position 3–8 (Figure 3d). In addition, a single amino acid substitution, V10A, at the 10th position of the M protein sequence in isolate QLF97810.1 resulted in an N-myristoylation site (Figure 3e). Also, the P132S mutation caused the occurrence of a casein kinase II phosphorylation site (Figure 3f). Additionally, all the sequences under consideration included the following modification sites: an N-glycosylation site (NGTI; 5–8 amino acids), casein kinase II phosphorylation sites (TVVE; 9–12 and SSSD; 212–215 amino acids), N-myristoylation sites (GIAIAM; 79–84 and GTILTR; 126–131), and protein kinase C phosphorylation sites (SFR; 99–101, TSR; 172–174 and SQR; 184–186 amino acids) (Figure 3c).
Presence of RGD sequence in S protein and D614G mutation resulted in an additional N-myristoylation site in S protein of SARS-CoV-2
In this study, we procured 6476 S protein sequences from different countries. To name a few, we included sequences from Australia, Brazil, China, India, Italy, Japan, Pakistan, South Korea, Spain, Sweden, Taiwan, USA, Finland, and Vietnam. The 1273 aa long S protein sequences of SARS-CoV-2 were subjected to functional domain analysis using ScanPROSITE. The protein was observed to have a cysteine-rich profile (score = 14.622), with the cysteine-rich region lying between residues 1235 and 1254. We further obtained various frequently occurring post-translational modification (PTM) sites such as N-glycosylation, PKC phosphorylation, casein kinase II phosphorylation, N-myristoylation, and cAMP- and cGMP-dependent protein kinase phosphorylation sites, along with an RGD tripeptide region. Of the mutated S protein sequences, 84% possessed the D614G mutation. Interestingly, this mutation contributed an additional N-myristoylation site (614–619). Furthermore, we identified the RGD site at amino acid position 403–405 in all the isolate protein sequences. The RGD tripeptide sequence found in all the S protein sequences might interact with host integrins and enhance the ability of the virus to infect. The virus may, therefore, have evolved to manipulate the host differently. Various mutations away from the modification sites were also observed.
Impact of D614G mutation on its intramolecular interaction
In the wild type trimeric structure of the S protein, the aspartate residue at position 614 of chain A, which has the RBD in the up conformation, formed a hydrogen bond with T859. It also formed a salt bridge with K854 of the adjacent chain, chain B (Figure 4a). However, these interactions were not seen in the mutated structure (D614G mutation) of the S protein (Figure 4b). The absence of the aspartate side chain in the mutated structure prevented such interactions.
Figure 4
Intramolecular interactions of a) D614 (wild type) and b) G614 (mutant) residue of chain A with chain B residues of the S protein. The D614 amino acid in chain A of the SARS-CoV-2 S protein (with up-RBD conformation) interacted with T859 and K854 of adjacent chains, while in the D614G mutant, G614 in chain A did not establish an interaction with any nearby residue.
Molecular interactions between ACE-2 and wild type or mutated RBD
Different H-bond and hydrophobic interactions between ACE-2 and wild type or mutated RBD were analyzed using the DIMPLOT tool of Ligplot + v.2.2 and are displayed in Figures 5 and 6. The critical residues of the S protein involved in hydrogen bond formation in the wild type RBD: ACE-2 docking were K417, G446, Y449, N487, Q493, G496, T500, and G502. Similarly, the key residues of the ACE-2 protein were Q24, D30, E35, D38, Y41, Q42, Y83, and K353. Briefly, G502 (S) and Y505 (S) interacted with K353 (ACE-2); Y449 (S) with D38 (ACE-2) and Q42 (ACE-2); G446 (S) interacted with Q42 (ACE-2); T500 (S) bonded to Y41 (ACE-2); N487 (S) hydrogen-bonded with Y83 (ACE-2) and Q24 (ACE-2); lastly, K417 (S) and Q493 (S) interacted with D30 (ACE-2) and E35 (ACE-2), respectively. Eleven residues of the S protein and ACE-2 were involved in hydrophobic interactions (Figure 5a). Upon creating mutations within the RBD and docking with ACE-2, the same overall residues were involved in the interaction, with three exceptions (Figures 5b, 6a and 6b). In the S477N and V483A RBD structures docked with ACE-2, an additional hydrophobic interaction between L455 (S) and K31 (ACE-2) was observed. In the docked structure of N501Y-RBD: ACE-2, T500 of the S protein failed to form an H bond with Y41 of ACE-2 and showed only a hydrophobic interaction. Moreover, Y at position 501 formed a hydrophobic contact with Y41 (ACE-2), as N did in the wild type. Further, Q498 (S) generated an additional hydrophobic interaction with L45 (ACE-2) in the docked structure of N501Y-RBD: ACE-2.
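Differences between the wild type and mutant interfaces, such as those described above, can be expressed as set differences over typed residue contacts. The sketch below is illustrative only: it encodes a subset of the contacts named in the text for the wild type and the N501Y docked structures, and the function name and data layout are assumptions rather than DIMPLOT output.

```python
def compare_contacts(wild_type, mutant):
    """Return (lost, gained) contacts as sorted lists.

    Each contact is a tuple: (interaction type, S protein residue, ACE-2 residue).
    """
    lost = sorted(wild_type - mutant)
    gained = sorted(mutant - wild_type)
    return lost, gained


# Subset of the wild type contacts reported in the text (not the full interface).
wild_type = {
    ("hbond", "T500", "Y41"),
    ("hbond", "Q493", "E35"),
    ("hbond", "G446", "Q42"),
    ("hbond", "Y449", "Q42"),
    ("hbond", "N487", "Q24"),
}

# N501Y model: T500 keeps only a hydrophobic contact with Y41, and Q498
# gains a hydrophobic contact with L45, as described in the text.
n501y = (wild_type - {("hbond", "T500", "Y41")}) | {
    ("hydrophobic", "T500", "Y41"),
    ("hydrophobic", "Q498", "L45"),
}

lost, gained = compare_contacts(wild_type, n501y)
print("lost:", lost)
print("gained:", gained)
```

Including the interaction type in each tuple lets a hydrogen bond that weakens to a hydrophobic contact show up as one lost and one gained entry rather than as no change.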
Figure 5
Interacting amino acids between a) Wild Type-RBD: ACE-2; b) S477N-RBD: ACE-2. A template free targeted molecular docking between wild type RBD (PDB: 6M0J) and ACE-2, and RBD mutant with ACE-2 was performed using HDOCK server. The models with lowest RMSD values and highest docking scores were analysed for interactions using Ligplot + v.2.2 servers.
Figure 6
Interacting amino acids between a) V483A-RBD: ACE-2; b) N501Y-RBD: ACE-2. A template free targeted molecular docking between the mutated RBD (prepared from wild type RBD from PDB: 6M0J) with ACE-2 (retrieved from PDB: 6M0J) was performed using HDOCK server. The models with lowest RMSD values and highest docking scores were analysed for interactions using Ligplot + v.2.2 servers.
Protein-protein docking of SARS-CoV-2 RBD containing RGD tripeptide sequence with integrin molecules
On performing targeted protein-protein docking of the SARS-CoV-2 RBD (6VSB) containing the RGD tripeptide sequence with αIIbβ3, we obtained five different models involving the interaction of the virus RGD with the integrin molecule. D405 in the virus RBD bound through a hydrogen bond to R139 in model 1 (Figure 7a, d), Q451 in model 4 (Figure 7c) and R281 in model 7 (Figure 7e). Additionally, R403 interacted with N227 in model 3 (Figure 7b), D4 in model 4 (Figure 7c) and Y207 in model 5 (Figure 7d). Weak bonds further stabilized the overall interactions. Specifically, R403 showed hydrophobic interactions with R208 in model 1 (Figure 7a), P228 in model 3 (Figure 7b), P5 in model 4 (Figure 7c), R208 in model 5 (Figure 7d) and R276 and R279 in model 7 (Figure 7e). Similarly, D405 interacted through weak binding with R279 in model 7 (Figure 7e). The interaction of α5β1 with the SARS-CoV-2 RBD mainly involved weak interactions between the RGD of the virus and the integrin molecule. The main hydrophobic interactions included R403 with E280 and T285 in model 1 (Figure 7f, g). R403 also hydrogen bonded to E280 in model 2 (Figure 7g). Similar to the α5β1 interaction with SARS-CoV-2, αvβ8 interactions with the virus RGD mainly involved hydrophobic contacts. R403 interacted with L503 and P555 in model 8 (Figure 8c) and Y265 in model 1 (Figure 8d). D405 showed hydrophobic contacts with F559, N586, and K502 in model 5 (Figure 8a), and with Q590, I454, and S453 in model 7 (Figure 8b). Interestingly, R408 of the virus RBD interacted through a hydrogen bond with the integrin molecule in model 1 (Figure 8b) and model 7 (Figure 8d) in the case of the αvβ8 interaction with the SARS-CoV-2 RBD. Likewise, αvβ3 interacted with the SARS-CoV-2 RGD tripeptide region through hydrogen bonds and weak hydrophobic interactions. R403 hydrogen bonded to C879 and D825 in model 9 (Figure 8e). Besides, D405 hydrogen bonded to R745 in model 10 (Figure 8f).
The hydrophobic contacts included D405 with M826 and D825, and R403 with K881, in model 9 (Figure 8e). Interestingly, D405 of SARS-CoV-2 interacted via a hydrogen bond with T328 in model 4 (Figure 8g) and through hydrophobic interactions with P584, N586, and I587 in model 1 (Figure 8h), and R379 in model 2 (Figure 8i) of αvβ6. Moreover, R403 established hydrogen bonds with S588 in model 1 (Figure 8h). The weak interactions of R403 were observed with E270 and T328 in model 4 (Figure 8g), I587 in model 1 (Figure 8h), and S380 and R379 in model 2 (Figure 8i).
Figure 7
Molecular interactions between RBD of S-protein with integrins [αIIbβ3 (a–e), α5β1 (f–g)]. A template free targeted molecular docking between wild type RBD (PDB: 6VSB) with various integrins was performed using HDOCK server. The models showing possible interactions between RBD and integrins were analysed for interactions using Ligplot + v.2.2 servers.
Figure 8
Molecular interactions between RBD of S-protein with integrins [αvβ8 (a–d), αvβ3 (e–f), and αvβ6 (g–i)]. The template free and targeted molecular docking between wild type RBD (retrieved from PDB: 6VSB) with various integrins was performed using HDOCK server. The numerous models showing possible interactions between RBD and integrins were analysed for interactions using Ligplot + v.2.2 servers.
Screening of plausible host signaling influenced by the mutant SARS-CoV-2
Based on the modification sites generated by the mutations identified in this study, we screened for host signaling pathways that might be affected. Intriguingly, the modification sites of the viral E and M proteins could be correlated with signaling pathways that may mediate rapid internalization of SARS-CoV-2 and hijacking of the host cellular machinery. Significant downstream processes potentially affected by these mutations include protein phosphorylation, cytoskeleton remodelling, and inflammatory responses (Figure 9).
Figure 9
Possible signaling pathways utilized by SARS-CoV-2 for cellular hijacking. a) SARS-CoV-2 may bind to the angiotensin-converting enzyme-2 (ACE-2) receptor and enter the cell through the clathrin-mediated endocytosis (CME) pathway. b) Caveolae/raft-dependent endocytosis: SARS-CoV-2 may attach to cholesterol-rich regions of the cell membrane, which may facilitate invagination of the cell membrane and the viral complex. c) SARS-CoV-2 may also bind to integrin through the RGD tripeptide domain and activate the downstream signaling complexes. The activated complex may thus influence RAS-related C3 botulinum toxin substrate-1 (RAC-1), mitogen-activated protein kinase kinase (MEK), and interferon regulatory transcription factor (IRF3). d) SARS-CoV-2 may also bind to ACE-2 receptors and activate the downstream Gα subunit and the transcription factor NF-κB, leading to transcription of various inflammatory genes.
Briefly, the virus might utilize the clathrin-mediated endocytosis (CME) pathway (Figure 9a) for its entry into cells and mediate host intracellular responses through the PKC pathway [50]. The newly generated PKC sites on the viral E and M proteins might be phosphorylated through the PKC pathway, which may facilitate the budding process of the virus, as in the case of other viruses. The virus may also gain entry into the host cell using the PKC caveolae/raft-mediated endocytosis pathway (Figure 9b). We also looked into the plausible functional aspects regulated by the binding of the viral RGD motif to integrin molecules on the host cell surface. RGD is a conserved motif on the SARS-CoV-2 S protein that has the potential to bind to integrin molecules, which may enhance the binding of the protein to ACE-2 [38,51,6,7,52]. Integrins can mediate signaling across the cell membrane by forming an adhesome complex, which includes talin, paxillin, vinculin, focal adhesion kinase (FAK), Src, integrin-linked kinase (ILK), p130cas, and GTPases of the Rho family (Figure 9c) [53]. Upon integrin ligation, FAK may be activated. FAK, a tyrosine kinase, possesses an Src Homology 2 (SH2) domain binding site. FAK may bind to the phosphorylated site of integrin and recruit Src. Once the FAK-Src complex is activated, it may lead to the phosphorylation of downstream targets such as Rho-GTPases (RAC1, RhoA, and Cdc42), which primarily aid in cytoskeleton reorganization, facilitating virus entry [54]. Additionally, FAK-Src complexes may feed into the Ras-Raf-MEK-ERK pathway, which regulates critical cellular functions such as cell proliferation and migration [55]. On the other hand, FAK-Src complexes may also phosphorylate downstream targets such as interferon regulatory transcription factor (IRF3), which, upon entry into the nucleus, activates NF-κB; this could lead to secretion of interferon alpha/beta (IFN α/β), interleukin 2 (IL2), and IL10 [56].
Viruses utilize their RGD motif to interact with integrin molecules and initiate this wide variety of host responses [38, 48, 57]. Intriguingly, integrins are also involved in the innate immune response to the pathogen.
Moreover, we obtained myristoylation sites on the S and M proteins. The virus with myristoylated S protein bound to the ACE-2 receptor may activate Gα, which further enables membrane phosphatidylinositol bisphosphate (PIP2) (Figure 9d) [22, 48, 57]. Furthermore, PIP2 may activate the phosphoinositide-3 kinase (PIP3) and DAG. Subsequently, the activated PIP3 may interact with the smooth endoplasmic reticulum (SER) and open the Ca2+ channel. The burst of Ca2+ ions may subsequently activate calmodulin. Additionally, activated calmodulin may signal to calcineurin, which in turn results in recruitment of the transcription factor nuclear factor of activated T cells (NFAT). NFAT may transcribe various genes that help in cell proliferation, differentiation, and inflammation [58]. The virus could also potentially activate another signaling pathway through binding to the catalytic domain of Src. HCK, upon recognition of the myristoylated group, may activate mitogen-activated protein kinase (MAPK) and PI3K. Furthermore, MAPK may phosphorylate ERK and PI3K, which in turn could activate protein kinase B (Akt). Together, ERK and Akt may recruit the P70/S6K complex to activate the nuclear transcription factor GATA-binding factor 1 (GATA-1). GATA-1 is known to transcribe various genes involved in cell differentiation and lung cell respiratory burst [59]. On the other hand, binding of the virus to the host can stimulate lymphocyte-specific protein tyrosine kinase (LCK) directly through Gα. Activated LCK may further activate the downstream molecules, namely the ZAP-70, ITK, and Grb2 complex. This complex may cause conversion of Ras-GDP to Ras-GTP through son of sevenless (SOS), and thus activate C-Raf.
C-Raf may induce MAPK, which may move to the nucleus and activate TF AP1. TF AP1 may help in the transcription of various genes that could help in cell differentiation, proliferation, and apoptosis [47].
Discussion
The SARS-CoV-2 pandemic has brought the Coronaviridae family back into the limelight. Such viral spread necessarily arises from mutations in the viral genome and functional proteins that aid in the adaptation of the virus to a new host [60]. Several efforts have been made to tackle SARS-CoV-2 infection and its spread worldwide. Mutations provide the variation upon which natural selection acts, and often result in novelty [61]. The reported data on the age and gender of infected individuals, along with the time to recovery from COVID-19, varied between countries [52, 62, 63]. This suggests that the virus is ever-evolving, making it a better fit to the host. However, the type and rate of mutations in the virus remain enigmatic. In the present study, we have correlated the impact of these mutations with host cell mechanisms, though further in vitro and in vivo studies need to be conducted.
Protein modification is an important phenomenon in living systems and could be closely related to disease pathogenesis. It often makes an organism better suited to an adverse environment [64]. Interestingly, the presence of modification sites enables additional modifications in the proteins. The human genome encodes a wide variety of enzymes, namely histone acetyltransferase, N-myristoylase, and Casein II phosphorylase, which help in post-translational protein modification [64, 65]. Thus, cells regulate processes like growth, replication, transcription, and translation through these modifications [22, 48]. Furthermore, pathogens like bacteria and viruses also possess such sites, which help them better adapt to their host environment [66]. Some viruses even code for enzymes that help in virus modification [67, 68]. In our study, we identified several distinct mutations in the E, M, and S proteins of SARS-CoV-2. We further speculated on and investigated the repercussions of these mutations on virus-driven host processes.
Consequently, these insights would help to relate viral mutations, and the modification sites they generate, to subsequent SARS-CoV-2 disease pathogenesis. Notably, 9463 isolates of E and 6970 isolates of M proteins were compared to analyze the mutation sites.
E protein, a minor component of the virus membrane, is involved in virus replication and infection processes [69]. All the isolates under investigation possessed a PKC phosphorylation site at the 66–69 aa position. In addition, the E8L mutation in the E protein resulted in a PKC site at the 6–8 aa position. For several RNA viruses, it has already been reported that PKC phosphorylations are crucial in various steps of viral replication [70]. Further, M protein is more prevalent within the virus membrane, and is imperative for coronavirus budding and for determining the shape of the viral envelope. During the assembly of the virus particle, M protein interacts with nucleocapsid, E, S, and M protein. Mutations occurring in the M protein could influence the host cell interaction. In the current study, we have observed several PKC sites in the M protein of SARS-CoV-2. Hui and Nayak reported that PKC phosphorylation sites in viral membranes may change their dynamic ruffling and endocytosis [71]. For example, respiratory syncytial virus (RSV) is known to induce PKC activation and its cytoplasmic translocation. Also, RSV particles are colocalized with the PKC enzyme, which is required for the fusion of viruses to host cell membranes [25]. PKC helps viruses evade the phagolysosomal pathway through caveolae formation, further allowing viruses such as HIV type 1, filoviruses (Ebola), and simian virus 40 to evade the immune system [72]. HSV-1 recruits PKC, which phosphorylates lamin B to aid in the modification of the nuclear lamina and enhance budding at the nuclear membrane [73]. Availability of PKC phosphorylation sites in the viral proteins may facilitate the recruitment of PKC, thus mediating the aforementioned processes.
To further elaborate on the importance of the PKC site, it is noteworthy to state the central role of phosphoprotein P in Borna disease virus (BDV) pathogenesis. A study has demonstrated this phenomenon by using a recombinant BDV with a mutated PKC phosphorylation site. PKC plays a pivotal role in adenovirus infection of corneal fibroblasts and regulation of downstream molecules, including the important lipid raft component caveolin-1 [23].
Additionally, we identified myristoylation sites in the M protein. Also, the D614G mutation in the S protein resulted in a myristoylation site. Myristoylation involves attachment of a myristoyl group from myristic acid to an N-terminal glycine [74]. Myristoylation is a critical PTM that occurs in host cells and in viruses that lack the transmembrane domain [74]. Viruses like arenaviruses and HIV use this modification to promote the viral budding process [24]. Myristoylation of the virus protein may promote viral infectivity and replication [22, 75, 76]. Besides, Strecker et al. showed that inhibition of myristoylation had an inhibitory effect on lymphocytic choriomeningitis virus and Lassa virus [77]. Also, a study found that myristoylation played a crucial role in poliovirus infectivity and processing of the structural precursor protein [48]. Moreover, myristoylation of the hypothetical protein of African cassava virus suppressed host RNA interference. Thus, a mutation of the N-terminal glycine will affect virus binding and pathogenicity [78, 79]. Further, Johnson et al. demonstrated the importance of a Ser/Thr residue at the fifth position of the consensus G1X2X3X4S/T5X6X7X8, apart from the presence of a glycine at the N-terminal end. This consensus sequence is present in almost all myristoylated proteins [80].
Moreover, the virus may enter the host cell through a receptor-mediated mechanism that requires the adherence of the virus to the host receptor.
The attachment process is a crucial first step towards establishing a successful infection. The severity of COVID-19 and the enhanced transmission ability of SARS-CoV-2 are related to its increased potential to bind to host ACE-2 [81,82]. Moreover, the cross-species migration of the virus involves the ability to engage ACE-2, which varies among animal species [83]. We analyzed the binding of wild-type and mutant SARS-CoV-2 with ACE-2. We identified certain exceptions in the binding of mutant SARS-CoV-2 S protein to ACE-2 compared to the wild type. Also, unlike previously reported studies [44], we observed two additional residues, 473 and 475 of the S protein, involved in the interaction, while R393 of ACE-2 was missing from it. However, the influence of these interactions on virus internalization and pathogenesis needs to be evaluated. Additionally, numerous anti-SARS-CoV-2 vaccines are currently in different phases of clinical trials. Most of the vaccines are designed to target the S protein to inhibit viral entry and further transmission in the body. Checking the effect of distinct mutations on vaccine efficacy is of crucial importance. The widespread D614G mutation lies on the inner side at the S1–S2 junction of the S protein and does not lead to any change in the epitopes or surface structure of the S protein, and hence may not affect vaccine efficacy [84, 85, 86]. However, mutations in the RBD region of the S protein could affect the efficacy of vaccines. A study by Starr et al. [86] mapped viral mutations that enabled the virus to escape antibodies against SARS-CoV-2. However, the study utilized only two types of antibody preparations, i.e., Regeneron's REGN-COV2 cocktail and Eli Lilly's LY-CoV016 antibody. Therefore, the question still remains regarding the efficacy of vaccines under preparation against mutated viruses. Recently, two new variants of SARS-CoV-2 have been identified that show greater transmission in the reported areas.
The UK variant with 23 distinct mutations, known as Variant of Concern, year 2020, month 12, variant 01 (VOC 202012/01), was reported in December 2020. The South Africa variant, named 501Y.V2, was also reported in December 2020, shortly after the UK variant. Both variants carry the N501Y mutation lying in the RBD region of the S protein. Our study has demonstrated that this mutation maintained the interaction with ACE-2, with involvement of an additional residue (Q498) of the S protein. Currently, various studies are under way to check the efficacy of the vaccines in trials against variants carrying this mutation. Further, the structure of the S protein generated by Wrapp et al. [87] depicted that the D614 residue interacted with T859 of an adjacent chain when any one of the chains has its RBD in the up conformation. A similar observation has been reported in an earlier study as well [85]. Another recent report [88] has shown that D614 forms salt bridges with K854 of the fusion peptide proximal region (FPPR). The report suggested that the D614-K854 interaction supported the role of the FPPR in membrane fusion. Our study indicated the interaction of D614 with T859 and K854, which led us to conclude that the D614 interaction with both residues might be reinforcing the role of the FPPR. The results also depicted the elimination of this interaction upon D614G mutation. However, relevant structure-based studies are required for further validation.
In addition, Wrapp et al. demonstrated a significantly higher affinity of the S protein of SARS-CoV-2 towards ACE-2 [89]. Studies using cryo-EM have elucidated the interaction at the Angstrom level [63, 87]. Furthermore, a study documented the importance of the RGD sequence in SARS-CoV-2 transmission and pathology. We have validated the presence of this sequence in all of the isolates. The RGD sequence is located at positions 403–405, neighbouring the ACE-2 interaction site.
Integrins, the multifunctional heterodimeric cell surface receptor molecules, serve as potent receptors that utilize the RGD motif, thus facilitating the entry process of numerous viruses. Viruses like adenoviruses and herpesviruses exhibit an RGD (Arg-Gly-Asp) sequence on their surfaces [90, 91]. Notably, integrins, namely α5β1, α8β1, αvβ1, αvβ3, αvβ8, αvβ5, and αvβ6, may recognize the RGD region on the host cell or on SARS-CoV-2 [92, 93, 94, 95]. We found that various members of the integrin family interact differently with the SARS-CoV-2 RBD containing the RGD tripeptide sequence. Integrin heterodimers such as αIIbβ3, αvβ3, and αvβ6 interact through strong hydrogen bonds with R403 and D405 of the SARS-CoV-2 RBD, while some integrins like α5β1 are weakly bonded, mainly through hydrophobic interactions with R403 and D405 of SARS-CoV-2. Therefore, the expression of a particular type of integrin on the cell may contribute to strong or weak attachment of SARS-CoV-2 to its surface. Furthermore, integrins favour cell-to-cell and cell-to-pathogen attachment and may be necessary to induce leukocyte migration, aggravating the inflammatory reactions [96]. Thus, the RGD sequence of the S protein of SARS-CoV-2 may also be recognized and recruited by integrins in alveolar epithelial cells to accelerate the infection process [38, 51, 52, 53, 97]. Earlier studies have shown that blocking or deleting the RGD reduced viral DNA uptake [98]. Thus, RGD interactions with integrins seem to regulate not only the virus entry process but also the associated pathogenesis [38]. Therefore, potential blockers that prevent the S protein from binding to ACE-2 and integrin molecules can be sought. Targeting this interaction might provide possible therapeutic interventions for COVID-19 treatment.
The receptor-ligand interaction can thereby alter a plethora of intracellular signaling pathways like PI-3K, FAK, Rho GTPases, Src, and diaphanous 2 (Dia2)-associated signaling, which are necessary for the internalization of the virus [55]. Phosphorylated integrins can activate the FAK-Src complex, which further phosphorylates downstream targets such as Rho-GTPases (RAC1, RhoA, and Cdc42) that ultimately lead to cytoskeleton reorganization to facilitate virus entry into the cell [54]. Additionally, integrins have the distinctive property of shifting their ligand-binding conformations from high to low affinity, followed by signal transduction [33]. In some cases, inactive integrins can bind to the ligand. In other cases, the extracellular domain of the protein needs to undergo conformational changes brought about by intracellular signaling modulations. We hypothesize that the binding of the S protein to ACE-2 might provide this inside-out signaling to the integrin, which might help stabilize the interaction. The RGD sequence might further facilitate SARS-CoV-2 infection and be responsible for its increased ability to transmit from person to person. However, a detailed study needs to be conducted.
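The modification sites and the RGD tripeptide discussed above are short sequence motifs, so the kind of screen that ScanPROSITE performs can be approximated with pattern matching. The following is a minimal sketch, not the pipeline used in this study. The regular expressions are our translation of the public PROSITE patterns for PKC phosphorylation ([ST]-x-[RK]) and N-myristoylation (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) and should be checked against PROSITE before use. The function names and the 11-residue fragment around position 614 are illustrative assumptions and should be verified against the reference S protein sequence.

```python
import re

# Assumed regex translations of PROSITE-style patterns (verify against PROSITE).
MOTIFS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
    "RGD": r"RGD",
}


def scan_motifs(sequence, offset=1):
    """Return {motif: [(start, end, text), ...]} with residue numbering.

    `offset` is the residue number of the first character of `sequence`,
    so fragments can be reported in full-length protein coordinates.
    """
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead wrapper reports overlapping matches as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [
            (m.start() + offset, m.start() + offset + len(m.group(1)) - 1, m.group(1))
            for m in regex.finditer(sequence)
        ]
    return hits


def gained_sites(wild_type, mutant, offset=1):
    """List motif hits present in the mutant but absent from the wild type."""
    wt = scan_motifs(wild_type, offset)
    mt = scan_motifs(mutant, offset)
    return {name: [h for h in mt[name] if h not in wt[name]] for name in MOTIFS}


if __name__ == "__main__":
    # Hypothetical fragment assumed to span residues 611-621 of the S protein.
    wild_type = "LYQDVNCTEVP"
    mutant = "LYQGVNCTEVP"  # D614G substitution
    print(gained_sites(wild_type, mutant, offset=611))
```

Under these assumptions, the substitution of glycine at position 614 is what lets the N-myristoylation pattern match in the mutant fragment, which is consistent with the site reported for D614G. A pattern match only flags a candidate site; whether the residue is actually modified in the host would still need experimental confirmation.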
Conclusion
We hypothesize that this rapidly evolving RNA virus is capable of undergoing mutations that could give it the ability to manipulate the host and thereby establish itself successfully. We found some significant differences in the E, M, and S proteins, which may contribute to an enhanced ability of SARS-CoV-2 to infect epithelial cells compared with SARS-CoV. We observed an aspartic acid (D) to glycine (G) mutation at position 614 of the S protein in 84% of the isolates, forming a myristoylation site [22]. Also, we noted the presence of a conserved tripeptide sequence, RGD. Both features may significantly contribute to the enhanced potential of the virus to transmit among the population. Furthermore, the virus may regulate critical intracellular signaling pathways like PKC signaling, myristoylation signaling, and immune regulatory pathways, affecting the disease pathogenesis [22]. It remains to be seen whether these mutations contribute to enhanced pathogenicity or to a reduced capability of the virus to infect host cells.
Declarations
Author contribution statement
Shweta Jakhmola, Omkar Indari, Dharmendra Kashyap: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Nidhi Varshney: Performed the experiments; Contributed reagents, materials, analysis tools or data.
Ayan Das: Performed the experiments.
Elangovan Manivannan: Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Hem Chandra Jha: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper.
Funding statement
This work was supported by the , India grant no 37 (1693)/17/EMR-II and , India (DST) as Ramanujan fellowship grant no. SB/S2/RJN-132/20/5. We are thankful to the Ministry of Human Resource, Department of Biotechnology, Council of Scientific & Industrial Research, DST-Inspire of Government of India for fellowship to Shweta Jakhmola, Omkar Indari, Dharmendra Kashyap, Nidhi Varshney respectively in the form of research stipend.
Data availability statement
Data included in article/supp. material/referenced in article.
Competing interest statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.