Himanshu G Toor1, Devjani I Banerjee2, Soumya Lipsa Rath3, Siddhi A Darji4. 1. P.G. Department of Genetics, Ashok and Rita Patel Institute of Integrated Study and Research in Biotechnology and Allied Sciences (ARIBAS), Charutar Vidya Mandal University, P.O. Box No. 61, New Vallabh Vidyanagar, Vitthal Udyognagar, 388121, Anand, Gujarat, India; Sardar Patel University, Gujarat, India. Electronic address: himanshutoor@aribas.edu.in. 2. Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Faculty of Science, The Maharaja Sayajirao University of Baroda, Pratapganj, Vadodara, 390002, Gujarat, India. Electronic address: devjani.chak-cmb@msubaroda.ac.in. 3. National Institute of Technology, Warangal, Telangana, 506004, India. Electronic address: slrath@nitw.ac.in. 4. Dr. Vikram Sarabhai Institute of Cell and Molecular Biology, Faculty of Science, The Maharaja Sayajirao University of Baroda, Pratapganj, Vadodara, 390002, Gujarat, India. Electronic address: siddhiadarji@gmail.com.
Abstract
COVID-19 has intensified into a global pandemic with over a million deaths worldwide. Experimental research analyses have been implemented and executed with the sole rationale to counteract SARS-CoV-2, which has initiated potent therapeutic strategy development in coherence with computational biology validation focusing on the characterized viral drug targets signified by proteomic and genomic data. Spike glycoprotein is one of such potential drug target that promotes viral attachment to the host cellular membrane by binding to its receptor ACE-2 via its Receptor-Binding Domain (RBD). Multiple Sequence alignment and relative phylogenetic analysis revealed significant sequential disparities of SARS-CoV-2 as compared to previously encountered SARS-CoV and MERS-CoV strains. We implemented a drug re-purposing approach wherein the inhibitory efficacy of a cluster of thirty known drug candidates comprising of antivirals, antibiotics and phytochemicals (selection contingent on their present developmental status in underway clinical trials) was elucidated by subjecting them to molecular docking analyses against the spike protein RBD model (developed using homology modelling and validated using SAVES server 5.0) and the composite trimeric structures of spike glycoprotein of SARS-CoV-2. Our results indicated that Camostat, Favipiravir, Tenofovir, Raltegravir and Stavudine showed significant interactions with spike RBD of SARS-CoV-2. Proficient bioavailability coupled with no predicted in silico toxicity rendered them as prospective alternatives for designing and development of novel combinatorial therapy formulations for improving existing treatment regimes to combat COVID-19.
COVID-19 has intensified into a global pandemic with over a million deaths worldwide. Experimental research analyses have been implemented and executed with the sole rationale to counteract SARS-CoV-2, which has initiated potent therapeutic strategy development in coherence with computational biology validation focusing on the characterized viral drug targets signified by proteomic and genomic data. Spike glycoprotein is one of such potential drug target that promotes viral attachment to the host cellular membrane by binding to its receptor ACE-2 via its Receptor-Binding Domain (RBD). Multiple Sequence alignment and relative phylogenetic analysis revealed significant sequential disparities of SARS-CoV-2 as compared to previously encountered SARS-CoV and MERS-CoV strains. We implemented a drug re-purposing approach wherein the inhibitory efficacy of a cluster of thirty known drug candidates comprising of antivirals, antibiotics and phytochemicals (selection contingent on their present developmental status in underway clinical trials) was elucidated by subjecting them to molecular docking analyses against thespike protein RBDmodel (developed using homology modelling and validated using SAVES server 5.0) and the composite trimeric structures of spike glycoprotein of SARS-CoV-2. Our results indicated that Camostat, Favipiravir, Tenofovir, Raltegravir and Stavudine showed significant interactions with spikeRBD of SARS-CoV-2. Proficient bioavailability coupled with no predicted in silico toxicity rendered them as prospective alternatives for designing and development of novel combinatorial therapy formulations for improving existing treatment regimes to combat COVID-19.
SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus-2), a member of thebeta-coronavirus genus, which includes SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus), MERS-CoV (Middle East Respiratory Syndrome Coronavirus) and other coronaviruses, is a novel discovered virus that causes COVID-19 (A Chronicle on theSARS Epidemic, 2003). The occurrence of SARS-CoV-2 was first epidemiologically linked in December 2019 to patients suffering frompneumonia of an unknown cause who were linked to a seafood market in Wuhan, Hubei province, China. On January 30, 2020, WHO (World Health Organisation) declared SARS-CoV-2 as a “Public Health Emergency of International Concern” and officially named it as Corona Virus Disease 2019(COVID-19/2019-nCoV) (Wu et al., 2020; Li et al., 2020). As of July 27, 2020, Global statistics reported by WHO regarding this intensifying pandemic indicate 16,114,449 confirmed COVID-19 cases, including 6,46,641 deaths collectively in the 216 affected countries (https://www.who.int/emergencies/diseases/novel-coronavirus-2019).Pathological diagnoses of SARS-CoV-2 infections designate severeacute respiratory distress syndrome along with secondary ailments like sore throat, high fever, muscle weakness, encephalitis and diarrhoea in infectedpatients (Zhu et al., 2020). It primarily targets the lower respiratory system by invading the pulmonary epithelial cells, releasing the nucleocapsid component and arrogating the host's cellular machinery for cytoplasmic replication (Shah et al., 2020).Comprehensive structural genomics analyses exemplify that theSARS- CoV-2 genome codes for a variety of structural and non-structural proteins which constitute potential drug targets for scientific experimental investigations to impede theCOVID-19epidemic (Prajapat et al., 2020). Thespike glycoprotein (S protein) is a prospective drug target as it initiates the first crucial step in SARS-CoV-2 attachment to the host cell. It is a homotrimeric protein (each chain containing 1273 amino acids) which recognizes and binds to its receptor Angiotensin Converting Enzyme-2 (ACE-2) with a greater affinity attributed to themutations at L455, F486, Q493, S494, N501 and Y505 residues (Yan et al., 2020). Microbiological digital resources data insinuate a comprehensive schematic of thespike glycoprotein (Fig. 1, Fig. 2).
Fig. 1
ACE-2 receptor-mediated SARS-CoV-2 internalization occurs via three steps. The first step involves the recognition and attachment of the spike glycoprotein (SP, olive green) to its receptor ACE-2 (Red). This binding instigates the activation process wherein the spike protein is cleaved into its subunits, S1 (receptor-binding domain, light blue) and S2 (fusion peptide, yellow) via the adjacent transmembrane protease TMPRSS2 (light green). This fusion peptide integrates with the host cell membrane, resulting in subsequent release and internalization of the SARS-CoV-2 single-stranded RNA (ssRNA, blue) and other components. Once internalized, the virus utilizes the host cell machinery for replication and extrusion of novel viral particles (Colour, Double-column fitting image).
Fig. 2
Spike glycoprotein gene schematics. The spike glycoprotein gene comprises of eight distinct segments. Residues 1–12 act as the Signal peptide (SP, Red). Residues 13–685 and 686–1273 indicate the S1 and S2 subunits of the spike glycoprotein respectively. The S1 subunit comprises of the N-Terminal Domain (NTD, residues 13–303, Olive green) and the Receptor-Binding Domain (RBD, residues 319–541, yellow). The RBD domain is further sub-divided into the C-Terminal Domain (CTD, residues 334–427, Pink) and the Receptor-Binding Motif (RBM, 437–508, Light blue). The cleavage site 1 (685–686) releases the S2 subunit which comprises of the Fusion Peptide region (FP: 788–806, Brown) followed by a second cleavage site (815–816) which generates the S2′ Domain (816–1273). The S2 subunit comprises of two heptad repeats HR1 (920–927) and HR2 (1163–1202) (Green) and two Coiled-Coil Regions CCR1 (949–993) and CCR2 (1176–1203) (Cyan) respectively. A transmembrane Helical Domain exists (1214–1234, Bright yellow) followed by a Topological Cytoplasmic Domain (TPC, 1235–1273, Magenta) which encompasses a Short Sequence Motif (SSF, 1269–1273, light red). The Extracellular Topological domain spans from residues 13–1213. (Colour, Double-column fitting image).
Alternative strategies targeting SARS-CoV-2 spike glycoprotein
Several potential therapeutic strategies are being explored to combat SARS-CoV-2 targeting thespike glycoprotein as depicted in Fig. 3 (Wu et al., 2020). Of these strategies, ‘Drugs Re-purposing’ has emerged as a strong pharmaceutical rationale due to its reduced perils of drug development costs and shortening of the time gap between identification of prospective drug candidates and designing of treatment regimes for patients. This is due to the availability of extensive data regarding its pharmacokinetics, pharmacodynamics, toxicity, clinical trial results, etc (Zhou et al., 2020; Ciliberto, 2020).
Fig. 3
Alternative therapeutic strategies to counteract SARS-CoV-2 targeting the spike Glycoprotein. (Black and White, Single-column fitting image).
Drug-spike protein molecular docking studies wereexecuted to elucidate their inhibitory competence to counteract SARS-CoV-2 infections targeting spike glycoprotein receptor-binding domain. Computational pharmacokinetic analyses coupled with in silico toxicity prediction studies were used to evaluate the therapeutic proficiency of the selected drug candidates for futuristic COVID-19 treatment regimes.
Materials and methods
Workstation
Theentire study was executed on a Dell Laptop having an i5-6200U Intel core (2.30 GHz) processor, 8 GB RAM, 1 TB hard disk and a 64-bit operating system. All softwares utilized for the present study were open-source tools and freely downloadable.
Sequence retrieval
The protein sequences, structures and other relevant data regarding the surfacespike glycoprotein fromSARS-CoV-2 and the whole genome sequence of SARS-CoV-2 Wuhan-Hu-1 isolate were downloaded from digital resources such as NCBI (National Centre for Biotechnology Information) (https://www.ncbi.nlm.nih.gov/), Universal Protein Knowledgebase (UniProtKB) and RCSB-PDB (Research Collaboratory for Structural Bioinformatics-Protein Data Bank) databases (Sayers et al., 2018; Apweiler et al., 2004; Berman et al., 2000). For comparative analysis, SARS-CoV (first identified in Foshan, Guangdong, China in 2002 from bats) data was also collected and analyzed (Chinese Law and Government, 2003; Li et al., 2005) (Table 1
).
Table 1
Spike Glycoprotein sequence information.
Sr. No.
Digital resource
IDa
Description
1
UniProtKB
P0DTC2
SARS-CoV-2b spike glycoprotein
2
NCBI
YP_009724390.1
SARS-CoV-2 surface glycoprotein
3
NCBI
MN908947.3
Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
The crystallization propensity of theSARS-CoV-2spike glycoprotein sequence (UniProtKB ID: P0DTC2) was predicted by the PPCpred, CRYSTALP2 and XTalPred servers respectively (Mizianty and Kurgan, 2011; Kurgan et al., 2009; Slabinski et al., 2007). XTalPred also disclosed diverse physicochemical information such as Molecular Weight (MW), hypothetical isoelectric point (pI), amino-acid length, instability index and Grand Average of Hydropathicity (GRAVY) index, etc. The prediction of the conformational diversity in the ordered globular domains and intrinsically disordered regions of thespike glycoprotein sequence was derived using IUPred2A, MobiDB web tool, Meta disorder web server and PONDR servers (Meszaros et al., 2018; Piovesanet al., 2018; Kozlowski and Bujnicki, 2012; Romero et al., 1997). The unstructured regions of proteins are stipulated to play a crucial role in diverse cellular activities.
Physicochemical characterization and post-Translational modifications (PTMs)
The computation of the physicochemical properties and amino acid scale representation to determine the significant amino acid enrichment and/or depletion patterns were resolved by ProtParam and ProtScale tools of ExPasy (a SIB Bioinformatics resource portal) and Protein Calculator v 3.4 servers. (Gasteiger et al., 2005; Anthis and Clore, 2013).SignalP 4.1 server was used to identify the secretory signal peptides and its cleavage sites (Nielson, 2017). N-Terminal acetylation sites were predicted by NetAcet 1.0 server (Kiemer et al., 2005). High specificity Coronavirus 3C-like proteinase cleavage sites were predicted by theNetCorona-1.0 server (Kiemer et al., 2004).N-linked and O-linked glycosylation sites were predicted using theNetNGlyc-1.0 and NetOGlyc-4.0 servers respectively (Gupta and Brunak, 2002; Steentoft et al., 2013). NetPhos 3.1 server was used to predict theserine, threonine or tyrosine phosphorylation sites (Blomet al., 1999). DiANNA 1.1 webserver was used to predict thedisulfide bond connectivity (Ferre and Clote, 2005). The surface accessibility, secondary structure, disorder, and phi/psi dihedral angles of amino acids were predicted by NetSurfP-2.0 server (Klausenet al., 2019). DeepLoc 1.0 server was used to predict the sub-cellular localization of thespike glycoprotein by deepneural network algorithms (Armenteros et al., 2017).
Secondary structure prediction, tertiary structure prediction, refinement and model evaluation
PSIPRED 4.0 and JPred v.4 servers were utilized to elucidate the secondary structure characteristics of thespike glycoprotein (McGuffin et al., 2000; Drozdetskiy et al., 2015). The UniProtKB spike glycoprotein sequence (P0DTC2) was submitted in FASTA format to Modeller v9.24 which generated its 3D structureemploying a restraint-based approach using a model-singlemodule (Webb and Sali, 2017; Sali and Blundell, 1993). The best model was selected based on low discrete optimized protein energy (DOPE) score. It was then submitted to RAMPAGE server to determine the geometrical structural validity (Lovell et al., 2003). Loop refinement of the outlier residues was carried out by ModLoop server (Fiser et al., 2000).The above steps were repeated until the outlier residues were within the designated values. The final model geometry was subjected to energy minimization by Chiron server wherein the atomic steric clashes between the side-chain and backbone were resolved (Ramachandran et al., 2011). The overall quality assessment of the final model in terms of packing density (total void volume), unsatisfied hydrogen bonds, steric clashes and the accessible surface area scaling was exemplified by Gaia server (Kota et al., 2011). The stereochemical quality evaluation of the pre- and post-refinement 3D models of spike glycoprotein was done via PROCHECK, ERRAT (v 2.0) and Verify3D environment profile analysis methods by submitting to SAVES v5.0 (Structural Analysis and Verification Server) (Pradeepkiran et al., 2015).The Root Mean Square Deviation (RMSD) statistics between the target model and template were calculated by structural superimposition in SuperPose v1.0 program which utilizes a modified quaternion eigenvalue approach under the default parameters (Maiti et al., 2004). This final model was selected as the receptor for further studies. Composition profiler was utilized to investigate the amino acid composition bias in themodeled spike glycoprotein and the template protein (Vacic et al., 2007).
Multiple Sequence Alignment (MSA) and comparative phylogeny
MSA of the sequence of the generated 3-D model with theexisting PDB structures was carried out using Jalview v2.11.1.0 (Waterhouseet al., 2009). Evolutionary analyses of the selected sequences was carried out by phylogenetic tree construction using MEGA X v10.1.8 maintaining the default values of the selected parameters (Kumar et al., 2018).
Binding sites prediction
Binding site prediction to determine the active site residues of the 3-D spike glycoprotein model was carried out by COACH server (Yang et al., 2013; Yang et al., 2013). These were also determined by the receptor cavity method using Biovia Discovery studio v17.2.0.16349 (Dassault Systemes, 2017). The functional domain prediction was carried out by MOTIF search, a subserver of GenomeNet, using the Pfam and PROSITE data (Al-Khayyat and Al-Dabbagh, 2016). The cut-off score (E-value) was set at 1.0.
Ligand selection and molecular docking
The computational molecular docking of ACE-2 (PDB ID: 1R42) (receptor) with thespike glycoprotein model (ligand) was performed using the HawkDock server. It is an integrated web server that utilizes ATTRACT for protein-protein docking, HawkRank scoring function for re-ranking docked complexes based on desolvation potentials and Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) freeenergy decomposition methodology for the identification of key residues in Protein-Protein Interactions [PPIs] (Weng et al., 2019; Sarkar et al., 2020).The drug candidates for virtual screening against thespike glycoprotein model were selected from an extensive literature survey of potential therapeutic agents undergoing trials against SARS-CoV-2 and/or those which were previously developed as antiviral/anti-malarial therapeutic inhibitors as shown in Table 1. The. sdf files of all molecules were retrieved from the PubChem database and were converted to. pdb files and saved for further docking protocols (Kimet al., 2015).Molecular docking of the selected ligands with thespike glycoprotein 3-D model (receptor) was carried out by iGEMDOCK v2.1 (Hsu et al., 2011; Kashyap, 2019). It is an integrated virtual screening graphical automatic drug design system dealing with preparations through post-screening analysis depicting crucial pharmacological interactions based on protein-compound interaction profiles. It employs a generic evolutionary method algorithm. The docking accuracy settings (GA parameters) for all docking operations were set as follows: Population size: 300; Generations: 80; Number of solutions: 10. Themode of docking was stable (slow docking).The scoring function was set to GEMDOCK and the hydrophobic and electrostatic preferences for the ligands were set to 1.00. Themolecular interactions of the ligand-receptor docked complexes were analyzed using Discovery studio v20.1.0.19295, PyMol v1.7.4.5 and UCSF Chimera v1.11.2 respectively (Seeliger and de Groot, 2010; Sadati et al., 2019; Pettersenet al., 2004). The best docked complex of each ligand-receptor combination was selected based on the least binding energy values for further analysis.
Galaxy refine complex
Refinement of protein-protein complex structure was done by GalxyWEB. This tool allows us to understand the flexibility, at the protein interface (modeled RBD with RBD-ligand complex) and gives us an idea regarding overall conformational changes that occurs upon binding (Ko et al., 2012). GalaxyRefine refines the docked structure by a hybrid technique that involves the triaxial loop closure (TLC) algorithm during global optimization by conformational space annealing (CSA) technique (Park and Seok, 2012). Themolecular interactions of the ligand-receptor docked complexes were analyzed using PyMol v1.7.4 (Seeliger and de Groot, 2010). Further, Favipiravir, Hydroxychloroquine and Nafamostat were taken to study the possible structural changes occurring to spike protein when drug binds.
Argus Lab analysis
Molecular docking interactions of the selected ligands with theSARS-CoV-2 whole trimeric spike glycoprotein (PDB ID's 6VSB and 6VYB respectively) were analyzed and visualized using Argus Lab 4.0.1 (Thompson, 2004). Argus Lab platform has distinct advantages over other softwares because it performs meticulous docking analyses of huge peptides/proteins with a precise chain or domain selection within minutes. It follows the Lamarckian genetic algorithm along with precise grid resolution calculation in coherence with the ligand binding sites.It was performed under the following parametric considerations: scoring function selected as ‘AScore’, docking algorithm as Argus dock (exhaustive search), a grid resolution of 0.4 A0 and the binding site box dimensions measuring 60 A0 x 60 A0 x 60 A0 respectively. The docking precision was set to “Regular precision” and “Flexible” ligand docking mode was utilized for all docking runs. The individual docked pose stability was evaluated by ArgusLab energy calculations and the number of hydrogen bonds formed (Hafeez et al., 2013).The best docking model was selected according to the lowest AScore computed and themost pertinent binding conformation was selected on the basis of ligand-receptor hydrogen bond interactions in the vicinity of the substrate binding site. The lowest energy poses implied the highest binding affinity because high energy values represent unstable conformations.
In silico toxicity prediction
The analysis of the pharmacokinetic properties of the selected ligands such as absorption, distribution metabolism and excretion (ADME), drug-likeness and medicinal chemistry features was carried out by SWISS ADME (Daina et al., 2014, 2017; Daina and Zoete, 2016). OSIRIS Property Explorer was used to evaluate the risks of sideeffects, such as mutagenicity, tumorigenicity, irritant and reproductive consequences, as well as drug-relevant properties including cLogP, LogS (solubility), molecular weight, Topological Polar Surface Area (TPSA), drug-likeness and overall drug-score (Ayati et al., 2012).
Results
Crystallization propensity and protein- disorderness
PPCPred and CRYSTALP2 servers both predicted the sequence to be non-crystallizable. PPCPred predicted the crystallization propensity to be 0.084 which was significantly lower than the probability threshold value of 0.43. The individual step values of the crystallization process were as follows: probability that production of protein material fails: 0.831, probability that purification fails: 0.806, probability that crystallization fails: 0.01 and probability that target will yield diffraction-quality crystals: 0.355. CRYSTALP2 predicted the sequence as non-crystallizable with a confidence of 0.28. Xtalpred results depicted that the protein was non-crystallizable (EP crystallization class 5: very difficult; RF protein crystallization class 4). It also enumerated an assortment of molecular characterization parameters (Table 2
). Overall, thespike glycoprotein RBDmodel was deemed non-crystallizable in its native form.
Table 2
Molecular characterization of P0DTC2 sequence predicted by XTalPred server.
Sr.No.
Server
Molecular feature (s)
Prediction
1
XtalPred
Molecular weight
141.196 KDa
Amino acid length
1273
Theoretical pI
6.24
Instability index
33.02
GRAVY
−0.08
Molecular surface feature (EP and RF crystallization):
a) Surface entropy
−1.06
b) Surface hydrophobicity
−0.09
c) Surface ruggedness
4.04
2
COILS
Coiled-coils regions
28
3
DISOPRED 2
Longest disorder region
16 (1%)
4
TMHMM
Transmembrane helices
1
5
SignalP
Signal peptides
No
6
SEG
Longest low-complexity region
23
7
PSIPRED
Secondary structure prediction:
Coil residues%
38
Helix residues %
22
Strand residues %
40
8
–
Residue content
Cysteine residues
40
Methionine residues
14
Tryptophan residues
12
Tyrosine residues
54
Phenylalanine residues
77
pI: Isoelectric point; GRAVY: Grand Average of Hydropathicity; EP: Expert Pool; RF: Random Forest.
Molecular characterization of P0DTC2 sequence predicted by XTalPred server.pI: Isoelectric point; GRAVY: Grand Average of Hydropathicity; EP: Expert Pool; RF: Random Forest.TheMobiDB web tool indicated 0% disorderness in thespike glycoprotein sequence (UniProtKB: P0DTC2) while IUPred2A and Metadisorder web server (global disorder tendency = 0.3474) indicated disordered residues. PONDR server depicted 11 disordered regions, comprising a total of 98 disordered amino acid residues (7.70%) with the longest disorder region comprising of 38 amino residues (Average prediction score = 0.1374). DEPP (Disorder Enhanced Phosphorylation Predictor) in the PONDR server predicted the phosphorylation potentiality of serine (6 out of 99; 6.0606%), threonine (2 out of 97; 2.06%) and tyrosine (3 out of 54; 5.56%) residues of thespike glycoprotein. (Table S1; Fig. S1 in supplementary data).
Physicochemical characterization and post-translational modifications (PTMs)
SARS-CoV-2encompasses a variety of structural proteins such as thespike glycoprotein, small envelope protein (E), Matrix protein (M) and the nucleocapsid protein (N) while theORF1a/b region (initial ~20 kb) is translated into two polyproteins (PP1A and PPIB) which encode themajority of theNon-Structural Proteins (NSPs) (Khodadadi et al., 2020, Wu et al., 2020).ProtParam and ProtScale servers explicated several physicochemical parameters of SARS-CoV-2spike glycoprotein viz. Molecular weight: 141.178Kda; theoretical pI: 6.24; Instability index: 33.01 (stable); aliphatic index: 84.67; GRAVY: −0.079. Theestimated half-life (t1/2) was predicted to be 30 h in mammalian reticulocytes (in vitro); >20 h in yeast (in vivo) and >10 h in E. coli (in vivo) (Kar et al., 2020). ProtScale also depicted an anthology of hydrophobicity plots of thespike glycoprotein. Protein calculator v3.4 further corroborated the above results. SignalP 4.1 server predicted residues 1–13 as the signal peptide (cleavage site between the 13th and 14th residues) with a score (D value) of 0.837 which was higher than the threshold cut-off value (D = 0.450).NetAcet 1.0 server predicted no N-terminal acetylation as alanine, glycine, serine or threonine residues were not present at positions 1–3. One specific coronavirus 3C-like proteinase cleavage site was found by NetCorona 1.0 server at position 1002 with a score of 0.685 (TGRLQ^SLQTY). N-linked glycosylation sites predicted by NetNGlyc 1.0 server were in accordance with the data retrieved from UniProtKB for thespike glycoprotein while O-linked glycosylation sites were predicted at positions 673, 678 and 686 by NetOGlyc server 4.0 which was in accordance with the previous research analysis (Andersenet al., 2020).NetPhos 3.1 server predicted phosphorylation of serine, tyrosine and threonine residues at 136 positions in thespike glycoprotein. DiANNA 1.1 web server revealed 20 disulfide bond connectivity predictions. NetSurfP 2.0 results validated the disorder predictions and also depicted surface accessibility and phi/psi angles for each amino acid. DeepLoc 1.0 server predicted that thespike glycoprotein was a soluble (score = 0.0577), membrane type protein (score = 0.9423).PSIPRED 4.0 and JPred v.4 servers deduced the secondary structure of thespike glycoprotein which comprised of α-helices, β-sheets and coils. It also predicted a putative domain boundary at position 164 (Asparagine) (Fig. S2 in supplementary data).Thespike glycoprotein sequence from UniProtKB was submitted to MODELLER v9.24. It generated two models whose characteristics are listed in Table 3
. GA341 (range: 0–1, valuenear 1 being preferred), z-DOPE (lowest score preferred), MPQS (ModPipe Quality Score; highest value preferred) and sequence identity (highest value preferred) scores wereemployed for model quality assessment.
Table 3
Model characteristics generated by MODELLER v.9.24.
Sr.No.
Features
Model 1
Model 2
1
Target region
14–302
330–529
2
Protein length
1273
1273
3
PDB template code
5 × 4S A
6YLA E
4
Template region
18–289
330–529
5
Sequence identity
54%
100%
6
E-value
0
0
7
GA341
1
1
8
MPQS
0.786723
1.37941
9
z-DOPE
−0.3
−1.32
10
TSVMod method
MSALL
MSALL
11
TSVMod RMSD
3.711
0.464
12
TSVMod NO35
0.923
0.972
E-value: no: of expected hits of similar quality; MPQS: ModPipe Quality Score; z-DOPE: Discrete Optimized Protein Energy Score of Salilab Model evaluation server; TSVMod RMSD: Root Mean Square Deviations; TSVModNO35: Native Overlap prediction of fraction of Cα atoms within 3.5 Å of the native structure.
Model characteristics generated by MODELLER v.9.24.E-value: no: of expected hits of similar quality; MPQS: ModPipe Quality Score; z-DOPE: Discrete Optimized Protein Energy Score of Salilab Model evaluation server; TSVMod RMSD: Root Mean Square Deviations; TSVModNO35: Native Overlap prediction of fraction of Cα atoms within 3.5 Å of the native structure.The results indicated that model 2 was themost reliable one as compared to model 1. TheMPQS score of model 1 was also predicted to be unreliable as compared to that of model 2. Model 2 represented a segment of the S1 subunit of theSARS-CoV-2spike glycoprotein as deduced from the UniProtKB data whilemodel 1 corresponded to theN-Terminal Domain (NTD) region. It depicted 2 N-Acetyl glycosylation sites at asparagine residues (positions 331 and 343 respectively). Themodel (residues 1–200) formed themajority portion of the Receptor Binding Domain (RBD: residues 319–541) containing the Receptor Binding Motif (RBM; residues 437–508).The selected model (Fig. 4
a
) was analyzed by Ramachandran plot generation by submission to the RAMPAGE server which elucidated 186 favored residues (93.9%), 10 allowed residues (5.1%) and 2 outlier residues (TRP353 and PHE497; 1%) respectively. Themodel was subjected to further structural refinement by ModLoop server which then depicted 186 favored residues (93.9%), 12 allowed residues (6.1%) and zero outlier residues respectively (Fig. 4b). The refined model was then subjected to energy minimization by Chiron server with side chain constraints.
Fig. 4
The refined spike RBD model 3D structure comprising of helix (red), sheet (yellow) and loop (green) regions. b) The Ramachandran plot of the spike RBD model exhibiting zero outlier residues. (Colour, Double-column fitting image).
ACE-2 receptor-mediated SARS-CoV-2 internalization occurs via three steps. The first step involves the recognition and attachment of thespike glycoprotein (SP, olive green) to its receptor ACE-2 (Red). This binding instigates the activation process wherein thespike protein is cleaved into its subunits, S1 (receptor-binding domain, light blue) and S2 (fusion peptide, yellow) via the adjacent transmembrane proteaseTMPRSS2 (light green). This fusion peptide integrates with the host cell membrane, resulting in subsequent release and internalization of theSARS-CoV-2 single-stranded RNA (ssRNA, blue) and other components. Once internalized, the virus utilizes the host cell machinery for replication and extrusion of novel viral particles (Colour, Double-column fitting image).Spike glycoprotein gene schematics. Thespike glycoprotein gene comprises of eight distinct segments. Residues 1–12 act as the Signal peptide (SP, Red). Residues 13–685 and 686–1273 indicate the S1 and S2 subunits of thespike glycoprotein respectively. The S1 subunit comprises of theN-Terminal Domain (NTD, residues 13–303, Olive green) and the Receptor-Binding Domain (RBD, residues 319–541, yellow). TheRBD domain is further sub-divided into the C-Terminal Domain (CTD, residues 334–427, Pink) and the Receptor-Binding Motif (RBM, 437–508, Light blue). The cleavage site 1 (685–686) releases the S2 subunit which comprises of the Fusion Peptide region (FP: 788–806, Brown) followed by a second cleavage site (815–816) which generates the S2′ Domain (816–1273). The S2 subunit comprises of two heptad repeats HR1 (920–927) and HR2 (1163–1202) (Green) and two Coiled-Coil Regions CCR1 (949–993) and CCR2 (1176–1203) (Cyan) respectively. A transmembraneHelical Domain exists (1214–1234, Bright yellow) followed by a Topological Cytoplasmic Domain (TPC, 1235–1273, Magenta) which encompasses a Short SequenceMotif (SSF, 1269–1273, light red). TheExtracellular Topological domain spans from residues 13–1213. (Colour, Double-column fitting image).Alternative therapeutic strategies to counteract SARS-CoV-2 targeting thespike Glycoprotein. (Black and White, Single-column fitting image).The refined spikeRBDmodel 3D structure comprising of helix (red), sheet (yellow) and loop (green) regions. b) The Ramachandran plot of thespikeRBDmodel exhibiting zero outlier residues. (Colour, Double-column fitting image).After energy minimization, themodel was subjected to structural validation by Gaia server (with side chain constraints) which elucidated the following data: Clash score = 0.0381; 69 unsatisfied HBS (Hydrogen Bonds in Shell); 4 HBC (Hydrogen Bonds in Core); solvent accessible (10225.4 Å2) and solvent excluded (8623.33 Å2) surface areas and a void volume score of 0.53. All bond lengths were within the acceptable range. The side-chain integrity discrepancies were refined and themodel was then submitted to theSAVES V5.0 server for structure validation.SAVES server validated the refined model yielding an overall quality factor of 83.3333. VERIFY3D substantiated that about 80% amino acids scored≥0.2 in the 3D/1D profile. PROVE results depicted about 25 buried outlier protein atoms (3.6%). Overall results validated the refined model (spikeRBDmodel). Discovery studio construed characteristics of thespikeRBDmodel viz. 1947 atoms; Molecular formula: C1016 H1522 N265 O295 S8; molecular weight: 22.411Kda and a net formal charge of +5.ThespikeRBDmodel and the template (6YLA chain E) were superimposed via Superpose 1.0 server which depicted a resulting local and global RMSD value of 0.37 over 200 Cα atoms and a value of 0.43 over 800 atoms in the protein backbone (Fig. 5
a).
Fig. 5
a) Superimposition of spike RBD model (red) with its template 6YLA chain E (green). b) Superimposition of the spike RBD model (yellow) with 6VSB (red). c) Superimposition of the spike RBD model (yellow) with 6VYB (yellow). (Colour, Double-column fitting image).
a) Superimposition of spikeRBDmodel (red) with its template 6YLA chain E (green). b) Superimposition of thespikeRBDmodel (yellow) with 6VSB (red). c) Superimposition of thespikeRBDmodel (yellow) with 6VYB (yellow). (Colour, Double-column fitting image).Composition profiler revealed that no significant enrichment or depletion patterns in the amino acid composition had occurred. The relativeentropy had a value of 0.00092 with a p-value of 1. Thespikemodel was also superposed with theSARS-CoV-2spike open-stateectodomain structure (PDB ID: 6VYB) and the Prefusion 2019-nCoVspike glycoprotein with a single receptor-binding domain up structure (PDB ID: 6VSB). ThespikeRBDmodel-6VYB superimposition illustrated a local and global RMSD value of 1.183 over the Cα atoms and atoms in the protein backbone whereas thespikeRBDmodel-6VSB superimposition elucidated a local and global RMSD values of 1.655 and 1.648 over the Cα atoms and atoms in the protein backbone respectively (Fig. 5b and c). TheMOTIF server detected diverse proteomic segments in the wholespike glycoprotein and thespikeRBDmodel (Table 4
).
Table 4
Functional domain prediction by MOTIF search server.
Functional domain prediction by MOTIF search server.Pfam = Protein family.IE-value = Independent E-value.Sequential analysis of thespikeRBDmodel and the template (6YLA chain E) revealed that our RBDmodel lacked the first three residues (GLU327, THR328 and GLY329) incorporated by 6YLA chain E. In addition, the residues 1, 2, 3, 4 …. in thespikeRBDmodel corresponded to residues 330, 331, 332, 333… and so on. Hence, the numbering of residues in thespikeRBDmodel could be correlated to the residue position in 6YLA chain E as follows:Where R.N = ResidueNumber.
Multiple Sequence Alignment (MSA), comparative phylogeny and binding sites prediction
SARS-CoV-2 is a more potent virus as it depicts important genomic dissimilarities when compared to theSARS-CoV (79%) and MERS-CoV (50%) genomes (Walls et al., 2020). Previous studies elucidated that amino acid mutations in theSpike glycoprotein influenced its credited ACE-2 (receptor) binding capability in humans (Walls et al., 2020; Coutard et al., 2020).MSA of the selected sequences listed in Table 1 including thespikeRBDmodel was carried out using Jalview v2.11.1.0 (Fig. 6
). It revealed that large number of significant amino acid substitutions/mutations had occurred in theSARS-CoV-2 as compared to theSARS-CoVspike protein sequences reported earlier viz. R403K, E406D, K417V, L441I, S443A, K444T, V445S, G446T, L452K and K501N respectively.
Fig. 6
MSA (Multiple Sequence Alignment) of the refined spike model with the assorted spike glycoprotein sequences listed in Table 1. The Receptor-Binding Domain (RBD, 319–541) consisted of the Receptor-Binding Motif (RBM, 437–508). 15 RBD residues were implicated in molecular interactions with ACE-2 (red and green). The cysteine residues (dark blue) were involved in disulphide bond interactions. MSA revealed mutation of amino acid residues of SARS-CoV-2 viz. V417K, T446G, P475A, W490F, N493Q, and T501N respectively (green). The other possible mutations within the strains are reported by white within the strains. (Colour, Double-column fitting image).
MSA (Multiple Sequence Alignment) of the refined spikemodel with the assorted spike glycoprotein sequences listed in Table 1. The Receptor-Binding Domain (RBD, 319–541) consisted of the Receptor-Binding Motif (RBM, 437–508). 15 RBD residues were implicated in molecular interactions with ACE-2 (red and green). Thecysteine residues (dark blue) were involved in disulphide bond interactions. MSA revealed mutation of amino acid residues of SARS-CoV-2 viz. V417K, T446G, P475A, W490F, N493Q, and T501N respectively (green). The other possiblemutations within the strains are reported by white within the strains. (Colour, Double-column fitting image).Furthermore, SARS-CoV-2spike glycoprotein asserted 80% sequence identity to the previously validated spike protein structures. These residues formed a part of loops 1e, 1 d, 2 g and 3f respectively. Loop 2 g accounted for 50% of thesemutations. Thesemutations accentuate the augmented binding affinity of thespike protein to its receptor ACE-2 and were in good co-relation with earlier investigations (Ortega et al., 2020).An evolutionary analysis of theSpikeRBDmodel with the selected sequences in Table 1 was achieved by phylogenetic tree construction utilizing theNeighbour-Joining method (Saitou and Nei, 1987) (Fig. 7
). Theevolutionary distances were computed using the Poisson correction method and are in the units of the number of amino acid substitutions per site (Zuckerkandl and Pauling, 1965).
Fig. 7
Phylogenetic tree analysis using MEGA X software. The optimal tree with the sum of branch length = 0.35413322 is shown. Data coverage is shown in percentage (%) while the nodal branch lengths are depicted in decimals. (Black and White, Single-column fitting image).
Phylogenetic tree analysis using MEGA X software. The optimal tree with the sum of branch length = 0.35413322 is shown. Data coverage is shown in percentage (%) while the nodal branch lengths are depicted in decimals. (Black and White, Single-column fitting image).COACH and TM-SITE results predicted the binding site residues of theSpikeRBDmodel viz. ASN354, LYS356, SER399, VAL401 and ARG509 with C-score (confidence of prediction) values of 0.10 and 0.22 respectively. Using Discovery studio visualizer, we found 6 possible receptor-binding cavities in theSpikeRBDmodel (Table S3 in supplementary data).
ACE-2-spike RBD model molecular docking and analysis
ThespikeRBDmodel (ligand) and ACE-2 (receptor) were docked using the HawkDock server which generated 10 ligand-receptor docked complexes. The best one was selected based on theMM/GBSA score (binding freeenergy in Kcal/mol) having a value of −52.33 (Fig. 8
). A total of 20 molecular interactions were observed upon analysis using the aforementioned molecular analysis software, of which 15 interactions were considered significant (bond-length of ≤4.0 Å) (Table 5
).
Fig. 8
Docked complex of spike RBD model (ligand, red) with ACE-2 (receptor, blue). Residues GLY446, GLY496 and ASN501 (RBD) formed a single hydrogen bond with residues ASN259, ASN136 and GLU132 of ACE-2 (bond lengths: 2.9, 2.2 and 2.7 respectively). TYR449 depicted 2 hydrogen bonds with residues ASN259 (bond length 2.8) and THR258 (bond length 2.7) respectively. The yellow-dashed lines indicate hydrogen bonds. (Colour, Double-column fitting image).
Table 5
ACE2-spike RBD model molecular interactions analysis.
Docked complex of spikeRBDmodel (ligand, red) with ACE-2 (receptor, blue). Residues GLY446, GLY496 and ASN501 (RBD) formed a singlehydrogen bond with residues ASN259, ASN136 and GLU132 of ACE-2 (bond lengths: 2.9, 2.2 and 2.7 respectively). TYR449 depicted 2 hydrogen bonds with residues ASN259 (bond length 2.8) and THR258 (bond length 2.7) respectively. The yellow-dashed lines indicatehydrogen bonds. (Colour, Double-column fitting image).ACE2-spikeRBDmodel molecular interactions analysis.* = Significant; H-Donor = Hydrogen Donor; H-Acceptor = Hydrogen Acceptor.Earlier studies expounded that thespikeRBD comprised of significant loops (a-f) which were then re-categorized into loop 1, 2 and 3 sub-segments. It was observed that thespike protein interactions with ACE-2 receptor comprised of loops 3a, 3 b and 3f respectively (Robson, 2020). It also contained ninecysteine residues of which eight are involved in disulfide bond formation viz. CYS336-CYS361, CYS379-CYS432, CYS391-CYS525 and CYS480-CYS488 respectively. Of these, the first threedisulphide bonds facilitate structural stabilization while the last one acts as a loop connector in the distal RBM region (Lan et al., 2020). From the results obtained, it was observed that the loop regions of thespikeRBDmodel (2 g, 3 b and 3f respectively) displayed maximum significant interactions.A total of thirty ligands were selected from an extensive literature survey, WHO (www.who.int/covid-19/information) and FDA (www.fda.gov/drugs/emergency-preparedness-drugs/) organizations’ list of approved drugs designated for the treatment of SARS-CoV-2 to evaluate their therapeutic inhibitory potential against thespike Glycoprotein. These ligands were docked against SpikeRBDmodel (receptor) utilizing iGEMDOCK v2.1 and their molecular interactions were analyzed (Table 6
).
Table 6
Molecular docking analysis of he selected ligands with the spike glycoprotein RBD model.
Synthetic tetracycline derived protein synthesis inhibitor
−77.46
−77.46
15.31
7
4
Yes
30.
Ivermectin
6321424
Antiparasitic
Macrocyclic lactone
−102.63
−102.63
11.98
10
5
Yes
PubChem CID = PubChem Chemical Identifier; b VDW = Van der Waal's interactions; c TI = Total Interactions; d SI = Significant Interactions (Bold); e LI = Loop Interactions; * RT = Reverse Transcriptase, Red colour indicates interaction with spike glycoprotein RBD-ACE-2 receptor residues.
Molecular docking analysis of he selected ligands with thespike glycoprotein RBDmodel.PubChem CID = PubChem Chemical Identifier; b VDW = Van der Waal's interactions; c TI = Total Interactions; d SI = Significant Interactions (Bold); e LI = Loop Interactions; * RT = Reverse Transcriptase, Red colour indicates interaction with spike glycoprotein RBD-ACE-2 receptor residues.The criteria for selection of the potential drug candidates as potential inhibitors of SARS-CoV-2spike glycoprotein RBDmodel hinged on various factors viz. number and strength of thehydrogen bonds, number and strength of other interactions (electrostatic/hydrophobic/van der Waals), number of loop interactions (2 g, 3 b and 3f respectively), number of mutated residues involved, and number of non-significant interactions involving loop residues, etc. Based on the above factors, we selected ten drugs out of the thirty studied for further analysis whose detailed significant molecular interactions areentailed (Table 7
). All selected drug candidates are illustrated in Fig. 9
.
Table 7
Detailed analysis of selected drugs with spike glycoprotein RBD model involving significant interactions.
Sr.No.
Drug
Significant spike RBD model interacting residues
Bond Type
Bond Length (in Å)
Interacting residues of Actual spike Glycoprotein
PreciseLoop Interactions
Contact residues of the SARS-CoV-2 RBD–ACE2
Cysteine residue interactions
RBD (319–541)
RBM (437–508)
1
Tenofovir
W436
Conventional Hydrogen bond
2.62
Yes
No
No
No
No
F342
Carbon Hydrogen bond
3.30
Yes
No
No
No
No
W436
Electrostatic (pi-anion)
3.48
Yes
No
No
No
No
L441*
Hydrophobic (pi-sigma)
3.58
Yes
Yes
2 g
No
No
W436
Hydrophobic (pi-sigma)
3.80
Yes
No
No
No
No
2
Eugenol
E471
Carbon Hydrogen bond
3.30
Yes
Yes
3a
No
No
E471
Carbon Hydrogen bond
3.29
Yes
Yes
3a
No
No
D467
Electrostatic (pi-anion)
3.74
Yes
Yes
No
No
No
K458
Hydrophobic (pi-sigma)
3.81
Yes
Yes
2 g
No
No
3
Allicin
E471
Electrostatic interaction
3.78
Yes
Yes
3a
No
No
K458
Hydrogen bond
2.70
Yes
Yes
2 g
No
No
S469
Hydrogen bond
3.08
Yes
Yes
No
No
No
P491
Hydrophobic interaction
3.98
Yes
Yes
3 b
No
No
4
Stavudine
Y453
Conventional Hydrogen bond
2.35
Yes
Yes
2 g
Yes
No
N501*
Conventional Hydrogen bond
3.05
Yes
Yes
3f
Yes
No
R403*
Carbon Hydrogen bond
3.30
Yes
Yes
1e
No
No
Y495
Carbon Hydrogen bond
3.23
Yes
Yes
3f
No
No
S494
Carbon Hydrogen bond
3.30
Yes
Yes
3 b
No
No
5
Raltegravir
S349
Conventional Hydrogen bond
2.59
Yes
No
No
No
No
S399
Conventional Hydrogen bond
2.98
Yes
No
No
No
No
S349
Conventional Hydrogen bond
2.96
Yes
No
No
No
No
F347
Conventional Hydrogen bond
2.90
Yes
No
No
No
No
L441*
Halogen (Fluorine)
3.65
Yes
Yes
2 g
Yes
No
N354
Pi-Donor Hydrogen bond
2.44
Yes
No
No
No
No
A348
Hydrophobic (alkyl)
3.54
Yes
No
No
No
No
6
Zalcitabine
R403*
Carbon Hydrogen bond
3.30
Yes
No
1e
No
No
7
Favipiravir
R403*
Conventional Hydrogen bond
3.05
Yes
No
1e
No
No
Y505
Conventional Hydrogen bond
3.30
Yes
Yes
3f
Yes
No
R403*
Carbon Hydrogen bond
3.40
Yes
No
1e
No
No
8
Camostat
L441*
Conventional Hydrogen bond
2.91
Yes
Yes
2 g
No
No
N354
Carbon Hydrogen bond
3.30
Yes
No
No
No
No
9
Doxycycline
Y449
Conventional Hydrogen bond
1.87
Yes
Yes
2 g
Yes
No
N501*
Conventional Hydrogen bond
2.35
Yes
Yes
3f
Yes
No
S494
Conventional Hydrogen bond
3.22
Yes
Yes
3 b
No
No
Y505
Hydrophobic (pi-sigma)
3.63
Yes
Yes
3f
Yes
No
10
Ivermectin
G381
Conventional Hydrogen bond
2.32
Yes
No
No
No
No
F377
Carbon Hydrogen bond
2.99
Yes
No
No
No
No
Y380
Carbon Hydrogen bond
3.74
Yes
No
No
No
No
C379
Sulfur-X
2.99
Yes
No
No
No
Yes
Y369
Pi-Donor Hydrogen bond
3.28
Yes
No
No
No
No
RBD = Receptor Binding Domain; RBM = Receptor Binding Motif; ACE-2 = Angiotensin Converting Enzyme type II; * = Possible mutated amino acids in SARS-CoV-2.
Fig. 9
Structures of all ten selected drugs colored by CPK. Hydrogen atoms (white), Oxygen atoms (red), Chlorine atoms (green), Nitrogen atoms (blue) and Phosphorous atoms (orange). (Colour, Double-column fitting image).
Detailed analysis of selected drugs with spike glycoprotein RBDmodel involving significant interactions.RBD = Receptor Binding Domain; RBM = Receptor Binding Motif; ACE-2 = Angiotensin Converting Enzyme type II; * = Possiblemutated amino acids in SARS-CoV-2.Structures of all ten selected drugs colored by CPK. Hydrogen atoms (white), Oxygen atoms (red), Chlorine atoms (green), Nitrogen atoms (blue) and Phosphorous atoms (orange). (Colour, Double-column fitting image).Most significant interactions were observed in case of Stavudine, Doxycycline and Favipiravir (Fig. 10
a, b and 10c). Our results also indicated that Tenofovir, Eugenol, Allicin, Raltegravir, Zalcitabine, Camostat and Ivermectin depicted significant molecular interactions with theACE-2 residues involved in spike glycoprotein recognition elucidating their inhibitory potentiality. Moreover, Raltegravir also interacted with loop 1e (LYS441) but via a halogen (fluorine) interaction. (Fig. S3a/b - S9a/b in supplementary data).
Fig. 10
The docked complexes of Stavudine (10a), Doxycycline (10 b) and Favipiravir (10c) with the spike RBD model are depicted in Fig. i), iii) and v) respectively while ii), iv) and vi) represent the 2D interaction diagrams of Stavudine-, Doxycycline- and Favipiravir–spike RBD model docked complexes comprising of van der Waals (medium green), Conventional Hydrogen bonds (green), Carbon–Hydrogen bonds (light green), Pi-Pi T-Shaped (dark pink), Pi-Alkyl (light pink), Pi-Sigma (purple), Amide-Pi stacked (magenta) and Unfavourable Donor-Donor (red) interactions. (Colour, Double-column fitting image).
The docked complexes of Stavudine (10a), Doxycycline (10 b) and Favipiravir (10c) with thespikeRBDmodel are depicted in Fig. i), iii) and v) respectively while ii), iv) and vi) represent the 2D interaction diagrams of Stavudine-, Doxycycline- and Favipiravir–spikeRBDmodel docked complexes comprising of van der Waals (medium green), Conventional Hydrogen bonds (green), Carbon–Hydrogen bonds (light green), Pi-Pi T-Shaped (dark pink), Pi-Alkyl (light pink), Pi-Sigma (purple), Amide-Pi stacked (magenta) and Unfavourable Donor-Donor (red) interactions. (Colour, Double-column fitting image).Remdesivir, Liquiritin, Nafamostat, Oseltamivir, Zanamavir, Lopinavir, EmblicaninA and Ivermectin depicted strong interactions with CYS336 and CYS379 residues of theSpikeRBDmodel, thus hypothesizing their role in structural destabilization of theSpike protein (Table S2; Fig. 6 supplementary data).In Fig. 11
we can see the residues at the interface of RBD of thespike protein and theACE-2 receptor as listed earlier (Table 5). To understand whether the binding of inhibitor causes any structural changes in theRBD-ACE2 complex, we superimposed the structure of theRBD with and without the inhibitors. Favipiravir was found to interact directly with residues Y505, R403 and L441 of theRBD, where R403 is considered to be a possiblemutation (Fig. 12
a, Fig. 6, Table 7). Surprisingly, upon superimposition of RBD and theRBD-Favipiravir complex we found deviation in side chain conformation of R403 (Fig. 12a). Apart from that few changes were seen in V401 and R509. This may envisage that Favipiravir alters thespike binding efficiency with ACE-2 receptor. Further, we also did a similar study with Hydroxycholoroquine and Nafamostat, both of which did not directly bind near the interface region of theSpike and ACE2 proteins. As it can be seen from Fig. 12
b, amino acids L517, T430, F429, F464, E516 and D428 were directly interacting with thehydroxychloroquine. The large number of non-polar residues promotes primarily hydrophobic interactions with the ligand. After superimposition we could observe the switch in the polar side chain of D428 of thespike protein. Although, we could not seemuch difference at the interface region of Spike-ACE2 complex, we could again find a prominent side chain shift of R403 towards RBD, in the presence of inhibitor, similar to favipiravir. Nafamostat showed interactions with S371, S373 A363, C336, F368, G339, S373, S371, V367, L335 and D 364 residues and not much difference was seen in the absence of the ligand (Fig. 12
c). C336 forms disulphide linkage with C361 and plays a crucial role in structureal stabilization of RBD. However, R403 shift remained consistent in all the three complexes. This indicates that inhibitor binding impacts the binding energetics of theRBD to theACE2 receptor.
Fig. 11
The protein-protein complex of RBD of Spike protein (magenta) and the ACE-2 receptor (yellow) are shown as cartoon. A large number of polar residues from the Spike protein interact with the ACE2 receptor by hydrogen bonding and hydrophobic interactions (Table 5). The primary residues from the Spike protein which interact with the ACE2 are shown as sticks and colored by CPK. (Colour, Double-column fitting image).
Fig. 12a
Superposed structures of Spike protein with (green) and without Favipiravir (white). The residues and the inhibitor are shown as sticks and colored by CPK. The carbon atoms of Favipiravir are shown in yellow. (Colour, Double-column fitting image).
Fig. 12b
Superposed structures of Spike protein with (green) and without Hydroxychloroquine (white). The residues and the inhibitor are shown as sticks and colored by CPK. The carbon atoms of Hydroxychloroquine are shown in yellow. (Colour, Double-column fitting image).
Fig. 12c
Superposed structures of Spike protein with (green) and without Nafamostat (white). The residues and the inhibitor are shown as sticks and colored by CPK. The carbon atoms of Camostat are shown in yellow. (Colour, Double-column fitting image).
The protein-protein complex of RBD of Spike protein (magenta) and theACE-2 receptor (yellow) are shown as cartoon. A large number of polar residues from theSpike protein interact with theACE2 receptor by hydrogen bonding and hydrophobic interactions (Table 5). The primary residues from theSpike protein which interact with theACE2 are shown as sticks and colored by CPK. (Colour, Double-column fitting image).Superposed structures of Spike protein with (green) and without Favipiravir (white). The residues and the inhibitor are shown as sticks and colored by CPK. Thecarbon atoms of Favipiravir are shown in yellow. (Colour, Double-column fitting image).Superposed structures of Spike protein with (green) and without Hydroxychloroquine (white). The residues and the inhibitor are shown as sticks and colored by CPK. Thecarbon atoms of Hydroxychloroquine are shown in yellow. (Colour, Double-column fitting image).Superposed structures of Spike protein with (green) and without Nafamostat (white). The residues and the inhibitor are shown as sticks and colored by CPK. Thecarbon atoms of Camostat are shown in yellow. (Colour, Double-column fitting image).Validation of the drug-Spikemolecular interactions were achieved by docking studies of the selected drugs with theSpike glycoprotein trimeric structures available (PDB ID's 6VSB and 6VYB respectively) (Tables S4–S5 in supplementary data). We observed that Eugenol, Allicin, Doxycycline and Ivermectin did not depict any significant interactions with 6VSB. Furthermore, Tenofovir, Zalcitabine and Favipiravir depicted four significant interactions each with theExtracellular Topological Domain (ETD) of the S1 subunit whileCamostatexhibited five significant interactions with the S2 subunit ETD. Likewise, Allicin, Stavudine, Raltegravir and Ivermectinelucidated no interactions with 6VYB. However, Camostat and Zalcitabineexhibited 1 and 4 interactions respectively with the Coiled-Coil Region 1 (CCR-1) of the S2 subunit. Eugenol and Doxycyclineexhibited 1 interaction each with theETD of S2′ domain and S1 subunit respectively (Figs. S10–S11 in supplementary data).SWISS-ADME determined the in silico physico-chemistry and pharmacokinetic analysis of the selected drugs. It was perceived that Ivermectin had the highest Po/w coefficient (iLOG Po/w = 5.86) whileFavipiravir had the lowest value (iLOG Po/w = 0.39) signifying its higher solubility. Furthermore, Allicinexhibited the lowest skin permeation rate (log K = −9.89 cm/s) whereas Eugenol depicted the highest (log K = −5.69 cm/s). All drugs, except Ivermectin, complied with the Lipinski rules accompanied by minor violations. The GI absorption rate was high for all drugs except Tenofovir, Raltegravir, Doxycycline and Ivermectin. Out of all drugs, Eugenol and Allicin were predicted as Blood-Brain Barrier (BBB) permeants. Ultimately, all drugs had an acceptable bioavailability score, except for Doxycycline and Ivermectin (Tables S6–S7 in supplementary data).OSIRIS Property Explorer was utilized to compute the systemic toxicity of the selected drugs. Our analysis showed that Raltegravir displayed the highest drug-likeness capacity (score = 0.999) whileTenofovir had the least possible value (0.0). Moreover, Eugenol and exemplified a high risk of mutagenicity, tumorigenicity and irritant capacity (scores of 0.6 in each) whileZalcitabine and Doxycyclineexhibited a high risk of reproductivetoxicity (score of 0.6). Favipiravir had the best drug score (0.933) amongst all potential drug candidates (Tables S8–S9 in supplementary data).
Discussion
Coronaviruses (CoV) are positive single-stranded RNA viruses classified as pathogenic agents reported since the 1960's. These viruses are responsible for causing acute and chronic respiratory diseases as well as enteric, hepatic and neurologic infections as it has a broad host range (avian, murine, porcine, bovine and other domestic mammalian species including humans) (Weiss and Martin, 2005; Di Gennaro et al., 2020).SARS-CoV-2 has materialized as a potent etiological agent with a current infection rate of ~20,000 individuals/day. Till date, no confirmed drug/vaccine candidates have been reported to counteract COVID-19epidemic rendering healthcare facilities to face comprehensive obstacles to restore thehealthiness of COVID-19 affected individuals. This is predominantly accredited to the genomic discrepancy of SARS-CoV-2 as compared to previously established SARS-CoV and MERS-CoV strains, making SARS-CoV-2more virulent than its predecessors. These anomalies have commanded stringent amendments in conventional treatment regimes in terms of efficacy for eradication of SARS-CoV-2 (Fig. 6).Diverseexperimental research analyses have yielded abundant information regarding SARS-CoV-2 life cycle, epidemiology, etc (Wu et al., 2020; Li et al., 2020). Molecular experimentation in coherence with computational biology has resulted in the fabrication of credible therapeutic strategies for combating COVID-19 viz. identification of proteomic and genomic drug targets, traditional epidemiological drug re-purposing, peptidomimetics studies for antibody/vaccine development against structural and non-structural proteins of SARS-CoV-2 targeting host attachment and viral replication processes, etc. (Alanagreh et al., 2020) (Fig. 1, Fig. 2).The present study focused on the drug re-purposing based inhibition of SARS-CoV-2 attachment wherein thespike glycoprotein of SARS-CoV-2 undergoes conformational reorganization to distinguish its receptor ACE-2 and initiates a cascade of molecular progressions resulting in the integration with the cell membrane and liberation of its genome inside the host cell for replication.We developed our validated model of thespike glycoprotein RBD based on the UniProtKB sequence P0DTC2 utilizing 6YLA chain E as the template (Fig. 5a). Multiple sequence alignment studies of the constructed spikeRBDmodel with previously constructed 3-D structures demonstrated that SARS-CoV-2exhibited significant mutations as compared to SARS-CoV and MERS-CoV strains resulting in an amplified affinity for ACE-2 (Fig. 6). The protein was predicted to be non-crystallizable and depicted minutedisorderness. Thespike glycoprotein RBD harboured many such mutations and exhibited additional sequential diversity which designates it as a latent therapeutic drug target (Table 2, Table 3, Table 4; Fig. 10a/10b/10c).We implemented a computational biology–based approach for drug re-purposing to screen drug candidates as potential therapeutic inhibitors via molecular docking analyses impacting thespike glycoprotein mechanism of action (Tables 6–7).Molecular docking analysis involving thespike glycoprotein RBDmodel (ligand) with ACE-2 (receptor) depicted twenty interactions of which fifteen were deemed significant and implicated loop 2 g, 3 b and 3f residues. Considering the above results in conjunction with the number and type of significant bonds formed, loop interactions, mutated amino acid residues concerned, etc., we observed that tenmolecules out of the assorted set of thirty drug candidates exposited molecular interactions with the loop residues of thespike glycoprotein (Table 6, Table 7). Additionally, eight drugs exhibited significant interactions with thecysteine residues of thespikeRBDmodel constituted in disulfide bond formation for structural stabilization (Table S2 in supplementary data).Eugenol, an allyl-chain substituted guaiacol, is derived fromessential oils such as clove oil, etc. It retains diverse pharmacological roles, including antiviral potentiality. It has been reported to have a detrimental effect on the viral envelope of newly formed virions as observed in studies with Herpes Simplex viruses HSV-1 and HSV-2 (Pramod et al., 2010). Similarly, Allicin (allyl 2-propenethiosulfinate) is a bioactive compound present in garlicextract and is reported to display antiviral activity against Influenza viruses A and B, HIV, herpes simplex viruses, rotavirus, etc. (Bayan et al., 2014). Eugenolexhibited interactions with the loop residues of spike glycoprotein but was deemed to be a BBB permeant, along with high risks of mutagenicity and tumorigenicity, questioning its administration for treatment of COVID-19 (Fig. S4; Table S6).Stavudine, a thymidine nucleoside analogue, is a reverse transcriptase (RT) inhibitor and has been selectively utilized against HIV strains. Stavudine is intracellularly phosphorylated by cellular kinases to Stavudine triphosphate which competes with deoxythymidine triphosphate, the natural substrate of RT. It also inhibits DNA polymerases Beta and Gamma (Hurwitz and Schinazi, 2012). Favipiravir, a purinenucleoside analogue (6-fluoro-3-hydroxy-2-pyrazinecarboxamide), is a potent RNA-dependent RNA polymerase inhibitor where its phosphorylated form (T-705-RTP) exhibits broad-spectrum antiviral potency against retroviruses viz. arenavirus, bunyavirus, flavivirus, etc. (Du and Chen, 2020). Both depicted significant interactions with theACE-2 binding regions of spike glycoprotein with no inherent risk of predicted systemic toxicity (Tables S6- S7).Zalcitabine, another thymidine analogue, acts in thesame way as Stavudine and is more potent (Leandro et al., 2013). It depicted two interactions with ARG403 (loop 1e) of thespike protein with a high risk of hazardous reproductive consequences. Tenofovir, an acyclic adenosine nucleotide analogue, is used in combinatorial therapy with other anti-retroviral drugs to treat HIV and singularly to treat Hepatitis B infections, acts as a reverse transcriptase inhibitor and depicted no predicted systemic toxicity upon administration (Delaney et al., 2006) (Fig. S7; Table S7).Raltegravir is an anti-retroviral drug belonging to the class of integrase inhibitors, ultimately affecting viral DNA insertion and its subsequent integration into the host cell's DNA. It is primarily used to treat HIV-1 infections (Boesecke and Gelgor, 2009). It exhibited seven significant interactions of which only one targeted the loop residue of spike protein (LEU441) (Fig. S7). Similarly, Camostat (a carbonyl compound) is a synthetic serine protease inhibitor targeting the host cell TMPRSS2 (viral entry mediator) thereby inhibiting viral infection and replication (Uno, 2020). It also exhibits anti-inflammatory and anti-fibriotic capacity. LikeRaltegravir, it interacted with LEU441 but overall had only two significant bonds. Both drugs lacked predicted toxicity upon administration (Fig. S8).Doxycycline (a semi-synthetic derivative of oxytetracycline) is a protein synthesis inhibitor and exhibits broad spectrum anti-bacterial potency (Holmes and Charles, 2009). It also exhibits antiviral competence by inhibition of matrix metalloproteases in case of Dengue and other retroviruses or transcriptional up-regulation of intracellular zinc finger antiviral protein (ZAP) (Malek et al., 2020). Although it illustrated three significant spikeRBD loop interactions, it was predicted to have a low bioavailability and posed a high risk of reproductivetoxicity (Fig. 10b; Table S9). Ivermectin (an avermectin derivative), a broad-spectrum anti-parasitic agent is a potent endectocide and has been reported to inhibit theHIV-1 integrase and host cell Importin (responsible for nuclear import of viral proteins) subsequently hindering viral replication and also exhibits antiviral potency against other retroviruses (Caly et al., 2020). Ivermectin did not interact with any ACE-2 binding regions of Spike protein. However, it did depict a significant interaction with CYS379 which is involved in disulfide bond formation (CYS379-CYS432) and depicted no systemic toxicity whatsoever. Docking of the selected drug candidates with thespike glycoprotein trimeric structures elucidated distinctive subsidiary spike protein target regions in addition to theRBD, contriving the segmented drug targets within thespike protein for future drug development experiments (Fig. S9).Several studies corroborate the usage of the drugs selected in the present study to counteract COVID-19. Remdesivir has reached the phase III level of clinical trials for the treatment of mild and moderateSARS-CoV-2 infections (Senanayake, 2020). Favipiravir, an antiviral/anti-influenza drug has been shortlisted in ten hospitals across India for a phase III trial with mild and moderately infectedCOVID-19patients while an another combination of Doxycycline and Ivermectin has been recommended to treat acute symptoms of COVID-19 in Bangladesh which has reached to theexperimental stage in India (Hafeez et al., 2020).Another example of combinatorial therapy combines Emtricitabine/Tenofovir-Alafenamide and Lopinavir/Ritonavir to treat COVID-19patients (Duan et al., 2020). Camostat and Nefamostat and their derivatives are certified protease inhibitors which are currently being investigated in Japan as therapeutic alternatives to combat SARS-Co-V 2 (Hoffman et al., 2020). No significant interaction was observed betweenRBD and hydroxychloroquine thus concluding that themode of action of hydroxychloroquine is different than that of direct interaction with spike protein (Table 6;Fig. 12b). Chloroquine inhibits the viral attachment to its receptor along with the subsequent molecular processes involved in the final viral particleextrusion (Narkhedeet al., 2020; Devaux et al., 2020). Nafamostat, primarily used for anticoagulant therapy, exhibited spike-mediated membrane fusion inhibitory potentiality (Li and Clercq, 2020). Theseexperimental studies advocate drug-repurposing as a viable and resourceful technique to eradicate the infectivity of SARS-CoV-2 because of its cost-effectiveness and reduction of administration complexity in treatment regimes with high compatibility efficacy.
Conclusion
The present work comprised of selection and molecular docking analysis of a collection of 30 plausible drug candidates having diverse in silico mechanisms with thespike glycoprotein of SARS-CoV-2 aimed at identifying novel inhibitory competency of thesame. Our results suggested that ten drugs out of the thirty selected could be utilized as promising drugs as they interacted with theexperimentally validated ACE-2 binding residues and also depicted additional interactions with theETD regions. Camostat, Favipiravir, Tenofovir, Raltegravir and Stavudine illustrated maximum interactions with thespike glycoprotein RBDmodel and the inherent trimeric structure displaying optimal bioavailability (score = 0.55) with absence of predicted systemic toxicity. As these drugs can be a good candidate for further in vitro or in vivo studies so dosage appropriate amalgamation of these drugs/drug derivatives in conjunction with refined experimental validation can serve as the platform for combinatorial drug therapy design and development to counter COVID-19 for futuristic applications.
Support
The authors have credited the sources of support received in the Acknowledgement page.
Permissions
The authors have duly credited all the sources which were utilized during themanuscript preparation.
Sources of funding
The present work has received no specific grant from any funding agency in the public, commercial or not-for-profit sectors in India or elsewhere.
Ethical approval
This article does not contain any studies with humanparticipants or animals performed by any of the authors.
CRediT authorship contribution statement
Himanshu G. Toor: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review & editing, Final manuscript approval. Devjani I. Banerjee: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing, Final manuscript approval. Soumya Lipsa Rath: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing - review & editing, Final manuscript approval. Siddhi A. Darji: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review & editing, Final manuscript approval.
Declaration of competing interest
The authors declare that they have no conflict(s) of interest.
Authors: Qazi Mohammad Sajid Jamal; Varish Ahmad; Ali H Alharbi; Mohammad Azam Ansari; Mohammad A Alzohairy; Ahmad Almatroudi; Saad Alghamdi; Mohammad N Alomary; Sami AlYahya; Nashwa Talaat Shesha; Suriya Rehman Journal: Saudi J Biol Sci Date: 2021-04-28 Impact factor: 4.219
Authors: Ahmed Abdelaal Ahmed Mahmoud M Alkhatip; Michail Georgakis; Lucio R Montero Valenzuela; Mohamed Hamza; Ehab Farag; Jaqui Hodgkinson; Hisham Hosny; Ahmed M Kamal; Mohamed Wagih; Amr Naguib; Hany Yassin; Haytham Algameel; Mohamed Elayashy; Mohamed Abdelhaq; Mohamed I Younis; Hassan Mohamed; Mohammed Abdulshafi; Mohamed A Elramely Journal: Int J Mol Sci Date: 2021-03-15 Impact factor: 5.923