Design of novel viral attachment inhibitors of the spike glycoprotein (S) of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) through virtual screening and dynamics.

Arafat Rahman Oany¹, Mamun Mia², Tahmina Pervin³, Md Junaid⁴, S M Zahid Hosen⁵, Mohammad Ali Moni⁶.

Abstract

To date, the global COVID-19 pandemic has been associated with 11.8 million cases and over 545 481 deaths. In this study, we employed virtual screening approaches and selected 415 lead-like compounds from 103 million chemical substances in the PubChem database, based on existing drugs, as potential candidates for inhibiting S protein-mediated viral attachment. Thereafter, based on drug-likeness and Lipinski's rules, 44 lead-like compounds were docked within the active site pocket of the viral-host attachment site of the S protein. The corresponding ligand properties and absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles were assessed. Furthermore, four novel inhibitors were designed and assessed computationally for efficacy. Comparative analysis showed that the screened compounds in this study gave better results than the proposed mother compounds, VE607 and SSAA09E2. The four designed novel lead compounds gave still better results without violating any of Lipinski's rules. They also showed higher bioavailability (0.56) and drug-likeness scores (up to 1.81) than VE607 and SSAA09E2. All the screened compounds and novel compounds showed promising ADMET properties. Among them, the compound AMTM-02 was the best candidate, with a docking score of -7.5 kcal/mol. Furthermore, the binding was verified by a 100 ns molecular dynamics simulation that assessed the stability of the complex. The proposed screened compounds and the novel compounds may offer a breakthrough in the development of a therapeutic drug to treat SARS-CoV-2 effectively in vitro and in vivo.

Keywords: ADMET; Drug designing; Molecular docking; Molecular dynamics; Virtual screening

Year: 2020 PMID: 32987103 PMCID: PMC7518233 DOI: 10.1016/j.ijantimicag.2020.106177

Source DB: PubMed Journal: Int J Antimicrob Agents ISSN: 0924-8579 Impact factor: 5.283

Introduction

Over the last two decades, major outbreaks of coronaviruses (CoVs) associated with mild to severe human respiratory tract infections have been a bitter global health experience. The first global outbreak was caused by severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 and covered five continents with over 8000 infected cases and a mortality rate of 10% [1], [2], [3], [4], [5], [6], [7]. In 2012, the second outbreak occurred in Saudi Arabia, caused by Middle East respiratory syndrome coronavirus (MERS-CoV), with a high mortality rate of 35% [8, 9]. The third human pathogenic coronavirus, 2019 novel coronavirus (2019-nCoV), was first identified at the end of 2019 in Wuhan, China, in patients with atypical pneumonia [10] and subsequently spread to the whole world within 3-4 months. The 2019-nCoV has exceeded SARS-CoV and MERS-CoV in terms of transmission rate [11], although its case fatality rate (2.3%) [12] is lower than the rates cited above for the two earlier viruses. The fatality rate is higher in elderly patients and patients with comorbidities [13]. According to the World Health Organization (WHO), the virus and the disease are named COVID-19 virus and Coronavirus Disease 2019 (COVID-19), respectively [14], while the International Committee on Taxonomy of Viruses (ICTV) renamed the virus SARS-CoV-2 [15]. According to the WHO 'situation report 171' (10 July 2020), a total of 11 874 226 confirmed cases of SARS-CoV-2 infection were reported, including 545 481 deaths, from 213 countries and/or territories worldwide [16]. As the global situation is worsening day by day and no appropriate treatment strategies are available, there is an urgent demand to develop an effective antiviral/therapeutic drug. SARS-CoV-2 belongs to the virus family Coronaviridae [17] and shares 79.5% genome sequence identity with SARS-CoV [18]. The four structural proteins, spike (S), envelope (E), membrane (M), and nucleocapsid (N), are encoded by the SARS-CoV-2 genome in a similar fashion to that of SARS-CoV [19].
The transmembrane trimeric S protein comprises two functional subunits, S1 and S2 [20]: S1 harbors the binding activity to the receptor of host cells and S2 is responsible for host-viral cell fusion [21]. Entry of SARS-CoV-2 into host cells is almost identical to the binding mode of SARS-CoV, and the S protein receptor-binding domain (RBD) [22] maintains a high affinity to the human receptor angiotensin-converting enzyme 2 (ACE2) [23, 24]. Cryo-EM structure analysis showed that residues 319 to 591 of S, which form the 2019-nCoV RBD-SD1 fragment, have a stable folding conformation when bound to the ACE2 receptor [25]. In contrast to SARS-CoV, SARS-CoV-2 has a novel feature at the S1/S2 boundary of the S protein: an unexpected furin cleavage site generated through the insertion of 12 nucleotides [23, 25]. In the S1 subunit of SARS-CoV-2, a total of six RBD amino acids (L455, F486, Q493, S494, N501, and Y505) have a critical role in binding to ACE2 receptors [24]. Recent structural analysis revealed that residues critical for ACE2 receptor binding are highly conserved in the 2019-nCoV S RBD compared to the SARS-CoV S RBD [22]. Therefore, blocking the 2019-nCoV S protein from binding to human ACE2 receptors is the most promising target for developing a novel therapeutic drug in the current pandemic. A few studies have evaluated novel therapeutic drugs for SARS-CoV that target S protein-mediated entry into the host cell. Kao et al. reported VE607 as a novel compound that blocks S protein-ACE2-mediated SARS-CoV entry [26]. Structure-based in silico analysis showed that N-(2-aminoethyl)-1-aziridine-ethanamine is an inhibitor of SARS-CoV S protein-mediated cell fusion [27], and SSAA09E2 exhibits a unique mechanism of action that inhibits the early interactions of SARS-S with the receptor for SARS-CoV [28]. In the present study, novel inhibitors of viral attachment have been designed through virtual screening in the quest to combat this deadly virus.

Materials and Methods

Sequence retrieval

The National Center for Biotechnology Information (NCBI) database was searched for the S glycoprotein of SARS-CoV-2 [29]. Initially, 14 sequences were selected for the study and stored in FASTA format for further analysis.

Phylogeny and multiple sequence analysis

The BioEdit v7.2.3 sequence alignment editor [30] was used to identify the conserved region among the sequences through multiple-sequence alignment (MSA) with ClustalW [31]. The Jalview v2 tool [32] was used to retrieve the alignment and the CLC Sequence Viewer v7.0.2 (http://www.clcbio.com) was used to analyse the divergence among the different strains of S protein.
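The idea of locating conserved regions in a multiple-sequence alignment can be sketched in a few lines. The following is a minimal illustration only, not the BioEdit/ClustalW procedure used here: it assumes the alignment is already available as equal-length strings with `-` for gaps, and the function name `column_conservation` and the toy sequences are hypothetical.

```python
from collections import Counter


def column_conservation(aligned):
    """For each alignment column, return the fraction of sequences that
    share the most common (non-gap) residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # gaps never count as conserved
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment (NOT taken from the 14 retrieved S sequences).
toy_alignment = ["NGVEGF", "NGVKGF", "NGV-GF"]
scores = column_conservation(toy_alignment)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
```

Columns with a score of 1.0 are identical across all sequences; lower scores flag variable or gapped positions.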

Domain and motif analysis

The Pfam program of the Sanger Institute [33] was used to gain insight into the domain structure of the S protein, and ScanProsite of the Swiss Institute of Bioinformatics [34] was used to identify motifs. The sequence with Accession No. QHR63290 was used for this analysis and for all subsequent steps.

Homology modelling and model validation

Homology models of the target were built with MODELLER v9 [35], and the top-scoring template was selected for model construction. The predicted models were then validated with the PROCHECK [36] tools of the SWISS-MODEL Workspace [37]. To further check the quality of the predicted model, the model was also superposed on the template structure.

Electrostatic potential analysis and energy minimization

The electrostatic potential of the predicted model was computed with the APBS server [38] after converting the .pdb file to .pqr format, and the visualization was done with the PyMOL molecular graphics system [39]. Energy minimization was also performed to further support the validation of the predicted models. The GROMOS 96 [40] force field was used to minimize the structure.

Active site analysis and virtual screening

The active sites of the final proposed model were predicted with the CASTp server, which identifies pockets through the pocket algorithm of alpha shape theory [41]. Screening was initially based on the existing literature to gain insight into potential inhibitors; the PubChem database [42] was then screened for similar hits. All the ligands selected for the docking studies were converted into the .pdbqt format using the Open Babel toolbox [43]. AutoDock Vina [44] was used for the screening. The docking grid was centered at X: 228.75, Y: 190.82, and Z: 304.15, and its dimensions (Angstrom) were X: 26.31, Y: 26.32, and Z: 20.18.
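For readers who wish to reproduce the docking box, the grid parameters above map directly onto the keys of an AutoDock Vina configuration file. The sketch below only assembles such a file as text; the file names are hypothetical placeholders, and the exhaustiveness value is an assumption (Vina's default of 8), since it is not reported here.

```python
def vina_config(receptor, ligand, out, center, size, exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        # Assumption: the search exhaustiveness is not stated in the text.
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


config_text = vina_config(
    receptor="spike_model.pdbqt",   # hypothetical file name
    ligand="ligand.pdbqt",          # hypothetical file name
    out="ligand_docked.pdbqt",      # hypothetical file name
    center=(228.75, 190.82, 304.15),  # grid center reported above
    size=(26.31, 26.32, 20.18),       # grid dimensions in Angstrom reported above
)
```

Written to a file such as `conf.txt`, this text could be passed to Vina with `vina --config conf.txt`.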

Ligand properties and ADMET analysis

Ligand structural and behavioral properties were analysed to cross-validate the ligands as suitable therapeutic candidates. A wide range of properties was analysed, including Lipinski's rules [45], drug-likeness, molecular weight, and bioavailability. These parameters were assessed through various online servers and tools, including Molinspiration (http://www.molinspiration.com), the Drug-likeness tool [46], and OSIRIS Property Explorer [47]. ADMET properties were assessed for the selected compounds. ADME SARfari [48], admetSAR [49], Swiss database [50] and QikProp [51] were used for these assessments.
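As an illustration of the Lipinski filter, the classic rule of five can be expressed as four threshold checks (molecular weight ≤ 500, logP ≤ 5, hydrogen-bond donors ≤ 5, hydrogen-bond acceptors ≤ 10). This is a minimal sketch under stated assumptions: it uses only these four classic criteria and treats the reported xlogp as the logP value, whereas the servers used in this study may apply additional criteria, so their violation counts can differ. The function name is hypothetical; the AMTM 02 values are those reported in Table 4.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of the classic four Lipinski criteria."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# AMTM 02, using the property values listed in Table 4.
amtm_02 = lipinski_violations(422.52, 3.44, 2, 4)

# Hypothetical compound that is too heavy and too lipophilic.
heavy = lipinski_violations(620.0, 5.8, 2, 4)
```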

Novel lead design

Novel leads were designed based on the screening results with the aid of PubChem Sketcher V2.4 [52]. The designed structures were then exported in canonical SMILES format, and three-dimensional structures were generated with the Open Babel toolbox [43]. Ligand properties, molecular docking, and ADMET analysis of the novel compounds were also assessed as described earlier.

Molecular dynamics simulations

To identify changes in structural conformation, interaction pattern, and binding stability of the ligand-protein complex, YASARA Dynamics software [53] was employed for all molecular dynamics (MD) simulations, using AMBER14 as the force field [54]. The best protein-ligand complex provided by the molecular docking computations was selected for further analysis. A simulation cell was generated in a clean and optimized system, with a box size of 90 × 90 × 90 Å³ and the protein-ligand complex placed inside it. The transferable intermolecular potential 3 points (TIP3P) water model was applied, and Na/Cl ions were added in place of water molecules to give a total NaCl concentration of 0.9%; the total number of solvent molecules was 132 889, the density was 0.997 g/mL, the temperature was 298 K, and the pH was maintained at 7.4 [55, 56]. For better clarity, rendering of water molecules was omitted. Energy minimization of each system was then carried out by a simulated annealing method using the steepest gradient approach (5000 cycles). Long-range electrostatic interactions were treated with the Particle Mesh Ewald (PME) summation method with a cut-off value of 8 Å. Subsequently, using a time step of 2.5 fs, the production MD simulation was run at the YASARA force field level for 100 ns at constant pressure and temperature, employing a Berendsen thermostat [57, 58]. Snapshots of the MD trajectory were saved every 25 ps. This trajectory was then analysed to estimate structural changes and stability using root mean square deviation (RMSD) and root mean square fluctuation (RMSF), calculated with YASARA Structure built-in macros [59].
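The two stability measures can be made concrete with a small sketch. RMSD is the square root of the mean squared displacement of atoms between two structures, and RMSF is, for each atom, the square root of its mean squared deviation from its own average position over the trajectory. The code below is an illustrative pure-Python version, not the YASARA macros used here; it assumes the frames are already superposed (no structural alignment is performed), and the two-atom trajectory is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.
    Assumes both structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def rmsf(frames):
    """Per-atom RMSF over a list of frames (each frame: list of (x, y, z))."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared fluctuation around that average position.
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Hypothetical two-atom, two-frame trajectory (coordinates in Angstrom).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
rmsd_value = rmsd(frames[0], frames[1])
rmsf_values = rmsf(frames)
```

With snapshots saved every 25 ps over 100 ns, the same functions would be applied to roughly 4000 frames per trajectory.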

Results

The phylogenetic tree of all the strains of the S protein was generated with CLC Sequence Viewer v7.0.2 and is depicted in Figure 1 with appropriate branch distances. The tree was generated with 1000 bootstrap replicates. The multiple sequence alignment generated with the Jalview v2 tool is shown in Figure 2. This figure shows only the region of interest, the RBD, in the MSA with conservation information throughout the sequences.

Figure 1

Phylogenetic tree analysis of the retrieved sequences of spike glycoprotein (S).

Figure 2

Multiple sequence alignment of the selected proteins. Receptor-binding domain (RBD) is shown with appropriate conservation at the bottom (yellow).

The major domain and motif organization of the S protein, based on the sequence information from the Pfam and ScanProsite, was constructed and is depicted in Figure 3. Accession No. QHR63290, with a sequence length of 1273 amino acid residues, was used to generate the construct.

Figure 3

Schematic representation of the major domain and motif structure of the spike glycoprotein (S). The S1 subunit contains NTD (14–305 aa), RBD (319–541 aa), and RBM (437–508 aa). The S2 subunit contains HR1 (912–984 aa), and HR2 (1163–1213 aa).
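The domain boundaries in Figure 3 can be used to check where a given residue sits. The sketch below encodes the ranges from the caption and looks up the six ACE2-contacting residues named in the Introduction; the dictionary and function names are hypothetical, and the boundaries are taken as inclusive.

```python
# Domain boundaries (inclusive residue numbers) as listed in the Figure 3 caption.
DOMAINS = {
    "NTD": (14, 305),
    "RBD": (319, 541),
    "RBM": (437, 508),
    "HR1": (912, 984),
    "HR2": (1163, 1213),
}


def domains_for_residue(position):
    """Return the names of all annotated domains that contain this residue."""
    return [name for name, (start, end) in DOMAINS.items() if start <= position <= end]


# The six RBD residues reported as critical for ACE2 binding.
KEY_RESIDUES = {"L455": 455, "F486": 486, "Q493": 493, "S494": 494, "N501": 501, "Y505": 505}
residue_domains = {label: domains_for_residue(pos) for label, pos in KEY_RESIDUES.items()}
```

Because the RBM lies inside the RBD, a residue in the RBM is reported as belonging to both.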

Homology modelling and validation

A homology model of the targeted protein was constructed through MODELLER and is illustrated in Figure 4a. The superposition between the template and model is shown in Figure 4b. The Ramachandran plot for the model validation is depicted in Figure 4c. The three-dimensional models of the RBD and receptor-binding motif (RBM), along with specific amino acid residues, are shown in Figures 4d and 4e, respectively.

Figure 4

Homology model, structural evaluation, and domain organization. The three-dimensional model of the targeted protein (surface view) is shown in 4a. In 4b, the superposition between the predicted model (cyan color) and template (blue color) 6VSB_A is presented, with an RMSD of 0.202. The Ramachandran plot validation of the predicted model, where 95.06% of amino acid residues are in the most favored region, is depicted in 4c. The receptor-binding domain (RBD) (red sphere) (4d) and the receptor-binding motif (RBM) (cyan sphere) (4e) form the main site for binding with the host surface protein. The amino acid residues of the RBM are shown in 4e (zoomed view).

Electrostatic potentiality and energy minimization

The electrostatic potentiality of the modeled protein was analysed to identify the energy distribution of the protein and is shown in Figure 5a. The GROMOS 96 force field was applied to the modeled protein for better model construction. The force-field energies of the modeled protein before and after minimization were -44342.816 and -57854.49 kJ/mol, respectively. The energy distribution of the protein after minimization is shown in Figure 5b.

Figure 5

Electrostatic potentiality of the targeted protein. Energy distributions are shown before minimization in 5a and after minimization in 5b. In the energy distribution, blue color (positive charges) indicates higher energy and red color (negative charges) indicates lower energy potentiality. White color indicates energy neutral status.

Active site exploration and virtual screening

The CASTp server predicted several active sites of the target protein with different volume scores, and the two with the largest volumes were selected as the final active sites (Figure 6a and 6b). The molecular surface areas of these active sites were 31662.48 and 311.61, respectively. The PubChem database, with 103 million depositor-provided chemical substances, was initially explored for viral attachment inhibitors of the S protein. Based on the potential inhibitors VE607 and SSAA09E2, a total of 415 lead-like compounds were initially screened (Table S1). Thereafter, based on drug-likeness and Lipinski's rules, 46 lead-like compounds were selected for docking, including VE607 and SSAA09E2 (Figure S1). For the binding energy, docking poses with an RMSD value of 0.0 were selected, and the interacting amino acid residues are illustrated in Figure 7. The docking analysis results are shown in Table 1.

Figure 6

Figure 7

Molecular docking interaction analysis. All the ligands are docked into the active site pocket of the protein and different ligands are shown by different colors (7a). Interacting amino acid residues of the protein with different ligands (including VE607 and SSAA09E2) (7b). The interaction descriptions are shown graphically in the bottom of 7b.

Table 1

Docking analysis of the 44 screened compounds

Compound No.	PubChem CID	Binding energy (Docking) kcal/mol
1.	10646010	-6.0
2.	10692930	-6.5
3.	10740323	-6.6
4.	10764806	-6.4
5.	10790629	-6.8
6.	11306025	-6.4
7.	11477299	-6.6
8.	11751804	-6.5
9.	13490885	-6.2
10.	13490887	-6.2
11.	13490888	-6.7
12.	13747420	-6.1
13.	13747422	-6.6
14.	13747424	-6.1
15.	21261758	-6.5
16.	21261794	-6.7
17.	21261795	-6.1
18.	21261835	-6.8
19.	21261838	-6.4
20.	21261911	-6.7
21.	22132924	-6.7
22.	22132935	-6.5
23.	29977666	-6.4
24.	44394970	-6.5
25.	53755437	-6.1
26.	54477020	-6.6
27.	57826284	-6.4
28.	57826285	-6.9
29.	57826286	-6.8
30.	57835221	-6.3
31.	57835225	-6.7
32.	59981692	-6.5
33.	67047984	-6.2
34.	68116761	-6.5
35.	70242364	-6.1
36.	70402151	-6.8
37.	70438214	-6.7
38.	71379517	-6.3
39.	77856050	-6.6
40.	97289632	-6.5
41.	100951909	-6.5
42.	100951910	-6.0
43.	131708552	-6.4
44.	142969895	-6.8
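One simple way to compare the screened compounds with the mother compounds is to count how many entries in Table 1 have a more negative docking score than VE607 (-6.1 kcal/mol) and SSAA09E2 (-6.7 kcal/mol), the reference values reported in Table 4. The sketch below transcribes the Table 1 scores; the variable and function names are hypothetical, and a more negative score is taken to mean a more favorable predicted binding.

```python
# (PubChem CID, docking score in kcal/mol), transcribed from Table 1.
TABLE1 = [
    (10646010, -6.0), (10692930, -6.5), (10740323, -6.6), (10764806, -6.4),
    (10790629, -6.8), (11306025, -6.4), (11477299, -6.6), (11751804, -6.5),
    (13490885, -6.2), (13490887, -6.2), (13490888, -6.7), (13747420, -6.1),
    (13747422, -6.6), (13747424, -6.1), (21261758, -6.5), (21261794, -6.7),
    (21261795, -6.1), (21261835, -6.8), (21261838, -6.4), (21261911, -6.7),
    (22132924, -6.7), (22132935, -6.5), (29977666, -6.4), (44394970, -6.5),
    (53755437, -6.1), (54477020, -6.6), (57826284, -6.4), (57826285, -6.9),
    (57826286, -6.8), (57835221, -6.3), (57835225, -6.7), (59981692, -6.5),
    (67047984, -6.2), (68116761, -6.5), (70242364, -6.1), (70402151, -6.8),
    (70438214, -6.7), (71379517, -6.3), (77856050, -6.6), (97289632, -6.5),
    (100951909, -6.5), (100951910, -6.0), (131708552, -6.4), (142969895, -6.8),
]

# Reference docking scores of the mother compounds, as reported in Table 4.
VE607_SCORE = -6.1
SSAA09E2_SCORE = -6.7


def count_better_than(reference):
    """Number of screened compounds with a strictly more negative score."""
    return sum(1 for _, score in TABLE1 if score < reference)


better_than_ve607 = count_better_than(VE607_SCORE)
better_than_ssaa09e2 = count_better_than(SSAA09E2_SCORE)
best_compound = min(TABLE1, key=lambda entry: entry[1])  # most negative score
```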

Active sites of the predicted protein. The two largest cavities of the active sites are depicted here: 6a (blue sphere) shows the largest cavity of the active site, and 6b (red sphere) shows the second-largest cavity.

Ligand properties and ADMET data analysis

Different properties of the screened compounds, including drug-likeness, Lipinski's rules, aqueous solubility, and bioavailability, were assessed for the identification of the best therapeutic candidates (Table 2). All the ADMET properties were also assessed for the selected ligands and cross-checked with different servers and tools for the validity of the results. These properties are described in Table 3.

Table 2

Ligand properties of the 44 screened compounds

PubChem CID	Molecular Formula	Molecular Weight	Polar Area	xlogp	Hydrogen Bond Donor	Hydrogen Bond Acceptor	Rotatable Bonds	Complexity	Heavy Atom Count	Aqueous solubility LogS	Drug-likeness model score	Bio-availability score	Binding energy (docking) kcal/mol
10646010	C25H32N2O4	424.5	78.9	4.71	2	5	9	578	31	-4.47	0.85	0.56	-6.0
10692930	C24H30N2O4	410.5	78.9	4.05	2	5	8	563	30	-4.06	0.81	0.56	-6.5
10740323	C24H30N2O4	410.5	89.9	4.87	3	5	8	563	30	-4.30	1.17	0.56	-6.6
10764806	C25H32N2O4	424.5	78.9	4.52	2	5	9	578	31	-4.31	0.81	0.56	-6.4
10790629	C29H32N2O4	472.6	78.9	5	2	5	9	674	35	-4.53	0.99	0.56	-6.8
11306025	C28H30N2O5	474.5	88.1	4.14	2	6	8	691	35	-4.29	1.11	0.56	-6.4
11477299	C30H34N2O4	486.6	78.9	3.8	2	5	9	703	36	-5.2	1.35	0.56	-6.6
11751804	C29H32N2O4	472.6	78.9	3.61	2	5	8	688	35	-4.78	1.29	0.56	-6.5
13490885	C24H30N2O4	410.5	67.9	4.3	1	5	8	563	30	-4.3	0.51	0.55	-6.2
13490887	C23H28N2O4	396.5	78.9	3.8	2	5	7	549	29	-3.95	0.72	0.56	-6.2
13490888	C28H30N2O4	458.5	78.9	4.81	2	5	8	659	34	-4.47	0.93	0.56	-6.7
13747420	C24H28N2O6	440.5	116	2.60	3	7	9	650	32	-3.02	0.67	0.56	-6.1
13747422	C24H30N2O5	426.5	99.1	3.21	3	6	9	580	31	-3.45	0.75	0.56	-6.6
13747424	C24H28N2O6	440.5	116	3.34	3	7	9	650	32	-3.43	0.9	0.56	-6.1
21261758	C23H27ClN2O4	430.9	78.9	4.39	2	5	8	569	30	-4.59	0.67	0.56	-6.5
21261794	C29H31N2O4-	471.6	81.7	4.93	1	5	8	668	35	-4.72	0.78	0.56	-6.7
21261795	C26H30N2O4	434.5	78.9	4.26	2	5	9	666	32	-4.23	0.74	0.56	-6.1
21261835	C28H30N2O4	458.5	78.9	4.7	2	5	9	659	34	-4.42	0.88	0.56	-6.8
21261838	C23H28N2O4	396.5	78.9	3.73	2	5	8	535	29	-3.73	0.67	0.56	-6.4
21261911	C29H30N2O5	486.6	95.9	4.46	2	6	9	746	36	-4.31	1.39	0.56	-6.7
22132924	C25H32N2O4	424.5	89.9	3.82	3	5	8	590	31	-4.33	1.1	0.56	-6.7
22132935	C27H33N2O4-	449.6	81.7	2.68	1	5	8	666	33	-2.84	0.29	0.56	-6.5
29977666	C22H28N2O4	384.5	102	3.36	3	5	9	511	28	-3.5	0.71	0.56	-6.4
44394970	C26H34N2O4	438.6	78.9	4.95	2	5	9	605	32	-4.39	0.76	0.56	-6.5
53755437	C24H28N2O5	424.5	95.9	3.66	2	6	8	626	31	-3.81	1.4	0.56	-6.1
54477020	C29H34N2O3	458.6	61.8	4.74	2	4	9	602	34	-4.55	0.47	0.55	-6.6
57826284	C22H28N2O4	384.5	102	3.36	3	5	9	511	28	-3.5	0.71	0.56	-6.4
57826285	C27H34N2O4	450.6	70.1	3.85	1	5	7	683	33	-4.67	1.42	0.56	-6.9
57826286	C27H34N2O4	450.6	70.1	3.85	1	5	7	683	33	-4.67	1.42	0.56	-6.8
57835221	C28H30N2O5	474.5	88.1	4.14	2	6	8	691	35	-4.29	1.11	0.56	-6.3
57835225	C28H30N2O5	474.5	88.1	4.14	2	6	8	691	35	-4.29	1.11	0.56	-6.7
59981692	C24H30N2O4	410.5	78.9	4.05	2	5	8	563	30	-4.06	0.81	0.56	-6.5
67047984	C25H30N2O4	422.5	78.9	4.34	2	5	9	604	31	-4.25	0.68	0.56	-6.2
68116761	C26H32N2O4	436.5	78.9	4.74	2	5	9	631	32	-4.41	0.81	0.56	-6.5
70242364	C20H22N2O5	370.4	88.1	1.52	2	6	5	529	27	-2.08	0.28	0.55	-6.1
70402151	C26H34N2O4	438.6	78.9	4.77	2	5	9	605	32	-4.41	0.82	0.56	-6.8
70438214	C26H32N2O4	436.5	78.9	4.96	2	5	9	631	32	-4.43	0.79	0.56	-6.7
71379517	C17H17NO4	299.32	75.6	2.56	2	4	6	379	22	-2.98	0.6	0.56	-6.3
77856050	C26H32N2O4	436.5	78.9	4.96	2	5	9	631	32	-4.43	0.79	0.56	-6.6
97289632	C22H28N2O4	384.5	102	3.36	3	5	9	511	28	-3.5	0.71	0.56	-6.5
100951909	C25H32N2O4	424.5	78.9	4.71	2	5	9	578	31	-4.47	0.85	0.56	-6.5
100951910	C25H32N2O4	424.5	78.9	4.71	2	5	9	578	31	-4.47	0.85	0.56	-6.0
131708552	C22H28N2O4	384.5	102	3.36	3	5	9	511	28	-3.5	0.71	0.56	-6.4
142969895	C27H38N2O4	454.6	92.9	3.44	2	5	9	472	33	-5.17	1.1	0.56	-6.8

Table 3

ADMET profile of the 44 screened compounds

PubChem CID	Absorption							Distribution		Metabolism					Excretion		Toxicity
PubChem CID	Blood Brain Barrier	Water solubility	Caco-2 permeability	Intestinal absorption (human)	Skin Permeability cm/s	P-glycoprotein substrate	P-glycoprotein I inhibitor	BBB permeability	CNS permeability	CYP450 1A2 Inhibitor	CYP2C19 Inhibitor	CYP2C9 inhibitor	CYP2C9 Substrate	CYP3A4 Substrate	Renal OCT2 Inhibitor	AMES toxicity	Carcinogens	Acute Oral Toxicity
10646010	No	-6.69	0.5553	High	-5.78	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6984
10692930	No	-5.21	0.5963	High	-6.06	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7143
10740323	No	-6.00	0.6306	High	-5.53	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6251
10764806	No	-5.76	0.5947	High	-5.77	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7179
10790629	No	-6.56	0.6038	High	-5.52	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7199
11306025	No	-5.49	0.5370	High	-6.40	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6980
11477299	No	-8.74	0.5947	High	-5.35	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7179
11751804	No	-8.35	0.5694	High	-5.52	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7057
13490885	Yes	-6.60	0.5753	High	-6.09	Yes	No	Yes	Yes	No	No	No	No	Yes	No	No	No	0.7305
13490887	Yes	-5.91	0.5665	High	-6.23	Yes	No	Yes	Yes	No	No	No	No	Yes	No	No	No	0.6928
13490888	No	-7.98	0.5813	High	-5.70	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7076
13747420	No	-5.26	0.6441	High	-6.67	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7315
13747422	No	-5.73	0.6479	High	-6.91	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7142
13747424	No	-5.26	0.6717	High	-6.28	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6978
21261758	No	-6.87	0.5555	High	-6.02	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6826
21261794	No	-8.37	0.5464	High	-5.51	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7254
21261795	No	-6.38	0.6026	High	-6.06	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7045
21261835	No	-8.10	0.6000	High	-5.69	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7390
21261838	Yes	-6.28	0.6179	High	-6.26	Yes	No	Yes	Yes	No	No	No	No	Yes	No	No	No	0.7037
21261911	No	-7.90	0.6125	High	-6.02	Yes	No	Yes	Yes	No	No	No	No	Yes	No	No	No	0.7037
22132924	No	-6.02	0.6486	High	-5.31	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6451
22132935	No	-6.74	0.5191	High	-5.43	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7085
29977666	No	-6.08	0.6461	High	-5.74	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6413
44394970	No	-6.71	0.5687	High	-5.55	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7273
53755437	No	-5.83	0.6255	High	-6.45	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6614
54477020	Yes	-8.84	0.5094	High	-5.72	Yes	No	Yes	Yes	No	No	No	No	Yes	No	No	No	0.6749
57826284	No	-6.08	0.6461	High	-5.74	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6413
57826285	Yes	-6.15	0.5666	High	-5.29	Yes	Yes	Yes	Yes	No	No	No	No	Yes	No	No	No	0.7656
57826286	Yes	-6.15	0.5666	High	-5.29	Yes	Yes	Yes	Yes	No	No	No	No	Yes	No	No	No	0.7656
57835221	No	-7.81	0.5370	High	-6.40	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6980
57835225	No	-7.81	0.5370	High	-6.40	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6980
59981692	No	-6.30	0.5963	High	-6.06	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7143
67047984	No	-6.35	0.5643	High	-5.82	Yes	Yes	No	No	No	No	No	No	Yes	No	No	No	0.7072
68116761	No	-6.49	0.6040	High	-5.70	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6547
70242364	No	-4.71	0.5736	High	-9.13	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6228
70402151	No	-6.71	0.5943	High	-5.55	Yes	Yes	No	No	No	No	No	No	Yes	No	No	No	0.7231
70438214	No	-6.36	0.5613	High	-5.74	Yes	Yes	No	No	No	No	No	No	Yes	No	No	No	0.7128
71379517	Yes	-5.21	0.5874	High	-6.31	No	No	Yes	Yes	No	No	Yes	No	Yes	No	No	No	0.7297
77856050	No	-6.36	0.5613	High	-5.74	Yes	Yes	No	No	No	No	No	No	Yes	No	No	No	0.7128
97289632	No	-6.08	0.6461	High	-5.74	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6413
100951909	No	-6.69	0.5553	High	-5.78	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6984
100951910	No	-6.69	0.5553	High	-5.78	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6984
131708552	No	-6.08	0.6461	High	-5.74	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6413
142969895	No	-4.76	0.5731	High	-5.08	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.7289

Based on the screening results, novel leads were designed from PubChem Sketcher V2.4 and are shown in Figure 8a. The ligand properties of the novel drug are shown in Table 4. Interaction analysis from the molecular docking is illustrated in Figure 8b and within the active site pocket is in Figure 8c. The ADMET profile is described in Table 5.

Figure 8

Designed novel lead and molecular docking interaction analysis. Newly designed novel leads are shown in 8a. All the novel leads are docked into the active site pocket of the protein and different leads are shown by different colors (zoomed view) (8b). Interacting amino acid residues of the proteins with different ligands (8c). The interaction descriptions are shown graphically in the bottom of 8c.

Table 4

Ligand properties of novel compounds along with mother compounds VE607 and SSAA09E2

Lead ID	Molecular Formula	Molecular Weight	Polar Area	xlogp	Hydrogen Bond Donor	Hydrogen Bond Acceptor	Rotatable Bonds	Heavy Atom Count	Lipinski's rule violations	Aqueous solubility LogS	Drug-likeness model score	Bio-availability score	Binding energy (docking) kcal/mol
AMTM 01	C28H30N2O5	474.55	110.10	2.88	4	5	8	35	0	-5.80	1.78	0.56	-7.0
AMTM 02	C25H30N2O4	422.52	81.08	3.44	2	4	6	31	0	-5.60	1.79	0.56	-7.5
AMTM 03	C27H28N2O5	460.52	99.10	2.42	3	5	8	34	0	-5.18	1.51	0.56	-7.0
AMTM 04	C27H28N2O7	492.52	139.56	1.53	5	7	8	36	0	-4.18	1.81	0.56	-6.7
VE607	C22H36N2O4	392.53	65.40	1.33	2	6	10	28	1	-3.28	0.80	0.55	-6.1
SSAA09E2	C16H20N4O2	300.16	53.72	0.75	1	4	5	22	0	-4.43	0.97	0.55	-6.7
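To give the docking scores in Table 4 a rough physical scale, a binding energy can be converted to an equilibrium dissociation constant with the standard relation ΔG = RT ln(Kd). The sketch below is illustrative only and rests on a strong assumption: that a docking score can be read as a binding free energy, which docking programs only approximate. The temperature of 298 K matches the MD simulation setting, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature_k=298.0):
    """Kd in mol/L from delta G = RT ln(Kd).
    Assumption: the docking score is treated as a binding free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def fold_change(delta_g_a, delta_g_b, temperature_k=298.0):
    """Ratio Kd(b) / Kd(a): how many times tighter a is predicted to bind than b."""
    kd_a = dissociation_constant(delta_g_a, temperature_k)
    kd_b = dissociation_constant(delta_g_b, temperature_k)
    return kd_b / kd_a


kd_amtm02 = dissociation_constant(-7.5)       # AMTM 02 docking score
ratio_vs_ve607 = fold_change(-7.5, -6.1)      # AMTM 02 versus VE607
```

Under this reading, a difference of about 1.4 kcal/mol corresponds to roughly a tenfold difference in predicted affinity at 298 K.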

Table 5

ADMET profile of novel compounds along with mother compounds VE607 and SSAA09E2

Lead ID	Absorption							Distribution		Metabolism					Excretion		Toxicity
Lead ID	Blood Brain Barrier	Water solubility	Caco-2 permeability	Intestinal absorption (human)	Skin Permeability cm/s	P-glycoprotein substrate	P-glycoprotein I inhibitor	BBB permeability	CNS permeability	CYP450 1A2 Inhibitor	CYP2C19 Inhibitor	CYP2C9 inhibitor	CYP2C9 Substrate	CYP3A4 Substrate	Renal OCT2 Inhibitor	AMES toxicity	Carcinogens	Acute Oral Toxicity
AMTM 01	No	-7.07	0.6671	High	-5.62	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6329
AMTM 02	No	-5.07	0.6292	High	-5.21	Yes	No	No	No	No	Yes	No	No	Yes	No	No	No	0.6958
AMTM 03	No	-7.12	0.6353	High	-6.16	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.6134
AMTM 04	No	-5.71	0.6403	High	-6.97	Yes	No	No	No	No	No	No	No	Yes	No	No	No	0.5953
VE607	Yes	-3.82	0.6554	High	-7.00	Yes	Yes	Yes	Yes	No	No	No	No	No	No	No	No	0.6960
SSAA09E2	No	-4.32	0.5294	High	-7.13	Yes	Yes	No	No	No	No	No	No	No	No	No	No	0.7217

Molecular dynamics simulation analysis

To understand the stability, flexibility, structural behavior, and binding mechanism of the best protein-ligand complex along with the apo form of the protein, 100 ns molecular dynamics simulations were conducted. Here, we considered our top designed novel compound, AMTM-02, with the S protein. The atomic RMSDs of the protein-ligand complex and of the apo form of the protein were calculated and are plotted against time in Figure 9a, where the apo form and the protein-ligand complex are shown in red and blue, respectively. According to Figure 9a, both the apo form and the protein-ligand complex fluctuate regularly around a constant value. The apo form started with an RMSD value of 0.592 Å at 0 ns and increased gradually until 29.65 ns, where it reached its highest peak with an RMSD value of 8.141 Å. From 29.65 to 39.575 ns it showed an average RMSD value of 7.01 Å, after which the RMSD decreased gradually to 4.9 Å at 43.6 ns. From 43.6 ns to the end of the 100 ns simulation, the RMSD continued to fluctuate, with an average value of 5.5 Å.

Figure 9

Molecular dynamics simulations of the targeted protein and the protein–ligand (AMTM-02) complex. Panel 9a shows the time series of the RMSD of backbone atoms (C, Cα, and N) of the protein; red and blue lines denote Apo_QHR63290 and the QHR63290-ligand complex, respectively. Panel 9b shows the structural changes of the protein from root mean square fluctuation (RMSF) analysis; red and blue lines denote Apo_QHR63290 and the QHR63290-ligand complex, respectively. The looped structures of the fluctuating residues are represented in the extended view.

In contrast, the protein-ligand complex started with an RMSD value of 0.548 Å and reached its highest RMSD of 7.992 Å early in the simulation, at around 7 ns. After 15 ns it became stable and remained so until 25 ns, with an average RMSD value of 6 Å; the RMSD then decreased to 4 Å and became stable again from 43 ns to 61 ns, with an RMSD value of 6 Å. After 63 ns, it showed a significant fluctuation between 7.94 Å and 6 Å, which continued to the end of the simulation. To understand the flexible regions of the protein and the movement of each residue, the RMSF values of the amino acid residues of the protein-ligand complex were explored along with those of the apo protein, where a lower RMSF value indicates less variability. The apo form showed lower RMSF values, ranging from 1 Å to 6 Å (average 2 Å), whereas the protein-ligand complex showed higher RMSF values, ranging from 8 Å to 11 Å, as shown in Figure 9b. The protein-ligand complex showed slightly higher fluctuations at residues 450-500 compared with the apo form.
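Peak and window-average RMSD values of the kind quoted in this section can be extracted from a saved RMSD time series along the following lines. This is a minimal sketch with a hypothetical function name and a hypothetical six-point series; it is not the trajectory data of this study.

```python
def summarize_rmsd(times_ns, rmsd_angstrom, window):
    """Return (peak time, peak RMSD, mean RMSD inside the closed time window)."""
    if len(times_ns) != len(rmsd_angstrom) or not times_ns:
        raise ValueError("time and RMSD series must be non-empty and equal in length")
    # Index of the largest RMSD value in the series.
    peak_index = max(range(len(rmsd_angstrom)), key=rmsd_angstrom.__getitem__)
    start, end = window
    in_window = [r for t, r in zip(times_ns, rmsd_angstrom) if start <= t <= end]
    if not in_window:
        raise ValueError("no data points fall inside the requested window")
    window_mean = sum(in_window) / len(in_window)
    return times_ns[peak_index], rmsd_angstrom[peak_index], window_mean


# Hypothetical RMSD series (time in ns, RMSD in Angstrom), for illustration only.
times = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
values = [0.5, 3.0, 6.0, 8.0, 7.0, 5.0]
peak_time, peak_value, mean_late = summarize_rmsd(times, values, window=(30.0, 50.0))
```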

Discussion

With the global progression of SARS-CoV-2, the death rate is surprisingly high. Scientific communities are working hard to develop vaccines or drugs to combat this issue [19, [60], [61], [62], [63]]. Computational biology approaches are very promising for treating many severe diseases through projecting the in vitro validations [64], [65], [66], [67], [68]. The S glycoprotein is one of the most significant targets for designing therapeutics against CoV and is thought to be more prone to mutations [69]. In the current study, 14 S proteins were retrieved, and phylogenetic study showed they were closely related (Figure 1). Furthermore, the MSA study revealed that the RBD sequences from different strains were mostly conserved (Figure 2), thus supporting the choice of target for therapeutic development. The RBM within the RBD (Figure 3) is crucial for binding with host cell receptor and is typically a tyrosine-rich region [70]. With 95.06% amino-acid residues in the most favorable region, validation of the homology model through ‘Ramachandran Plot’ favors a comparatively good model. Additionally, the superposition RMSD value (0.202) between the predicted structure and template supports the validity of the structure. The three-dimensional representation of the RBD and RBM also helps give insight into the attachment site of the ACE2 (Figure 4). The electrostatic potential of the protein revealed its charges throughout the surfaces and the positive charges (blue) in the RBM region indicate the hydrophobicity of that region (Figure 5). Active site analysis of the targeted protein is an important part of drug design approaches. As we are looking for attachment inhibitors, we selected the second-largest volume of the active site having the surface area of 311.61 (Figure 7 b). Extensive literature review showed that VE607 and SSAA09E2 were the most reported attachment inhibitors of the SARS-CoV S protein. 
Therefore, 415 lead-like compounds were screened from the PubChem database. Comparative analysis showed that most (44) of the docked compounds were potent inhibitors of the viral attachment of SARS-CoV-2 (Table 2, Table 3 and Figure 8). Finally, in this study, novel inhibitors were designed based on previous findings, and four novel compounds (AMTM-01 to AMTM-04) were found to have promising lead-like properties compared with VE607 and SSAA09E2. Considering the ligand properties, VE607 violates one of Lipinski's rules of five as it has 10 rotatable bonds. The ADMET properties indicate that VE607 may permeate the blood-brain barrier (BBB), which might make it toxic to the central nervous system (CNS). In addition, both VE607 and SSAA09E2 have P-glycoprotein-inhibitor properties, which may lead to drug accumulation-related toxicity. The binding energies with the protein for VE607 and SSAA09E2 were -6.1 and -6.7 kcal/mol, respectively. For the novel compounds, the binding energy ranged from -6.7 to -7.5 kcal/mol (Table 4). The most striking feature of the novel compounds was their drug-likeness. AMTM-02 and AMTM-04 had the highest drug-likeness scores, 1.79 and 1.81, respectively, whereas the drug-likeness scores of VE607 and SSAA09E2 were 0.80 and 0.97, respectively. For oral bioavailability, all the novel compounds had a score of 0.56, and VE607 and SSAA09E2 had a score of 0.55. Finally, the MD simulation study of the targeted protein and the novel designed compound AMTM-02 further strengthened our prediction by validating the stability of the complex interaction, as shown by the RMSD values (Figure 9 a). The RMSF values of the protein alone and of the complex were also examined, and a few higher fluctuations were observed in the looped regions of the protein (Figure 9 b). The proposed novel compounds might be significant for drug discovery but still need subsequent assessment in vitro and in vivo.
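As a rough guide to what the reported differences in binding energy could mean, the sketch below converts a difference between two binding energies into a ratio of dissociation constants using the standard relation ΔG = RT ln K_d. This interpretation is an assumption added here for illustration: docking scores are scoring-function estimates rather than measured free energies, so the resulting ratio is only indicative. The function name and the temperature of 298.15 K are likewise assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def affinity_ratio(dg_a: float, dg_b: float, temp_k: float = 298.15) -> float:
    """Ratio K_d(b) / K_d(a) implied by two binding free energies (kcal/mol).

    A ratio above 1 means ligand a is predicted to bind more tightly than b.
    Assumes the scores can be treated as free energies, which is only
    approximately true for docking scores.
    """
    return math.exp((dg_b - dg_a) / (R_KCAL * temp_k))


# Binding energies reported in the text: AMTM-02 (-7.5) versus VE607 (-6.1).
ratio = affinity_ratio(-7.5, -6.1)
print(f"Implied K_d ratio (VE607 / AMTM-02): {ratio:.1f}")
```

Under these assumptions, the 1.4 kcal/mol gap between AMTM-02 and VE607 corresponds to roughly a tenfold difference in predicted affinity, which is why differences of about 1 kcal/mol in docking scores are usually read with caution.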

Conclusion

The biggest current global challenge is to escape from the SARS-CoV-2-related pandemic that has arisen because of the lack of effective therapeutic drugs or vaccines. The novel inhibitors designed in the present study against the S protein of SARS-CoV-2 not only showed better drug-likeness properties but also had the best binding affinities to the target protein, thus potentially blocking viral attachment to host cells. However, these analyses require several in vitro and in vivo validations before a drug can be formulated against SARS-CoV-2.

Declaration of Competing Interests

The authors declare no conflict of interest.

