Gourab Das1, Troyee Das2, Nilkanta Chowdhury3, Durbadal Chatterjee4, Angshuman Bagchi5, Zhumur Ghosh6. 1. Division of Bioinformatics, Bose Institute, P-1/12, CIT Scheme VIIM, Kankurgachi, Kolkata 700 054, India. Electronic address: gourab4gd@gmail.com. 2. Division of Bioinformatics, Bose Institute, P-1/12, CIT Scheme VIIM, Kankurgachi, Kolkata 700 054, India. Electronic address: troyee@jcbose.ac.in. 3. Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia 741235, West Bengal, India. Electronic address: nil5two@gmail.com. 4. Division of Bioinformatics, Bose Institute, P-1/12, CIT Scheme VIIM, Kankurgachi, Kolkata 700 054, India. Electronic address: cdurbadal@gmail.com. 5. Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia 741235, West Bengal, India. Electronic address: angshu@klyuniv.ac.in. 6. Division of Bioinformatics, Bose Institute, P-1/12, CIT Scheme VIIM, Kankurgachi, Kolkata 700 054, India. Electronic address: zhumur@jcbose.ac.in.
Abstract
COVID-19 pandemic caused by SARS-CoV-2 has already claimed millions of lives worldwide due to the absence of a suitable anti-viral therapy. The CoV envelope (E) protein, which has not received much attention so far, is a 75 amino acid long integral membrane protein involved in assembly and release of the virus inside the host. Here we have used artificial intelligence (AI) and pattern recognition techniques for initial screening of FDA approved pharmaceuticals and nutraceuticals to target this E protein. Subsequently, molecular docking simulations have been performed between the ligands and target protein to screen a set of 9 ligand molecules. Finally, we have provided detailed insight into their mechanisms of action related to the varied symptoms of infected patients.
Keywords:
AI based screening; Drug designing; Drug repurposing; E protein; Molecular docking simulations; Nutraceuticals; Pattern recognition; SARS-CoV-2 infection
SARS-CoV-2 is the virus responsible for Coronavirus Disease 2019 (COVID-19), which the World Health Organization (WHO) declared a pandemic in March 2020 [1]. To date there have been more than 17 million confirmed cases worldwide, claiming the lives of more than 650,000 people [2]. The causative agent of the pandemic is a positive-stranded enveloped virus with a genome size of ~30 kb, belonging to the genus β-CoV of the Coronaviridae family [3]. The virus shows the highest sequence similarity, as well as similar disease symptoms, to SARS-CoV [4], which emerged as a disease of zoonotic origin in 2003, causing Acute Respiratory Distress Syndrome (ARDS) and pneumonia among patients [5]. However, SARS-CoV-2 has a higher transmission rate than SARS-CoV and thus emerged as the cause of the worldwide pandemic [6].
SARS-CoV-2 is made up of 14 ORFs together encoding 27 proteins [7]. These include four major structural proteins, viz., the spike (S) protein, nucleocapsid (N) protein, membrane (M) protein, and the envelope (E) protein [8].
Apart from their structural role, the involvement of these structural proteins in various aspects of the viral life cycle and the role of the non-structural proteins in pathogenesis have garnered attention [9,10,11]. Drug repurposing against these SARS-CoV-2 targets is a promising approach to combat the viral infection [12,13,14,15]. The majority of drug design efforts have been directed towards the viral Spike glycoprotein, aiming to disrupt recognition of and binding to the host ACE2 receptor so as to prevent viral entry into the human host [16,17,18,19].
In contrast, the E protein is among the less investigated structural proteins of SARS-CoV-2; it is the smallest of them but also the most intriguing one, owing to its highest sequence conservation with that of SARS-CoV [8].
We have therefore attempted to target the viral E-protein as a therapeutic intervention against the disease.
The E protein of SARS-CoV-2 shares high amino acid conservation with that of SARS-CoV. The E protein of SARS-CoV is a short, integral membrane protein of 76–109 amino acid residues with its size varying between 8.4 and 12 kDa [20]. It has a short, hydrophilic amino terminus of 7–12 amino acids preceding a large hydrophobic transmembrane domain (TMD) of 25 amino acids [21]. The presence of two neutral, non-polar amino acids, valine and leucine, in the TMD makes it strongly hydrophobic [22]. The protein terminates with a long, hydrophilic carboxyl terminus [23]. The TMD is also found to oligomerize by self-interaction, forming a pentameric structure that delimits an ion-conducting hydrophilic pore, the viroporin [24,25]. The major portion of the protein is believed to be incorporated into the Golgi, ER and ERGIC membranes in its pentameric form, where positively charged, basic amino acids (viz. lysine or arginine) of the protein anchor the pore to the negatively charged membrane phospholipids through electrostatic interactions [26].
It is also suggested that the E protein aids in viral assembly, morphogenesis, budding and pathogenesis [27]. Deletion of the E protein results in significantly crippled viruses, although the replication efficiency of the virus is not hampered [28]. Moreover, its role in pathogenesis is implicated through various modes of interaction with host proteins [29]. Redistribution of syntenin from the nucleus to the cytoplasm by the E protein results in the activation of p38 MAPK, triggering an over-production of pro-inflammatory cytokines such as IL-1β, resulting in edema and other characteristic symptoms of ARDS [30].
Interaction between the E and Bcl-xL proteins is suspected to induce the lymphopenia characteristic of SARS [31].
Knocking down viroporin activity reduced IL-1β production and edema accumulation, and restored lung epithelial wall integrity and proper functioning of the Na+/K+ ATPase [32].
The point mutations N15A and V25F in the TMD of the E-protein could suppress viroporin activity. In particular, V25F disrupts the pentameric structure, which is crucial evidence for its role in oligomerisation [33]. Interestingly, in a mouse model experiment by Nieto-Torres et al., N15A and V25F mutant viruses incorporated compensatory mutations, either at these two positions or in close proximity within the TMD, that rendered the ion channel active again. Those mutations are N15A ➔ A15D and V25F ➔ F25C, L19A, F20L, F26L, L27S, T30I, L37R. Only those mice infected with viruses that did not incorporate these mutations were able to survive and recover [32].
This highlights the importance of the amino acids spanning positions 15 to 37 of the E-protein in regulating ion channel activity. Thus, we hypothesize that targeted inhibition of the TMD of the E-protein, particularly blocking the amino acids involved in viroporin oligomerization, could inhibit pentamer formation and the associated pathogenic reactions.
Several drugs and phytochemicals have been proposed to target the pentameric structure of the E-protein, though very few of them are FDA approved [34]. Here, we aim to curb the spread of the virus at the early stage of viral E-protein synthesis, in its monomeric form, before oligomerization and initiation of viroporin pathogenesis.
In this work, we have adopted artificial intelligence (AI) based deep learning and pattern recognition techniques to screen FDA approved drugs and food supplements that can target the E-protein. Our main focus is to target the protein region between the 15th and 37th residues, which is responsible for modulating channel activity.
Computational molecular docking based virtual screening was performed thereafter for final selection of the drugs. Further, we have provided detailed insight into their mechanisms of action that could serve as an easy remedial reference for developing therapeutic measures against SARS-CoV-2.
Materials and methods
Sequence analysis of E protein
The human SARS-CoV E protein sequence has been downloaded from UniProt (ID: P59637) [35]. It is 76 amino acid residues long, spanning 26,117 bp to 26,347 bp of the SARS-CoV genome. The human SARS-CoV-2 E protein sequence of 75 amino acids (26,245–26,472, 228 bases) has been downloaded from the NCBI protein database (ID: YP_009724392.1) [36]. Protein sequences of the three other structural proteins, i.e. the Matrix (M), Nucleocapsid (N) and Spike (S) proteins of SARS-CoV and SARS-CoV-2, have been downloaded from UniProt and the NCBI protein database respectively. Sequence alignment has been performed with ClustalW in Jalview [37]. In order to check for sequence conservation among SARS-CoV-2 E proteins, 12,087 SARS-CoV-2 genome sequences available up to 26.04.2020 have been downloaded from the GISAID portal (https://www.gisaid.org/) [38]. Custom scripts have been used to analyze sequence conservation after removing partially sequenced genomes.
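The custom scripts themselves are not given in the text. The following is a minimal sketch of how such a conservation check could look, assuming each genome has already been reduced to its 228-base E gene. The function names, the category labels and the synthetic reference sequence are illustrative assumptions, not the authors' code. Residues 15 to 37 correspond to 23 codons, i.e. the 69-base core window used below.

```python
from collections import Counter

CODON_LENGTH = 3
# Residues 15..37 (1-based, inclusive) map to this 0-based nucleotide slice.
CORE_START = (15 - 1) * CODON_LENGTH  # 42
CORE_END = 37 * CODON_LENGTH          # 111 (exclusive), i.e. 69 bases


def classify(reference: str, sample: str) -> str:
    """Label one E-gene sequence relative to the reference.

    Simplifying assumption: sequences of a different length or containing
    ambiguous 'N' calls are discarded rather than aligned.
    """
    reference, sample = reference.upper(), sample.upper()
    if len(sample) != len(reference) or "N" in sample:
        return "discarded"
    if sample == reference:
        return "identical"
    if sample[CORE_START:CORE_END] == reference[CORE_START:CORE_END]:
        return "core_conserved"
    return "core_mutated"


def summarize(reference: str, samples: list[str]) -> Counter:
    """Count how many samples fall into each category."""
    return Counter(classify(reference, s) for s in samples)


def substitute(seq: str, index: int, base: str) -> str:
    """Return seq with the base at a 0-based index replaced."""
    return seq[:index] + base + seq[index + 1:]


# Synthetic 228-base placeholder, NOT the real E gene.
reference = "ATG" + "GCT" * 75
samples = [
    reference,                       # unchanged
    substitute(reference, 5, "A"),   # mutation outside the core window
    substitute(reference, 50, "A"),  # mutation inside the core window
    substitute(reference, 10, "N"),  # ambiguous base call
]
summary = summarize(reference, samples)
print(dict(summary))
```

In practice the E gene would first have to be located in each genome (for example by alignment), which this sketch leaves out.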
Structure of E protein
The three-dimensional structure of the human SARS CoV Envelope (E) protein has been obtained from the Swiss-Model [39] repository (https://swissmodel.expasy.org/repository/uniprot/P0DTC4) on 14.04.2020. The structure of the E protein obtained from the Swiss-Model repository is a homopentamer. As it is a membrane protein, it has a patch of hydrophobic region. In this pentamer, each monomer consists of Coil (18.97%), Turn (20.68%) and Helix (60.35%) (Fig. 1). In this work, we have attempted to target the monomeric structure of the E-protein so as to prevent the subsequent formation of its active pentameric structure. This would in turn hamper viral activity within the human host. From the pentameric structure (obtained from the Swiss-Model repository), the structure of the monomeric unit of the E-protein was extracted using Discovery Studio 2.5. The monomeric structure of the E protein is shown in Fig. 1A. In this figure, the blue region, marked with an arrow, is our zone of interest. This area lies between Asn15 and Leu37. This zone is largely responsible for the protein-protein interactions (PPI) during pentamer formation. Fig. 1B and C show the top view and the side view of the pentamer respectively. Fig. 1D and E show the hydrophobic surface view of the pentamer from the top and the side respectively.
Fig. 1
Structures of human SARS-CoV E-protein.
A) Monomeric structure of E-protein, the arrow showing the zone of interest involved in protein-protein interactions.
B) Pentameric structure of E-protein, top view.
C) Pentameric structure of E-protein, side view.
D) Hydrophobic surface view of the pentamer from the top.
E) Hydrophobic surface view of the pentamer from the side.
Drug dataset and pre-processing
Multiple open source drug databases have been considered for this work. The primary drug database is ChEMBL v.26 [40,41] (https://www.ebi.ac.uk/chembl/). Besides this, we have utilized Enamine Bio reference Compounds (https://www.enaminestore.com/products/bioreference-compounds) and Chemoinformatic tools and database (https://chemoinfo.ipmc.cnrs.fr/TMP/tmp.32396/e-Drug3D_1930_v3.sdf) as secondary databases. The sdf files were converted into canonical SMILES format using RDKit (https://www.rdkit.org/). After removal of duplicate drugs, 12,715 drugs were retained for further processing.
In-silico screening of selected drugs
Artificial Intelligence (AI) based deep learning and pattern recognition techniques were used for initial screening and selection of suitable ligands to target the monomeric structure of the viral E-protein. These techniques require identification of specific patterns in the ligands derived from the sequence and structural features of the target E-protein.
(a) AI based in-silico screening method: For this, we first examined the different features of the viral E-protein and extracted conserved characteristic features from it. Such feature extraction was greatly facilitated by the absolute amino acid sequence conservation of the E-protein among viral strains from all over the globe.
We considered the following features of the E-protein:
The amino acid sequence length: The monomeric structure of the E-protein is 75 amino acid residues long. Hence, the targeting ligand should be commensurate with this size of the protein.
The domains of the E-protein: The protein has a transmembrane domain spanning amino acid residues 15 to 37. Therefore, the targeting ligands must have sufficient hydrophobic patches to cover this region.
Using these criteria, we downloaded the ligand datasets from ChEMBL v.26, Enamine Bio reference Compounds and Chemoinformatic tools & database in sdf format and converted them to canonical SMILES format with RDKit. The propensities of the ligands to interact with the target were analyzed with the AI based deep learning model IVS2vec [42].
The IVS2vec model takes two inputs in the form of two individual numeric vectors: one represents the protein molecule and the other the ligand molecule. Mol2vec [43] (which works on the principle of the Word2vec [44] model) converts the SMILES formatted molecules into 300-dimensional numeric vectors. The pre-trained IVS2vec model performs the prediction task on 600-dimensional vectors of the target protein and molecular ligands. The deep learning model consists of a Dense Fully Connected Neural Network (DFCNN) that has already been trained on the PDBbind [45] database. The prediction task was executed on two regions of the monomeric structure of the E-protein: one comprising all the amino acids of the monomer, and the other comprising the amino acids from position 15 to 37, which form the transmembrane domain, one of the main functional sites of the target protein. The drugs having a binding score > 85% for both regions of the protein were selected.
(b) Pattern recognition based in-silico screening technique: Besides AI, we also considered pattern recognition techniques for filtering important drugs. In order to select a suitable drug candidate to target the monomeric structure of the E-protein, we used the following two structural features:
Since this protein has a membrane spanning region, the candidate drug molecule must contain a long hydrophobic tail to target it. Based on this property we searched for candidate ligands having long aliphatic carbon chains of ≥16 carbon atoms, which have a greater probability of binding to the membrane spanning region [46].
The region of the protein spanning amino acid residues 15 to 37 is known to be involved in oligomerization of the protein into the viable and active pentameric structure necessary for exerting its activity in the host cell. This region contains 3 aromatic Phe residues (at positions 20, 23 and 26). Therefore, the candidate drug molecule would also need aromatic ring structures in order to fit into the three-dimensional space occupied by this region of the E-protein in such a way that the oligomerization process would no longer take place. Thus, we searched for drug molecules containing 2 aromatic rings.
The set of drugs obtained after applying screening techniques (a) and (b) was further screened based on (i) existing functions (i.e. how far the existing functionality correlates with the symptoms of SARS-CoV-2, especially in the context of the function of the E-protein); (ii) FDA approval status and (iii) severity of the reported side effects of these drugs.
The set of drugs obtained after all these screening phases was considered for subsequent molecular docking.
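The selection logic of this section can be sketched as follows. This is an illustrative outline only: it does not call IVS2vec or Mol2vec, the scores are invented placeholder values, and the `Ligand` class, field names and ligand names are assumptions made for the example. The SMILES length cap of 100 symbols is the one stated in the docking and results sections. The final step mirrors the union ("U") of AI based and pattern recognition hits in the workflow.

```python
from dataclasses import dataclass

SCORE_CUTOFF = 0.85      # "> 85%" binding score required for both regions
MAX_SMILES_LENGTH = 100  # ligand length cap in SMILES symbols


@dataclass(frozen=True)
class Ligand:
    name: str
    smiles: str
    score_full: float  # predicted score against the whole monomer (hypothetical)
    score_tmd: float   # predicted score against residues 15-37 (hypothetical)


def passes_ai_screen(ligand: Ligand) -> bool:
    """Keep a ligand only if it is short enough and scores above the
    cutoff for BOTH the full monomer and the transmembrane region."""
    return (
        len(ligand.smiles) <= MAX_SMILES_LENGTH
        and ligand.score_full > SCORE_CUTOFF
        and ligand.score_tmd > SCORE_CUTOFF
    )


def shortlist(ligands: list[Ligand], pattern_hits: set[str]) -> set[str]:
    """Union of AI-screened ligands and pattern-recognition hits."""
    ai_hits = {lig.name for lig in ligands if passes_ai_screen(lig)}
    return ai_hits | pattern_hits


# Placeholder ligands with made-up scores, for illustration only.
candidates = [
    Ligand("ligand_a", "CCO", 0.91, 0.88),       # passes both thresholds
    Ligand("ligand_b", "CCN", 0.95, 0.60),       # fails the TMD threshold
    Ligand("ligand_c", "C" * 120, 0.99, 0.99),   # exceeds the length cap
]
selected = shortlist(candidates, pattern_hits={"ligand_d"})
print(sorted(selected))
```

The functional, FDA approval and side-effect filters described above are manual curation steps and are not represented here.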
Molecular docking based virtual screening
The screened ligands obtained above were then subjected to computational molecular docking based virtual screening. From the SMILES information of the ligands, their three-dimensional structures were built and ligand structure optimization was performed using Discovery Studio 2.5. We used AutoDock Vina [47] for the docking. In order to get an unbiased docking result, blind docking was performed to check the affinity of these ligands towards the zone of protein-protein interactions (PPI).
One important aspect we took care of while assessing the binding affinity between the ligand and the target protein is the relationship between binding affinity and docking score. It has been observed that the binding affinity between target protein and ligand cannot be judged solely by docking score [48], as the two are poorly correlated [49,50]. Orientation of binding [48], site of binding [51] and type of binding energy [50] are also important aspects which need to be considered to judge binding affinity. In contrast, docking score is highly correlated with the molecular weight of both the ligand and the target protein [48]. Hence, the higher the molecular weight of the ligand and target protein, the higher the docking score, and vice versa.
In this work, we have targeted the monomeric structure of the SARS-CoV-2 E-protein, which comprises only 75 amino acids. For strong interaction with such a small protein, ligands with a length of ≤ 100 SMILES symbols were considered for docking [52,53], as large molecules may fail to reach the target biologically. As specified above, the interaction potential between protein and ligand can be inferred efficiently by considering structure based features instead of ligand based features [54]. Further, Guedes et al. [51] specified that binding affinity depends on the site of binding. In this case, the TMD of the E-protein is the most crucial site and is responsible for the virulence of SARS-CoV-2. We have implemented blind docking, and drugs inhibiting that specific TMD site would reduce the pathogenicity of the virus. The entire workflow is provided in Fig. 2.
Fig. 2
Workflow showing stepwise screening of potential drugs and nutraceuticals targeting the SARS-CoV-2 E-protein.
+: shows the adding up of molecules screened by multiple pattern recognition approaches.
U: shows the union of molecules screened both by AI based and Pattern recognition approaches.
Results
Sequence conservation of E-protein at amino acid and nucleotide level
The pairwise nucleotide sequence alignment between SARS-CoV-2 and SARS-CoV is shown in Fig. 3 along with the consensus sequence. Protein sequence conservation scores of all four structural proteins are provided in Table 1. We observed that, of all the structural proteins, the E-protein shares the highest sequence similarity between SARS-CoV-2 and SARS-CoV. The details of the pairwise alignment of all the structural proteins between the two strains are provided in Supplementary File SF1.
Fig. 3
Amino acid sequence alignment and conservation pattern of SARS-CoV and SARS-CoV-2.
The transmembrane domain (TMD) between position 15 and position 37 has been highlighted in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 1
Similarity between SARS-CoV and SARS-CoV-2 structural proteins at protein level.
| Structural protein | NCBI protein accession no. (SARS-CoV-2) | Length (SARS-CoV-2) | UniProt accession no. (SARS-CoV) | Length (SARS-CoV) | Percentage similarity between SARS-CoV-2 and SARS-CoV protein |
|---|---|---|---|---|---|
| E (Envelope) | YP_009724392.1 | 75 | P59637 | 76 | 94.74 |
| N (Nucleoprotein) | QIH45060.1 | 419 | P59595 | 422 | 90.52 |
| M (Matrix) | QII57163.1 | 222 | P59596 | 223 | 90.45 |
| S (Spike) | YP_009724390.1 | 1273 | P59594 | 1255 | 76.27 |
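The text does not state how the percentage similarity in Table 1 was derived. As a hedged illustration, one common definition is percent identity over the columns of a pairwise alignment; the function below and its toy input are assumptions for the example and do not use the real E protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumes both strings come from the same pairwise alignment, with '-'
    marking gaps; the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 9 identical columns out of 10.
toy_identity = percent_identity("MYSF-VSEET", "MYSFAVSEET")
print(toy_identity)
```

For orientation, 72 identical columns out of 76 would give 100 × 72/76 ≈ 94.74%, which is numerically equal to the E-protein entry of Table 1, although the source does not report the underlying counts.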
The highly conserved nature of the E-protein between the previous and current strains of the virus prompted us to target this protein while looking for possible therapeutic interventions against the virus.
Our next step was to examine the degree of similarity among SARS-CoV-2 nucleotide sequences collected from patient samples at different geographical locations, so as to estimate how well the E-protein has been conserved as the virus spread across the globe. Among the 11,943 sequences (obtained from GISAID) considered after removing partially sequenced genomes, 97.23% showed 100% conservation of the E-protein at the nucleotide level. Among the remaining sequences with E-protein mutations, those having sequencing errors with NNNN reads were not considered. Of the mutated E-protein sequences that were considered, 44% showed conservation across the entire 69 bases of interest, which span amino acids 15 to 37 of the TMD. Sequence conservation information is provided in Table 2.
Table 2
Conservation among SARS-CoV-2 E-protein sequences at nucleotide level obtained from different patients worldwide.
| Description | Number of sequences |
|---|---|
| Sequences downloaded from GISAID | 12,087 |
| Partial sequences removed | 144 |
| Sequences analyzed | 11,943 |
| E-protein conservation | 11,612 (97.23%) |
| Total mutated E sequences | 331 (2.77%) |
| Sequences discarded due to sequencing errors | 111 |
| Mutated sequences considered | 220 |
| Conservation of core 69 bases from position 15 to 37 | 97 (44%) |
| Mutation in the core 69 bases from position 15 to 37 | 123 (56%) |
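The counts in Table 2 follow from one another by simple subtraction, and the percentages can be recomputed from them. The short check below uses only the numbers reported in the table; the variable names are chosen for this illustration.

```python
downloaded = 12_087
partial_removed = 144
analyzed = downloaded - partial_removed            # 11,943

fully_conserved = 11_612
mutated = analyzed - fully_conserved               # 331

sequencing_errors = 111
mutated_considered = mutated - sequencing_errors   # 220

core_conserved = 97
core_mutated = mutated_considered - core_conserved  # 123


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(percent(fully_conserved, analyzed))         # share with a fully conserved E gene
print(percent(mutated, analyzed))                 # share with any E gene mutation
print(percent(core_conserved, mutated_considered))
print(percent(core_mutated, mutated_considered))
```

The last two values round to the 44% and 56% given in the table.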
Screening of drugs using computational methods
As mentioned in the Materials and methods section, AI based deep learning and pattern recognition techniques have been implemented to screen a set of drugs from the initial list obtained from ChEMBL v.26 (https://www.ebi.ac.uk/chembl/), Enamine Bio reference Compounds (https://www.enaminestore.com/products/bioreference-compounds) and Chemoinformatic tools and database (https://chemoinfo.ipmc.cnrs.fr/TMP/tmp.32396/e-Drug3D_1930_v3.sdf). This set of drugs was further screened based on existing functions, FDA approval and severity of side effects. Table 3 lists the full set of drugs that were sent for docking.
Table 3
List of drugs selected for molecular docking.
| Screening approach | Drug name |
|---|---|
| Artificial Intelligence | Cinametic Acid; Dodecanoic Acid/Lauric Acid; Salmeterol; Guaifenesin; Mexiletine; Midodrine; Benzyl Benzoate; Ricinoleic Acid; Capsaicin; Pterostilbene |
| Artificial Intelligence and Pattern Recognition on Aromatic Carbon | Nabumetone; Naproxen |
| Artificial Intelligence and Pattern Recognition on Long Aliphatic Carbon Chain | Palmidrol; Ascorbyl Palmitate; Cetalkonium |
| Pattern Recognition on Aromatic Carbon | Nafcillin |
| Pattern Recognition on Long Aliphatic Carbon Chain | Octacosanol; Vitamin A Palmitate |
Screening of drugs using AI based Machine Learning technique
The initial list consisted of 12,715 drugs. Among those, 395 were retained as they fulfilled two properties: (i) their length is ≤100 symbols in SMILES string format and (ii) their binding score from the IVS2vec model is >85% both with all alpha carbon atoms of the E-protein and with the carbon atoms spanning positions 15 to 37. Subsequent to the AI based screening, the drugs were further filtered based on existing functions correlating with the symptoms of SARS-CoV-2, FDA approval status and severity of the reported side effects. Finally, 18 drugs have been screened by this method (Table 3).
Screening of drugs based on pattern recognition
Long carbon chain based pattern recognition
The E-protein contains a membrane spanning region with a long hydrophobic tail. A suitable ligand that can bind to this membrane spanning region needs to have a hydrophobic part, so drugs having the necessary hydrophobic zone were screened.
The work of Duangjit et al. showed that the hydrophobicity of long carbon chains is greater than that of short carbon chains and that the threshold length could be fixed at 16 [46]. This led us to select those drug molecules (with a hydrophobic zone) having a long aliphatic carbon chain of length ≥ 16.
Further, we checked their existing functions, FDA approval status and reported side effects. Based on all these, we shortlisted 6 drug molecules: Docosanol, Vitamin A Palmitate, Ascorbyl Palmitate, Cetalkonium, Octacosanol and Palmidrol. Among them, Docosanol has been rejected as it is a topical medication; the remaining five are shown in Table 3. Ascorbyl Palmitate, Cetalkonium and Palmidrol have also been screened through AI based screening. Octacosanol and Vitamin A Palmitate are the drugs which were screened exclusively through this pattern recognition based criterion.
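How the chain length was measured is not described in the text. The sketch below shows one deliberately crude, string-level proxy: the longest run of consecutive aliphatic carbon symbols in a SMILES string. It is an assumption made for illustration; branches, ring closures, double bonds and bracketed atoms all break the run, so a graph-based search with a cheminformatics toolkit would be needed for a faithful count.

```python
import re

MIN_CHAIN_LENGTH = 16  # threshold taken from the text (>= 16 carbon atoms)

# An uppercase 'C' that is not the start of 'Cl' (chlorine).
_CARBON_RUN = re.compile(r"(?:C(?!l))+")


def longest_carbon_run(smiles: str) -> int:
    """Length of the longest uninterrupted run of aliphatic 'C' symbols."""
    runs = _CARBON_RUN.findall(smiles)
    return max((len(run) for run in runs), default=0)


def has_long_aliphatic_chain(smiles: str) -> bool:
    """True when the crude run-length proxy reaches the threshold."""
    return longest_carbon_run(smiles) >= MIN_CHAIN_LENGTH


# Straight-chain saturated fatty acids written as linear SMILES.
hexadecanoic_acid = "C" * 16 + "(=O)O"  # 16 carbons
dodecanoic_acid = "C" * 12 + "(=O)O"    # 12 carbons

print(longest_carbon_run(hexadecanoic_acid), has_long_aliphatic_chain(hexadecanoic_acid))
print(longest_carbon_run(dodecanoic_acid), has_long_aliphatic_chain(dodecanoic_acid))
```

Because the proxy undercounts unsaturated or branched chains, it can miss true hits but should rarely report a chain that is not there.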
Aromatic carbon ring based pattern recognition
As the TMD of the E-protein contains 3 aromatic Phe residues (at positions 20, 23 and 26), drugs possessing aromatic carbon rings have a higher probability of binding to these Phe residues. Based on this property we selected the drugs having 2 or more aromatic carbon rings. Further imposing the criteria of functionality, FDA approval status and reported side effects led to the selection of only 1 drug, Nafcillin, by this procedure.
Table 3 summarizes the set of ligands that made it to the final stage for molecular docking.
Molecular docking based virtual screening results
From the molecular docking based virtual screening we found that, out of the entire list provided in Table 3, 9 ligands could bind the E-protein in three distinct clusters (Fig. 4). Cluster 1 consists of 7 ligand molecules, while cluster 2 and cluster 3 consist of 1 ligand molecule each.
Fig. 4
Binding of the ligand molecules with E-protein.
Cluster 1 contains most of the ligands. Both cluster 1 and cluster 2 were found to cover the protein-protein interaction site (indicated in blue). In other words, the ligands in clusters 1 and 2 could bind to the amino acid residues necessary for the protein-protein interactions that form the active pentameric structure of the E-protein. The ligand in cluster 3 was found to cover the outer surface of the E-protein in such a way that the flexibility of the protein necessary for interacting with other partners would be hampered. The aromatic ring structures present in the ligands would help them to stack properly with the Phe residues present in the oligomerization domain (amino acid residues 15 to 37) of the E-protein. This would further enable the ligands to fix this part of the protein in a compact static state, such that the protein-protein interactions necessary for oligomerization would no longer take place. Table 4 lists the ligand molecules cluster-wise along with their docking scores. Table 5 details the function and mode of action of these 9 drug molecules.
Table 4
Cluster-wise ligand list with their docking score.
| Cluster-1 molecule | Docking score (kcal/mol) | Cluster-2 molecule | Docking score (kcal/mol) | Cluster-3 molecule | Docking score (kcal/mol) |
|---|---|---|---|---|---|
| Nafcillin | −6.1 | Ascorbyl palmitate | −4.2 | Guaifenesin | −3.8 |
| Nabumetone | −5.1 | | | | |
| Cinametic acid | −3.9 | | | | |
| Octacosanol | −3.8 | | | | |
| Dodecanoic acid | −3.7 | | | | |
| Palmidrol | −3.5 | | | | |
| Salmeterol | −3.4 | | | | |
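As a small worked illustration, the Table 4 values can be held in a nested mapping and ranked. The data below are copied from the table; the variable names are illustrative. In AutoDock Vina a more negative score indicates a more favourable predicted binding energy, so an ascending sort places the best-scoring ligand first.

```python
# Docking scores (kcal/mol) as reported in Table 4, grouped by cluster.
docking_scores = {
    "Cluster-1": {
        "Nafcillin": -6.1,
        "Nabumetone": -5.1,
        "Cinametic acid": -3.9,
        "Octacosanol": -3.8,
        "Dodecanoic acid": -3.7,
        "Palmidrol": -3.5,
        "Salmeterol": -3.4,
    },
    "Cluster-2": {"Ascorbyl palmitate": -4.2},
    "Cluster-3": {"Guaifenesin": -3.8},
}

# Flatten to (score, name, cluster) tuples and sort by score, lowest first.
ranked = sorted(
    (score, name, cluster)
    for cluster, ligands in docking_scores.items()
    for name, score in ligands.items()
)

cluster_sizes = {cluster: len(ligands) for cluster, ligands in docking_scores.items()}

for score, name, cluster in ranked:
    print(f"{name:20s} {score:5.1f}  {cluster}")
```

As the methods section notes, docking score alone is a weak proxy for binding affinity, so this ranking is only one input alongside binding site and orientation.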
Table 5
Mode of action of the selected drugs to be repurposed for COVID-19 and their application in existing diseases.
| Drug | Category | Mode of action | Application in existing diseases |
|---|---|---|---|
| | | Lowers total cholesterol and breaks down LDL (low density lipoproteins) or “bad” cholesterol. | High cholesterol, atherosclerosis |
| Cinametic acid | Nutraceutical | ACE (angiotensin converting enzyme) inhibitor. | |
| Lauric acid | Nutraceutical | A medium chain fatty acid which can disrupt the viral envelope. Also increases HDL (high density lipoprotein) or “good” cholesterol. | Influenza |
| Ascorbyl Palmitate | Nutraceutical | Anti-oxidant. Protects the body against oxidative damage by free radicals. | Vitamin C deficiency |
| Palmidrol | Nutraceutical | Cannabinoid receptor agonist. Lowers the release of inflammatory cytokines and histamine. | Influenza |
| Salmeterol | Pharmaceutical | Beta2 adrenergic receptor agonist. Helps in the dilation of bronchioles and prevents air passage narrowing induced by histamine released from mast cells. | Asthma and COPD (chronic obstructive pulmonary disease) |
| Guaifenesin | Pharmaceutical | Increases the volume of the trachea and bronchi. Aids in the clearance of mucus. | Cough and common cold |
We then tried to identify the residue specificity of the ligands in order to identify the amino acid residues of the E-protein binding most strongly with the ligand atoms. The docking scores of all 9 ligands are presented in Table 4. The binding interactions of all the ligands are presented in Supplementary File SF2.
The ligands found to be important in the context of the Envelope protein are Nafcillin, Nabumetone, Octacosanol, Cinametic Acid, Lauric Acid, Salmeterol, Palmidrol, Ascorbyl Palmitate and Guaifenesin. Their molecular structures, generated with RDKit, are provided in Supplementary Fig. 1.
The following are the supplementary data related to this article.
Supplementary Fig. 1
Molecular structure of the final set of drugs sent for docking.
Looking into the mechanism of action of the selected drugs and nutraceuticals based on the varied symptoms of COVID infected patients
The known mechanisms of action of the screened ligands are discussed below from the perspective of their action against the most threatening outcome in COVID-19 patients, i.e. Acute Respiratory Distress Syndrome (ARDS), as well as other (mainly age related) risk factors such as hypertension and high cholesterol, which have made aged people more vulnerable to COVID-19. Their detailed mode of action is depicted in Fig. 5A.
Fig. 5
A. Existing mechanism of action of the repurposed FDA approved pharmaceuticals and nutraceuticals within human Abbreviations: ABCA1- ATP-binding cassette transporter 1 AC- Acetyl Cyclase Ang- Angiotensin AT1R- Angiotensin1 Receptor ATP- Adenosine Tri Phosphate CAP- Community acquired Pneumonia ARDS- Acute Respiratory Distress Syndrome CETP- Cholesteryl ester transfer protein COX- Cyclooxygenase DAG –Diacylglycerol HDL- high Density Lipoprotein IL- Interleukin IP3- Inositol Tri Phosphate LCAT- Lecithin–cholesterol acyltransferase LDL- Low Density Lipoprotein MLCK- Myosin Light Chain Kinase PG- Prostaglandin PIP2-Phosphatidylinositol4,5-bisphosphate PLC- Phospholipase C PK- Protein Kinase PPAR - Peroxisome proliferator-activated receptor ROS- Reactive Oxygen Species TNF- Tumor necrosis Factor.
B. Predicted mode of action of the repurposed FDA approved pharmaceuticals and nutraceuticals against E-protein to combat SARS-CoV-2 mediated disease symptoms.
Nafcillin is a semi-synthetic, narrow-spectrum antibiotic and a beta-lactamase-resistant penicillin [55]. The bactericidal action of penicillin, inhibition of cell wall synthesis, is due to the presence of the beta-lactam ring [56]. However, certain bacteria develop resistance by synthesizing enzymes that inactivate the beta-lactam ring, i.e. beta-lactamase or penicillinase [57]. To combat this resistance, penicillinase-resistant drugs were introduced to the market [58]. Currently, Nafcillin is used in the treatment of infections caused by penicillinase-producing staphylococcal species, particularly methicillin-sensitive Staphylococcus aureus (MSSA) [59]. Nafcillin is also used to treat non-specific lower respiratory tract infection as well as community-acquired pneumonia (CAP) [60]. Nafcillin is not known to cause life-threatening adverse side effects and, in our analysis, it shows the highest binding affinity with the transmembrane domain (TMD) of the monomeric E-protein.
Thus, Nafcillin can be considered for repurposing in the treatment of SARS-CoV-2 infection, as it could also combat bacterial co-infections in COVID-19 patients, which produce the same symptoms as those seen in SARS-CoV-2 infection (Fig. 5A).
Nabumetone is an FDA approved non-selective non-steroidal anti-inflammatory drug (NSAID) which is used for its anti-inflammatory and antipyretic effects [61]. It is a prodrug which undergoes biotransformation in the liver to produce the active component, 6-methoxy-2-naphthylacetic acid (6MNA) [62], which inhibits the synthesis of prostaglandins by acting on cyclooxygenase (COX) I and II [63]. Prostaglandins are responsible for initiating fever by signalling the hypothalamus to increase body temperature [64]. Prostaglandins also act as inflammatory mediators on blood vessels to promote the inflammatory response [65,66]. NSAIDs mediate their anti-inflammatory effect by preventing vasodilation and by reducing capillary permeability and cytokine release from endothelial cells [67]. Altogether, these effects impede the migration of immunocompetent cells to the site of injury, thereby preventing uncontrolled immune system activation and inflammation [68] (Fig. 5A).
During the initial phase of COVID-19, an uncontrolled immune response is activated in which leukocytes are recruited all over the body [69]. This is followed by a state of immunosuppression with death of T and B cells, further exacerbating the situation [70]. The permeability of the blood capillaries lining the lung alveolar wall increases, causing fluid to leak inside and resulting in pulmonary edema and ARDS [71]. Evidence in the literature suggests that this situation is triggered by the viral E-protein [32]. Hence, the anti-inflammatory agent Nabumetone, by targeting the monomeric E-protein, could hamper formation of its functional pentameric form, which in turn could help lower the state of uncontrolled immune response.
Although Nabumetone is reported to cause less nephrotoxicity and gastrointestinal (GI) toxicity than other NSAIDs, it still produces considerable side effects and should be used with caution, particularly in people with heart or kidney conditions, high blood pressure or complications arising from thrombosis [72,73].
Octacosanol is the main component of plant-derived natural wax and is a low-molecular-weight primary aliphatic alcohol [74]. Its role has mainly been investigated for the treatment of Parkinson's disease [75]. It is approved as a nutraceutical by the FDA and is marketed as the main component of Policosanol (PC), a generic term for a natural mixture of primary alcohols originally isolated from sugarcane wax [76]. The first PC supplement, produced by Dalmer Laboratories in La Habana, Cuba [77], has been approved as a medication for lowering total cholesterol (TC) and breaking down low-density lipoprotein cholesterol (LDL-C) in more than 25 countries including India (trade name: Zetanol) [78]. It is known to lower cholesterol levels by inducing phosphorylation of AMPK and reducing the activity of HMG-CoA reductase, a key enzyme of cholesterol biosynthesis [79]. It is also reported to increase efflux of tissue LDL by high-density lipoprotein (HDL) via the reverse cholesterol transport pathway, and to yield larger HDL particles by inhibiting the activity of cholesteryl ester transfer protein (CETP) [80,81]. Larger HDL particles are postulated to be more protective than total HDL particles [82].
COVID-19 is known to severely affect the elderly population. People with atherosclerosis and higher cholesterol levels are more vulnerable to severe symptoms [83]. Cholesterol is an essential structural component of the cell membrane [84]. Synthesized by the liver, cholesterol is transported through the bloodstream, packed within lipoprotein particles, to various organs including the lungs [85,86]. LDL is the major blood cholesterol carrier [87].
When LDL levels are high, organs can no longer take up the excess LDL. The particles are then oxidised, engulfed by macrophages and deposited along the arterial wall, resulting in plaque formation (atherosclerosis), which increases the risk of stroke and heart attack [88]. Moreover, excess cholesterol in the lung is transported out via the ABCA1 efflux pump back to the liver for recycling [89]. This efflux mechanism is inhibited during chronic inflammation and declines with age, causing cholesterol-laden macrophages to deposit around the tissue periphery and resulting in increased tissue cholesterol [90,91]. SARS-CoV-2 uses its Spike (S) protein to enter the host cell cytosol by binding to the surface receptor angiotensin-converting enzyme II (ACE2), followed by cleavage of the S protein by host proteases [92]. A recent work by Wang et al. pointed out that increased tissue cholesterol deposits with age can provide more viral entry points by trafficking and concentrating ACE2 as well as host proteases within lipid rafts [93]. Hence, Octacosanol could be used as a tissue cholesterol lowering agent to reduce the risk of viral entry, in addition to its repurposed role of inhibiting E-protein oligomerization.
Cinnamic acid is an FDA approved food additive, mainly obtained from oil of cinnamon and other plant sources [94]. Among its many therapeutic functions, cinnamic acid has been linked to inhibition of angiotensin-converting enzyme (ACE) [95]. ACE converts Angiotensin (Ang) I to Ang II [96]. Ang II constricts blood vessels and raises blood pressure (hypertension, one of the risk factors for COVID-19) by binding to the Angiotensin 1 Receptor (AT1R) and activating a cascade of signalling pathways [97,98] (shown in Fig. 5A). By inhibiting ACE, cinnamic acid would hamper the conversion of Ang I to Ang II, which can reduce hypertension. Further, Ang II gets converted to Ang-(I-VII) by ACE2 in the absence of ACE [99].
Ang-(I-VII) has a protective role as it binds to the Mas1R receptor and helps attenuate inflammation and fibrosis-related lung injury. Moreover, as proposed by South et al., this engagement of ACE2 in Ang II conversion could in turn decrease the amount of free tissue ACE2 available for binding SARS-CoV-2 spike proteins [100].
Lauric acid (dodecanoic acid), the major source of which is coconut, can also be found in other plants [101]. It is a medium-chain saturated fatty acid [102]. It can also be found in cow, goat and human breast milk [103]. Inside the body, lauric acid is converted to the biologically active compound monolaurin, which is known for its antimicrobial activity [104], is generally recognized as safe (GRAS) by the U.S. Food and Drug Administration (FDA) and is available as capsules. In particular, medium-chain fatty acids are known to exert antiviral action at high doses by disrupting the viral envelope and making the virus susceptible to immune attack. In this respect too, lauric acid is a strong candidate for SARS-CoV-2 inhibition, as breakdown of the viral coating (Fig. 5A) would result in easy clearance of the virus from the system without giving rise to medical complications [105]. Further, its inhibitory action on viral E-protein oligomerization increases its potential as a therapeutic agent against the viral attack.
Ascorbyl palmitate is an FDA approved small molecule. It is a fat-soluble form of Vitamin C, an ester of ascorbic acid and palmitic acid [106]. Being an amphipathic molecule, it has the advantage of being more stable and entering cell membranes easily [107]. Within human red blood cells, ascorbyl palmitate can protect against oxidative damage by free radicals [108] (Fig. 5A). The pathophysiological processes involved in COVID-19 can be linked to oxidative stress, which increases with age [109]. In particular, deprivation of antioxidant mechanisms is pivotal for viral replication and proliferation [110].
Generation of high levels of reactive oxygen species (ROS) during viral infection can lead to oxidative damage of cells as a result of increased inflammation [111]. The role of the E-protein in activating the inflammasome pathway has already been discussed in the introduction. Hence, ascorbyl palmitate, which targets the E-protein, could boost the immune system not only by reducing oxidative stress but also by enhancing phagocytosis, chemotaxis and interferon production.
Palmidrol, or palmitoylethanolamide (PEA), was initially identified in egg yolk. PEA is a natural fatty acid amide synthesized locally in animal and human tissues from palmitic acid, the most common fatty acid [112]. PEA acts locally and is produced on demand, thus maintaining a tight balance between production and breakdown [113]. PEA is known to be involved in the endocannabinoid system due to its affinity for the novel orphan cannabinoid receptors GPR55 and GPR119 [114,115]. In addition to its anti-cancer, anti-epileptic and neuroprotective properties, PEA is widely known to relieve inflammation associated with respiratory infections such as influenza [116]. Influenza is characterized by the production of pro-inflammatory cytokines such as IL-6, IL-10 and TNF (Tumor Necrosis Factor)-α [117,118]. PEA is known to lessen the production of inflammatory cytokines by acting as an agonist of the PPAR-α receptor and reducing the expression of NF-κB [119] (Fig. 5A). It is also responsible for shifting activated mast cells to the resting state by preventing mast cell migration and the release of mast cell mediators such as histamine through degranulation [120]. Thus, PEA can combat inflammation through various modes of action and has been found advantageous for treating respiratory viral infections owing to its minimal side effects and promising efficacy.
Currently, PEA is available as an FDA approved nutraceutical under the trade name PeaPure and is marketed under the trade name Normast in Spain and Italy for special medical purposes [121]. These properties, along with its capability to inhibit E-protein oligomerization, make it a potential nutraceutical to fight COVID-19.
Salmeterol is a long-acting beta2-adrenergic agonist approved by the FDA for the treatment of asthma and COPD or high altitude pulmonary edema; it acts by stimulating beta2-adrenoceptors, which predominate in bronchial smooth muscle [122]. The beta2 receptor is coupled to the Gs protein; when activated, it triggers the formation of cyclic adenosine monophosphate (cAMP) and activates protein kinase A (PKA), thus inactivating myosin light-chain kinase [123] (Fig. 5A). This results in dilation of the bronchioles and proper passage of air through lung airways that have been obstructed and narrowed by the accumulation of pus, swelling of the alveolar wall lining and muscle spasms [124]. These would otherwise lead to shortness of breath, difficulty in breathing, cough and congestion [125]. Such manifestations are also observed among COVID-19 patients, where they are attributed to the E-protein; thus Salmeterol, which can target the E-protein, has high potential for use in antiviral therapy for COVID-19. Salmeterol can also prevent the narrowing of air passages induced by histamine released from mast cells [126]. However, beta2-adrenergic receptors are also present in the heart to some extent, and thus salmeterol should be administered with caution, as it could otherwise lead to undesirable cardiac effects [127].
Guaifenesin is an FDA approved over-the-counter (OTC), non-prescription expectorant for the treatment of cough and common cold [128]. It aids the clearance of mucus and other respiratory tract secretions by increasing the volume of secretions in the trachea and bronchi and reducing mucus viscosity; the accumulated secretions would otherwise lead to congestion, chronic bronchitis and COPD, which are commonly seen in ARDS.
The action of Guaifenesin thus results in a more productive cough [129], thereby combating the condition of ARDS. A similar effect is expected if it is administered to COVID-19 patients, as it has the potential to disrupt the formation of the pentameric structure of the E-protein, which causes ARDS. It is a relatively safe drug, though not recommended for children below the age of six [130].
The overall discussion above of the known mechanisms of action of the ligands (shown in Fig. 5A) shows that most of them are able to combat the various symptoms of Acute Respiratory Distress Syndrome (ARDS), the signature indicator in COVID-19 patients. The others can alleviate major COVID-19 risk factors such as hypertension and age-induced tissue cholesterol.
On the other hand, our docking results indicate that these ligands can target the monomeric structure of the viral E-protein, thus inhibiting formation of its pentameric form. As mentioned in the introduction, functional E-protein triggers over-expression of inflammatory cytokines such as IL-1β, resulting in edema and other characteristic symptoms of ARDS [32]. Hence, these ligands, being capable of inhibiting the formation of functional E-protein, have the potential to subdue ARDS-related complications (Fig. 5B). Thus, their repurposed role can aid overall patient recovery by reversing SARS-CoV-2 induced conditions of ARDS.
Conclusion
The viroporin channel activity of the SARS-CoV E-protein in its pentameric form plays a significant role in viral pathogenesis and produces the characteristic symptoms associated with ARDS. After the monomeric E-protein is synthesized following replication, it makes its way to the ERGIC compartment, where most of it is inserted into the membranes in its oligomerized form and carries out its pathogenic activity by interacting with several host proteins. The transmembrane domain (TMD) of the E-protein is responsible for this oligomerization; mutating the hydrophobic residue valine 25 to phenylalanine (V25F) can abolish the oligomeric structure completely, whereas mutating asparagine 15 to alanine (N15A) can reduce the chance of pentamer formation to some extent. It has also been reported that the loss of viroporin activity could result in significant relief from edema and inflammation. To cope with such changes and oligomerize again, the virus introduces certain mutations between positions 15 and 37 of the TMD region. Hence, this region appears to be of immense importance in maintaining the oligomeric form and carrying out pathogenesis. The SARS-CoV-2 E-protein sequence shares 94.74% similarity with that of the SARS-CoV E-protein, and the amino acid sequence in the region 15 to 37 was found to be totally conserved between SARS-CoV and SARS-CoV-2. In this study, we aimed to target the monomeric E-protein so as to prevent its oligomerization and subsequent pathogenic activities. The loss of oligomerization would result in the formation of defective viral particles.
We obtained drug information from various available sources and used AI-based deep learning for the first phase of screening, based on binding affinity for the E-protein. The small size of the E-protein, along with the membrane-spanning nature and amino acid properties of the region concerned, was also considered during drug screening.
Thus, pattern-based searching was also employed to screen suitable candidates in the first phase. Next, we focused only on FDA approved commercial drugs, which are already easily available, along with FDA approved nutraceuticals, which have immense health benefits as dietary supplements. We also checked the severity of the adverse side effects of these drugs. Hence, the second phase of screening was based on these criteria, and in the third phase the selected drugs were used for blind docking with the SARS-CoV-2 E-protein to obtain unbiased results. Although no scoring function universally captures the binding affinity between a ligand and its target protein [131], we considered the orientation of binding, the site of binding and the type of binding energy along with the docking score to judge binding affinity. We finally arrived at 9 ligand molecules, which group into three distinct clusters: ligands in clusters 1 and 2 interact with the amino acid residues in the zone of interest, whereas those in cluster 3 cover the outer surface of the E-protein in a way that could hinder the flexibility the protein needs for interacting with other partners. Finally, we examined the mechanisms of action of these drugs and nutraceuticals, emphasizing the ways in which they could combat the diverse complications associated with COVID-19, such as cytokine-induced inflammation, COPD and ARDS, ACE-induced hypertension and tissue cholesterol induced complications, among others. Recommending drugs for each patient based on medical history and symptoms would lead to better results and could avoid unwanted clinical complications.
Overall, validation of the action of these repurposed pharmaceuticals and nutraceuticals in clinical settings will strengthen our conclusions and could thus help chart a COVID-19 therapeutic roadmap that may serve as a ready remedy against this rapidly evolving pandemic.
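The three screening phases summarized in the Conclusion can be pictured as a simple filtering funnel. The sketch below is illustrative only and is not the authors' pipeline: the ligand names, all numeric values, the 0.75 affinity threshold and the convention that a more negative docking score is better are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_affinity: float  # phase 1: model-predicted affinity (assumed: higher is better)
    fda_approved: bool         # phase 2 criterion
    severe_side_effects: bool  # phase 2 criterion
    docking_score: float       # phase 3: docking score (assumed: more negative is better)


# Hypothetical placeholder records, NOT data from the study.
CANDIDATES = [
    Candidate("ligand_A", 0.91, True, False, -7.2),
    Candidate("ligand_B", 0.88, True, True, -8.0),
    Candidate("ligand_C", 0.40, True, False, -6.9),
    Candidate("ligand_D", 0.85, False, False, -7.5),
    Candidate("ligand_E", 0.79, True, False, -6.1),
]


def screen(candidates, affinity_threshold=0.75):
    """Apply the three phases in order and return the ranked shortlist."""
    # Phase 1: keep candidates whose predicted affinity clears the threshold.
    phase1 = [c for c in candidates if c.predicted_affinity >= affinity_threshold]
    # Phase 2: keep FDA approved candidates without severe side effects.
    phase2 = [c for c in phase1 if c.fda_approved and not c.severe_side_effects]
    # Phase 3: rank the survivors by docking score, best (most negative) first.
    return sorted(phase2, key=lambda c: c.docking_score)


shortlist = screen(CANDIDATES)
print([c.name for c in shortlist])
```

Ordering the cheap filters before docking keeps the expensive step restricted to a small candidate set. As the text notes, the docking score alone is not a universal measure of affinity, so in practice the final ranking would also weigh binding orientation, binding site and energy type.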
Supplementary File SF1
Multiple sequence alignment between Envelope (E), Matrix (M), Nucleocapsid (N) and Spike (S) proteins of SARS-CoV and SARS-CoV-2.
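The E-protein similarity quoted in the Conclusion (94.74%) and the conservation of region 15 to 37 are statistics that can be read off a pairwise alignment of this kind. The sketch below shows one common way to compute them, using identity over alignment columns; the toy sequences and function names are hypothetical and the real E-protein sequences are not reproduced. As a purely illustrative piece of arithmetic, 72 identical columns out of 76 would give 94.74%; those counts are an assumption and are not stated in the text.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def region_conserved(aligned_a, aligned_b, start, end):
    """True if 1-based alignment columns start..end (inclusive) are identical and gap-free.

    Column numbers equal residue numbers only when no gap precedes the region.
    """
    seg_a = aligned_a[start - 1:end]
    seg_b = aligned_b[start - 1:end]
    return seg_a == seg_b and "-" not in seg_a


# Hypothetical toy alignment, NOT the real E-protein sequences.
TOY_A = "MKTLLVAGNSVLFLAFVVTE"
TOY_B = "MKTLLVAGNSVLFLAFVV-D"

print(percent_identity(TOY_A, TOY_B))
print(region_conserved(TOY_A, TOY_B, 5, 18))
print(round(100 * 72 / 76, 2))  # illustrative arithmetic only
```

Other definitions divide by the length of the shorter sequence or count chemically similar residues as matches, so a reported percentage depends on which convention the alignment tool uses.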
Supplementary File SF2
Interaction of the final set of 9 ligand molecules with SARS-CoV-2 E-protein.