
Current computational methods for predicting protein interactions of natural products.

Aurélien F A Moumbock¹, Jianyu Li¹, Pankaj Mishra¹, Mingjie Gao¹, Stefan Günther¹.

Abstract

Natural products (NPs) are an indispensable source of drugs and, owing to their high structural diversity, they cover the pharmacological space better than synthetic compounds. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental identification of the (off-)targets of NPs is costly and time-consuming, whereas computational methods are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validation. This review covers recent advances in computational approaches developed to aid the annotation of unknown drug-target interactions (DTIs), focusing on three broad classes: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and give a future outlook.


Keywords: Drug discovery; Drug-target interactions; Natural products; Pharmacological space; Target fishing; Virtual screening

Year: 2019 PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Since the earliest times, for the treatment of diseases, humans have heavily depended on medicinal plants whose “active principles” are secondary metabolites termed natural products (NPs). Precisely, NPs are “genetically encoded small molecules” originating from microorganisms, plants, or animals [1], [2]. They have better coverage of the biologically relevant chemical space (pharmacological space) than synthetic molecules. It is estimated that about 60% of all medicines approved in the last three decades are either NPs or their semisynthetic derivatives [3], [4], [5]. Notable examples of approved drugs of NP origin (Fig. 1) include: the antibiotic penicillin G, isolated from the fungus Penicillium chrysogenum; the antibiotic streptomycin, isolated from the bacterium Streptomyces griseus; the anthelmintics avermectins (B1a and B1b), isolated from the bacterium Streptomyces avermitilis, and the antimalarial artemisinin, isolated from the plant Artemisia annua. Their discoverers received the Nobel Prize (in Physiology or Medicine) in 1945, 1952, and 2015, respectively [6]. There is a huge number of secondary metabolites annotated in focused chemical libraries such as StreptomeDB 2.0 [7] and NANPDB [8], which have not yet been investigated for their medicinal potential. Furthermore, for the vast majority of NPs whose activities have been evaluated in bioassays, their interaction profiles with drug targets (mostly proteins) are still unknown.

Fig. 1

Structures of some notable approved drugs of NP origin.

The “magic bullet” concept, formulated in 1900 by Paul Ehrlich, is the foundation of single-target pharmacology. It states that a compound will not exhibit a given biological activity unless it binds to a specific target [9], [10]. This principle has been successfully applied during the last century in the design of numerous approved drugs. However, the development of specific binders is a challenging task and many drugs have been withdrawn from the market due to undesirable side effects resulting from their target promiscuity. In recent years, there has been a marked shift from single-target pharmacology to multi-target pharmacology (polypharmacology). With increasing knowledge about drug-target interactions (DTIs), more effective drugs can be developed by specifically modulating multiple targets simultaneously [11], [12]. Polypharmacology can therefore be an asset in synergistic therapy. Generally, NPs have high structural diversity and complexity, and very often exhibit target promiscuity. Because high-throughput in vitro and in vivo experiments for studying the polypharmacology of NPs are costly and time-consuming, efficient prospective in silico predictions could serve as promising, rapid, and cost-effective strategies to decipher NP-target associations prior to experimental validation [13], [14]. The prediction of ligand-receptor interactions, most commonly known as DTIs, is carried out at several stages of the drug discovery and development process, for on-target as well as off-target interactions. DTI prediction, and thereby prediction of the mechanism of action, can be performed either in a forward manner, as virtual screening to predict putative ligands of a given druggable target, or in a reverse manner, as target fishing to predict putative target proteins of bioactive ligand(s) [15], [16], [17].
In this review, we focus on the three current approaches dealing with computational DTI prediction, namely ligand-based, target-based, and target-ligand-based (hybrid) approaches (Fig. 2).

Fig. 2

Overview of computational approaches for DTI prediction; L and T represent ligand (including NPs and synthetic drugs) and target, respectively.

Computational methods for DTI prediction

Ligand-based approaches

These methods stem from the chemical similarity principle, which states that similar molecules typically have similar physicochemical properties and bind to similar drug targets [18]. Based on this principle, ligand-based similarity approaches predict DTIs via comparison of query ligands to known active ligands of a specific drug target. They are the methods of choice for drug targets whose macromolecular structures have not yet been solved, such as several G-protein-coupled receptors (GPCRs), transporters, or ion channels [18], [19]. Ligand-based similarity comparisons can be subdivided into pharmacophore modeling, chemical similarity searching, and quantitative structure-activity relationship (QSAR).

Pharmacophore screening

Historically, the concept of pharmacophore was formulated by Paul Ehrlich in 1909 [20], [21]. According to IUPAC, a pharmacophore is defined as “an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response” [22]. These pharmacophoric features include mainly aromatic, hydrophobic, charged ionizable and hydrogen bonding moieties. Pharmacophore perception involves the overlap of energy minimized conformations of a set of known active ligands and the extraction of the recurrent pharmacophoric features in a single model. Once a pharmacophore model has been generated, a query can be done using database molecules in a forward manner in search of novel putative hits, or in a reverse manner when a ligand is compared with multiple pharmacophore models in search of putative targets (parallel screening) [23]. Generally, the pharmacophore query is done by the overlay of generated 3D conformers and tautomers of each database molecule onto the pharmacophore model derived from bioactive ligands to identify the maximal common subsets [24], [25]. Alternatively, a bit-wise comparison of generated fingerprints of the pharmacophore model and those of the database molecules is made. Pharmacophoric fingerprints are bit strings encoding distances between sets of three (or four) pharmacophoric points in a ligand structure, counted in bonds and distance-binning at the 2D and 3D levels, respectively [25], [26]. The fit between a given query ligand and pharmacophore model can be measured either by rmsd-based or overlay-based scoring functions. The former scoring functions are superior in predicting the highest number of hits for large chemical libraries, whereas the latter have the advantage of producing the highest ratio of correct/incorrect hits [27], [28]. 
Some of the most popular programs used for pharmacophore modeling/search are Pharmer [29], Discovery Studio [30], LigandScout [31], Phase [32], Screen [33], and MOE [34]. Pharmacophore web servers include ZINCPharmer [35], PharmMapper [36], Pharmit [37], and CavityPlus [38]. Kirchweger et al. [39] used the pharmacophore program LigandScout [31] to generate two ligand-based pharmacophore models from known activators of the G protein-coupled bile acid receptor 1 (GPBAR1). These models were used to screen an NP library, leading to the identification of two NPs, farnesiferol B and microlobidene, which were confirmed to activate GPBAR1 with potencies similar to that of the endogenous ligand, lithocholic acid (Fig. 3).
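The fingerprint-based pharmacophore query described above can be illustrated with a minimal sketch. It assumes that pharmacophoric features have already been perceived and are given as typed 3D points; the bin edges, feature names, coordinates, and function names are all hypothetical and do not correspond to any of the programs cited. Each set of three features is encoded by its binned pairwise distances, and two fingerprints are compared with the Tanimoto coefficient.

```python
from itertools import combinations
from math import dist

# Upper edges of the distance bins in angstroms (illustrative values, an assumption).
BIN_EDGES = (2.0, 4.0, 6.0, 8.0, 12.0)


def bin_distance(d):
    """Return the index of the first bin whose upper edge exceeds d."""
    for index, edge in enumerate(BIN_EDGES):
        if d < edge:
            return index
    return len(BIN_EDGES)  # overflow bin for very long distances


def triplet_fingerprint(features):
    """Encode one conformer as a set of three-point pharmacophore keys.

    features: list of (feature_type, (x, y, z)) tuples.
    """
    keys = set()
    for trio in combinations(features, 3):
        edges = []
        for (type_1, pos_1), (type_2, pos_2) in combinations(trio, 2):
            pair = tuple(sorted((type_1, type_2)))
            edges.append((pair, bin_distance(dist(pos_1, pos_2))))
        keys.add(tuple(sorted(edges)))  # sorting makes the key order-independent
    return keys


def tanimoto(keys_a, keys_b):
    """Shared keys divided by all distinct keys of the two fingerprints."""
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


# Hypothetical pharmacophore model with four features.
model = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
         ("aromatic", (0, 5, 0)), ("hydrophobic", (3, 5, 0))]
# Same geometry, translated and listed in a different order.
shifted = [("hydrophobic", (13, 3, 1)), ("aromatic", (10, 3, 1)),
           ("acceptor", (13, -2, 1)), ("donor", (10, -2, 1))]
# Same feature types, but a much larger spacing.
stretched = [("donor", (0, 0, 0)), ("acceptor", (7, 0, 0)),
             ("aromatic", (0, 9, 0)), ("hydrophobic", (7, 9, 0))]

model_fp = triplet_fingerprint(model)
score_shifted = tanimoto(model_fp, triplet_fingerprint(shifted))
score_stretched = tanimoto(model_fp, triplet_fingerprint(stretched))
print(score_shifted, score_stretched)
```

Because only binned inter-feature distances are stored, the encoding is insensitive to translation, rotation, and feature ordering, which is what allows a bit-wise comparison without an explicit 3D overlay. A production tool would additionally enumerate conformers and tautomers, which this sketch omits.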

Fig. 3

Representation of one of the generated pharmacophore hypotheses, aligned to lithocholic acid in 3D with exclusion volume spheres (A), without exclusion volumes (B), and in 2D (C) [39]. The original figure was published under a Creative Commons License.

Due to advances in techniques for macromolecular structure determination, the paradigm has moved from ligand-centric to receptor-centric pharmacophore modeling. Briefly, the 3D pharmacophoric features are here established on the ligand within the binding pocket of its co-crystallised protein [40], [41], [42]. During a receptor-centric pharmacophoric query, excluded volume spheres, corresponding to spatial positions occupied by the protein side chains, are usually added as constraints. This ensures shape complementarity of the matches, since false hits incur unfavorable steric clashes with the excluded volumes. Three databases exist which contain pharmacophore models extracted from PDB protein-ligand complexes, namely PharmaDB [42], PharmTargetDB [36], and Inte:PharmacophoreDB [43]. These databases are often used for target fishing of NPs by loading them into pharmacophore software. Rollinger et al. [44] used the latter database, along with the software Discovery Studio [30], to identify putative targets for 16 NPs isolated from the medicinal plant Ruta graveolens. These NPs showed micromolar in vitro inhibitory concentrations (IC50) against acetylcholinesterase, the human rhinovirus coat protein and the cannabinoid receptor type-2, the targets identified by target fishing.
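The role of excluded volume spheres as a steric filter can be sketched as follows. This is a deliberately simplified illustration, not the procedure of any cited program: atoms are treated as points, a clash is any atom centre falling inside a sphere, and the sphere positions, radii, and candidate coordinates are invented.

```python
from math import dist


def clashes(atom_coords, exclusion_spheres):
    """Return True if any atom centre lies inside any excluded-volume sphere."""
    return any(
        dist(atom, centre) < radius
        for atom in atom_coords
        for centre, radius in exclusion_spheres
    )


def filter_hits(candidates, exclusion_spheres):
    """Keep only the candidates whose aligned pose avoids every sphere."""
    return [
        name
        for name, coords in candidates.items()
        if not clashes(coords, exclusion_spheres)
    ]


# One hypothetical sphere standing in for a protein side chain: (centre, radius).
spheres = [((0.0, 0.0, 3.0), 1.5)]

# Hypothetical poses already aligned to the pharmacophore model.
candidates = {
    "fits": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "bumps": [(0.0, 0.0, 0.0), (0.0, 0.0, 2.5)],
}

surviving = filter_hits(candidates, spheres)
print(surviving)
```

Real implementations typically account for atomic radii and may penalise rather than reject partial overlaps; the hard cut-off used here is the simplest possible choice.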

Chemical similarity searching

In the late 1980s, chemical similarity screening (also called nearest-neighbor searching or shape screening) was reported as an alternative to pharmacophore modeling [45], [46]. It uses a similarity metric to assess the global structural similarity between a query structure and each compound in a database, with the most similar structures (nearest neighbors) ranked highest by the metric. The query (reference) structure can be either a whole molecule or a substructure (e.g. a “privileged scaffold”). In this approach, the molecules are structurally represented by 2D/3D molecular descriptors, principally fingerprints, which can be circular, topological, or substructure-key-based [26], [47], [48], [49]. A molecular fingerprint is an advanced form of the fundamental structural key. Unlike its precursor, the molecular fingerprint does not use predefined sets of structural patterns, and consequently has in general a higher information content and is less computationally expensive. However, similarity indices depend strongly on the chemical properties considered (such as the size of the molecule) and on the relevance of specific chemical features (such as charged groups). To circumvent this drawback, different similarity indices have been successfully combined (similarity fusion). An alternative strategy is to combine several reference ligands as the initial model for similarity screening (group fusion) [19], [50], [51]. This method provides satisfactory predictions and is generally recommended for nearest-neighbor searching when numerous known active ligands are available [52]. Both approaches were shown to be at least as effective as the best individual similarity searches, and combining fingerprints or multiple reference ligands reduced the substantial variation seen with conventional similarity-based screening.
Among the various existing similarity metrics, the Tanimoto coefficient (Tc) has been established as the gold standard [53]:

$$T_c = \frac{c}{a + b + c}$$

where a, b, and c are the number of bits set in the fingerprint of molecule A only, in the fingerprint of molecule B only, and in the fingerprints of both molecules, respectively. Tc values range from 0 (complete dissimilarity) to 1 (identity). The higher the structural similarity between two molecules, the higher the probability that they might have similar activities for a given target [54], [55]. By virtue of its simplicity and speed, nearest-neighbor searching is incorporated in almost every drug design software package, as well as in online chemical databases. Different methods for encoding fingerprints, such as ECFP (circular-based), FP2 (topological-based), and MACCS (substructure-based), are in use. Several web servers for ligand-based target fishing exist, such as SwissSimilarity [56], SuperPred [57], TargetHunter [58], HybridSim-VS [59], PASS [60], SEA search server [61], and USR-VS [62]. Xu et al. [63] used TargetHunter to identify muscarinic acetylcholine receptor 2, cannabinoid receptor 1, cannabinoid receptor 2, and dopamine receptor 2 as potential targets for salvinorin A, the major component of the Mexican plant Salvia divinorum and a potent hallucinogen. These targets were validated by means of both in vitro and in vivo assays. Zatelli et al. [64] employed the similarity ensemble approach (SEA) to rationalize the anti-inflammatory effect of miconidin acetate (major metabolite of the Brazilian plant Eugenia hiemalis), in which the compound was compared with the ensembles of annotated ligands of each target in the ChEMBL16 binding database. The inflammation-related protein 5-lipoxygenase was the most promising predicted target, and its inhibition by miconidin acetate was validated in cell-based assays (Fig. 4).
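As a minimal sketch of nearest-neighbor searching, the code below computes Tc as c / (a + b + c), which follows from the definitions of a, b, and c given above, and then ranks a small library against several reference ligands (group fusion). Fingerprints are represented as sets of on-bit indices; the bit values and compound names are invented, and the MAX fusion rule (keep the best score over all references) is one common choice, assumed here for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tc = c / (a + b + c) for fingerprints given as sets of on-bit indices."""
    a = len(fp_a - fp_b)  # bits set in A only
    b = len(fp_b - fp_a)  # bits set in B only
    c = len(fp_a & fp_b)  # bits set in both
    total = a + b + c
    return c / total if total else 0.0


def group_fusion_rank(references, library):
    """Rank library compounds by their best Tc to any reference (MAX rule)."""
    scores = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints of two known actives for one target.
references = [{1, 2, 3, 4}, {2, 3, 5, 8}]

# Hypothetical NP library to be screened.
library = {
    "np_1": {1, 2, 3, 4, 9},
    "np_2": {5, 8, 13},
    "np_3": {20, 21},
}

ranking = group_fusion_rank(references, library)
for name, score in ranking:
    print(name, round(score, 2))
```

With a single reference ligand the same function reduces to a conventional similarity search. Similarity fusion would instead combine the scores of several fingerprints or metrics for one reference.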

Fig. 4

Target fishing of miconidin acetate with the SEA Search server.

Quantitative structure-activity relationship (QSAR)

Since its origin in the 1962 seminal paper of Hansch et al. [65], quantitative structure-activity relationship (QSAR) modeling has been one of the main computational methods applied in medicinal chemistry [66]. QSAR attempts to build mathematical models which quantitatively correlate the structural properties of substances with their biological activities, using statistical methods such as multiple linear regression (MLR), partial least-squares (PLS), and k-nearest neighbors (kNN) [67]. QSAR models can be used to optimize existing leads or to predict DTIs for new compounds. As previously mentioned, the fundamental idea underlying QSAR modeling is that compounds sharing structural similarity should also share similar biological activity [18]. Based on the descriptors representing properties of (or differences between) compounds, QSAR methods can be classified into classical QSAR (2D-QSAR), 3D-QSAR, and higher dimensionalities (4D-7D QSAR) [68], [69]. Classical QSAR correlates activity with 2D-structural patterns and physicochemical properties of drugs such as pKa, logP, molecular weight, and polarizability [70]. However, a specific DTI depends on shape complementarity between the ligand and the ligand-binding pocket in 3D. It is therefore not surprising that classical QSAR, which considers neither the conformation nor the chirality of drugs, suffers from limitations. As a natural extension of classical QSAR, 3D-QSAR emerged for correlating steric and electrostatic potential interaction energies with biological activities, with CoMFA (comparative molecular field analysis) as the first successful demonstration [71]. The contour maps from CoMFA highlight key features and give deeper insight into the mechanism of DTIs, which makes it a powerful 3D-QSAR method that has been applied successfully in many cases. CoMSIA (comparative molecular similarity indices analysis) integrates electrostatic, steric, hydrophobic, hydrogen bond donor and acceptor effects [72].
However, CoMFA analysis requires a mutual alignment of the ‘bioactive’ conformations of all compounds, which is one of the most time-consuming aspects of alignment-dependent 3D-QSAR [73]. Thus, alignment-independent 3D-QSAR methods have been developed, such as COMPASS [74], CoMMA [75], HQSAR [76], and GRIND [77]. An advanced software tool implementing GRIND is Pentacle from Molecular Discovery [78]. The Schrödinger software suite offers AutoQSAR for 3D-QSAR modeling [79]. In order to refine ligand-based 3D-QSAR models, receptor-based 3D-QSAR emerged, including COMBINE [80] and AFMoC [81]. QSAR techniques consider the interaction of a group of compounds with only a single target. When trained on these compounds, a QSAR model usually has limited ability to extrapolate into novel areas of chemical space (to identify new classes of ligands or new binding modes of similar compounds outside the training data). In order to build a statistically meaningful model, QSAR requires sufficient data on a specific target, which is rarely available when predicting DTIs for a newly identified target [82]. Nevertheless, QSAR methods have been successfully applied to identify natural products and related derivatives as inhibitors of various targets, such as monoamine oxidase (MAO). In one such study, Helguera et al. [83] combined 0D, 1D and 2D molecular descriptors, including pure topological descriptors, connectivity indices, walk and path counts, information indices, and 2D-autocorrelations. Linear discriminant analysis (LDA) for modeling, the replacement method (RM) for feature selection, and a Y-randomization test to ensure model robustness were applied to generate structurally diverse and statistically meaningful QSAR models (Fig. 5). The combinatorial QSAR approach allowed the derivation of chemical features which are important for hMAO-B selectivity.
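The idea behind a Y-randomization test can be sketched with the simplest possible QSAR model. Note the assumptions: the example fits a one-descriptor linear regression (a special case of MLR) rather than the LDA used by Helguera et al., and the logP and pIC50 values are invented toy data. The model is refitted many times on shuffled activities; a meaningful model should score far better on the true activities than on the scrambled ones.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=200, seed=0):
    """Mean R^2 obtained after shuffling the activities n_rounds times."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return sum(scores) / len(scores)


# Invented toy data: one descriptor and one activity per compound.
logp = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
pic50 = [4.1, 4.6, 4.9, 5.6, 5.9, 6.5, 6.8, 7.4]

r2_true = r_squared(logp, pic50)
r2_scrambled = y_randomization(logp, pic50)
print(round(r2_true, 3), round(r2_scrambled, 3))
```

If the scrambled models scored nearly as well as the real one, the apparent correlation would likely be due to chance, which is the situation the test is designed to expose.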

Fig. 5

QSAR modeling workflow. Different sets of descriptors were generated with MOE, DRAGON, and MODESLAB software. LDA and RM are implemented in the STATISTICA software.

Target-based approaches

Molecular docking and the aforementioned receptor-centric pharmacophore modeling are the two existing computational approaches for target-based (structure-based) DTI prediction, and are generally used in conjunction. Central to these methods is the 3D structure of the target protein, determined experimentally by X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryo-electron microscopy (cryo-EM) [84], [85], [86]. Alternatively, comparative (homology) modeling can be used to predict an unknown protein structure, based on the solved 3D structure of a template protein sharing high sequence similarity with the protein of interest [87].

Molecular docking

Docking predicts the binding mode (pose) of a ligand in a target protein’s binding site, where it forms a stable (non-)covalent complex, by evaluating and ranking the predicted binding affinities of various poses. During the pose identification phase of a docking simulation, the flexibility of the ligand is accounted for as part of the molecular recognition process, whereas that of the protein is normally neglected (rigid receptor docking) [84]. Three types of scoring functions have traditionally been used to estimate the binding affinities of docking poses, namely force-field, empirical, and knowledge-based scoring functions. Their inability to rank binding poses correctly, partially due to unaccounted solvation effects and protein flexibility, impedes their predictive reliability [88], [89], [90], [91]. Consensus scoring, involving the combination of two or more scoring functions, has been shown to produce more reliable ranking of docking poses [92], [93]. Also, machine learning scoring functions based on protein-ligand interaction data available in chemical databases have emerged as promising surrogates for the classical scoring functions [94], [95], [96]. Furthermore, the binding affinities of top-ranked docking poses can be more accurately predicted via end-point free energy calculations such as molecular mechanics Poisson-Boltzmann or generalized Born surface area (MM/PBSA and MM/GBSA), combined with molecular dynamics (MD) simulations [97], [98], [99]. It is worth mentioning that, while induced-fit docking considers both ligand and protein flexibility, its high computational cost greatly limits the number of ligands and docking poses that can be evaluated [100]. The on- and off-target effects of several clinically approved drugs have been successfully predicted with the help of docking programs such as Gold [101], Glide [102], FlexX [103], Autodock [104], and DOCK [105], or web servers such as TarFisDock [106], INVDOCK [107] and idTarget [108], among others.
Recently, Yang et al. [109] performed docking studies with the program Glide [102] to elucidate the stereoselective complementarity of (20S)-ginsenoside Rh2 over its 20R-epimer (constituents of ginseng) to the platelet P2Y12 receptor. The selectivity could be explained by the simulated binding modes, which display disparate hydrogen bonding interactions with key residues such as Asp266, Tyr105 and Glu188. With a view to rationalising the anti-tumor activity of epigallocatechin-3-gallate (EGCG), the major component of green tea, Wang et al. [110] constructed a dataset of tumor-related proteins and performed reverse docking using the program Autodock Vina [111]. The authors established that the anti-tumor mechanism of EGCG may involve 33 proteins (4 of which were previously unreported) via 12 signal transduction pathways (Fig. 6). The inhibition of the 4 unreported proteins by EGCG was confirmed by means of in vitro enzymatic activity assays.
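Consensus scoring can be illustrated with a rank-averaging sketch. It assumes the poses have already been scored by three functions on different scales (the score values and pose names are invented, and no particular docking program is implied); each function's scores are converted to ranks and the poses are re-ordered by mean rank. Averaging ranks is only one of several possible consensus schemes.

```python
def rank_positions(scores, lower_is_better=True):
    """Map each pose to its rank (1 = best) under one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    return {pose: position + 1 for position, pose in enumerate(ordered)}


def consensus_rank(score_tables):
    """Order poses by their mean rank over several scoring functions.

    score_tables: list of (scores_by_pose, lower_is_better) pairs.
    """
    ranks = [rank_positions(scores, lower) for scores, lower in score_tables]
    poses = list(ranks[0])
    mean_rank = {pose: sum(r[pose] for r in ranks) / len(ranks) for pose in poses}
    return sorted(poses, key=mean_rank.get)


# Hypothetical scores for three poses of one ligand.
force_field = {"pose_a": -8.2, "pose_b": -9.1, "pose_c": -6.0}    # energy-like, lower is better
empirical = {"pose_a": -7.5, "pose_b": -7.0, "pose_c": -5.1}      # energy-like, lower is better
knowledge_based = {"pose_a": 0.82, "pose_b": 0.60, "pose_c": 0.35}  # fitness-like, higher is better

consensus = consensus_rank([
    (force_field, True),
    (empirical, True),
    (knowledge_based, False),
])
print(consensus)
```

Working with ranks rather than raw scores avoids having to normalise energies and fitness values onto a common scale. The same ranking step applies to reverse docking, where one ligand is scored against many targets instead of many poses against one target.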

Fig. 6

Workflow of EGCG anti-tumour mechanism prediction, starting from reverse docking [110]. The original figure was published under a Creative Commons License.

Target-ligand-based approaches

As an extension of QSAR (ligand-based), computational chemogenomic approaches and proteochemometric modeling (PCM) constitute the two computational approaches for target-ligand-based (hybrid) DTI prediction, which integrate both the chemical information of the compounds and the genomic space of target proteins in a single machine learning model. In chemogenomics, active compounds are applied as chemical probes to characterize the function of a specific protein. The modulation of the protein by the active compound induces a specific phenotype. If the phenotype can be related to a therapeutic mechanism, the protein becomes a candidate drug target (reverse chemogenomics). If a molecule induces a specific phenotype but the target is not yet known, the main challenge lies in the development of methods for target identification (forward chemogenomics) [112].

Chemogenomic machine-learning approaches

With increasing knowledge about DTIs, machine learning (ML) methods are becoming increasingly popular and can extend and complement classical rule-based approaches such as network- and graph-based methods [113], [114]. These ML methods for the prediction of drug targets are normally supervised or semi-supervised, and require a set of input variables or feature vectors describing the ligands (such as chemical fingerprints or physicochemical properties) and the proteins (such as amino acid composition, dipeptide composition, sequence order, etc.). Supervised ML algorithms for DTI prediction are trained on labeled datasets containing information about the type of interaction, which guides the algorithm to learn which features are important for DTIs. Consequently, known DTIs are a valuable resource for the development of ML prediction methods. For example, the latest release of DrugBank includes DTIs of about 12,000 drug entries including 2500 approved small molecule drugs and nearly 6000 experimental drugs [115]. Databases such as ChEMBL [116], PubChem Bioassay [117], and BindingDB [118] provide information about thousands of experimentally validated drug-target pairs. The majority of similarity-based ML methods are based on the guilt-by-association (GBA) principle, which states that similar proteins may be targeted by the same drug, or vice versa [119]. Although it cannot be generalized, genes with related functions often share common properties or physical interactions in gene networks [120]. Traditionally, the nearest profile method (NN) and the weighted profile method were widely used to predict new drugs or targets using chemical and interaction information about known compounds and targets [121], [122].
In recent years, several new and optimized similarity-based methods have been published. Rodrigues et al. developed a random forest regression-based DTI prediction workflow named DEcRyPT (Drug–Target Relationship Predictor), which was successfully used to identify β-lapachone as an allosteric modulator of 5-lipoxygenase [123]. Semi-supervised machine learning algorithms, on the other hand, are trained on a combination of labeled and unlabeled data. Xia et al. used a manifold regularization semi-supervised learning method to predict DTIs from heterogeneous biological data sources [124]. Schneider and co-workers developed SPiDER (self-organizing map-based prediction of drug equivalence relationships), which applies the unsupervised self-organizing map (SOM) algorithm in combination with pharmacophore feature representations for macromolecular target prediction. This software tool has been used to de-orphan several natural products [125], [126]. In a further development, TIGER (Target Inference GEneratoR) was created, which uses a combination of multiple SOMs and was validated for the target prediction of numerous natural products [127], [128].
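The nearest profile and weighted profile methods mentioned above can be sketched as follows, under the usual formulation in which a query compound inherits the interaction profile of its most similar known compound (scaled by the similarity), or a similarity-weighted average over all known compounds. The fingerprints, the three-target profiles, and the compound names are invented, and Tanimoto similarity on bit sets is an assumed choice of similarity measure.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def nearest_profile(query_fp, known):
    """Profile of the most similar known compound, scaled by that similarity.

    known: dict mapping compound name to (fingerprint, interaction profile).
    """
    best_sim, best_profile = max(
        ((tanimoto(query_fp, fp), profile) for fp, profile in known.values()),
        key=lambda pair: pair[0],
    )
    return [best_sim * value for value in best_profile]


def weighted_profile(query_fp, known):
    """Similarity-weighted average of all known interaction profiles."""
    weighted = [(tanimoto(query_fp, fp), profile) for fp, profile in known.values()]
    total = sum(sim for sim, _ in weighted)
    n_targets = len(next(iter(known.values()))[1])
    if total == 0:
        return [0.0] * n_targets
    return [
        sum(sim * profile[j] for sim, profile in weighted) / total
        for j in range(n_targets)
    ]


# Hypothetical known compounds: fingerprint and binary profile over three targets.
known = {
    "drug_1": ({1, 2, 3, 4}, [1, 0, 0]),
    "drug_2": ({3, 4, 5, 6}, [0, 1, 1]),
}
query = {1, 2, 3, 5}  # hypothetical fingerprint of an uncharacterised NP

nn_scores = nearest_profile(query, known)
wp_scores = weighted_profile(query, known)
print(nn_scores)
print([round(score, 3) for score in wp_scores])
```

Both functions return one score per target; the highest-scoring targets are the candidates to carry forward to experimental validation. The same logic can be applied from the target side by swapping compound similarity for protein similarity.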

Proteochemometric modeling

In contrast to chemogenomic machine-learning methods, proteochemometric modeling (PCM) allows both inter- and extrapolation to (novel) compounds and (novel) targets and can fulfill the need for hit identification for orphan targets [129], [130], [131]. PCM requires three essential elements: descriptors (target descriptors, ligand descriptors, and additional cross-term descriptors describing the ligand-target interaction), bioactivity data, and appropriate modeling techniques linking the descriptors to the activity data. Ligand descriptors used in PCM include binary descriptors, physicochemical descriptors, 2D topological descriptors, 2D circular fingerprints and alignment-based 3D descriptors. Physicochemical numerical (real-valued) descriptors are more readily interpretable than binary descriptors [132]. 3D descriptors require alignment of the compounds in their active conformation in 3D space, which is error-prone and may introduce noise into the data [133]. Compared with ligands, protein targets are generally larger and require different descriptor sets. A reduction to a selection of residues (e.g. the binding sites) depends on the availability of related crystal structures. Information derived from sequence can be used to calculate similarity between various entities, such as binding pockets, physicochemical properties, topological properties, or 3D electrostatic potentials [134], [135]. Protein descriptors can also be generated based on the presence of specific residues, substructures, or domains. It was shown that related feature-based semi-binary protein descriptors could outperform sequential descriptors [136]. Cross-term descriptors derived from the multiplication of ligand and protein descriptors (MLPD) were used in early PCM research [137], [138], [139], [140]. Although they can describe the two entities simultaneously, their significance is not easy to evaluate [141].
Later, cross-terms not generated by multiplication were developed. A newer type of cross-term descriptor introduced in PCM is the protein-ligand interaction fingerprint (PLIF), which has been shown to outperform MLPD-based descriptors [142]. Machine learning and data processing techniques implemented in PCM include support vector machines (SVM), random forest (RF), Gaussian processes (GP), and principal component analysis (PCA) [143], [144]. Since PCM considers related targets in addition to multiple ligands, it is able to quantify the similarity between different binding sites, such as the subpockets of a given protein target. PCM can aid the identification of novel allosteric inhibitors, which offer therapeutic advantages because they do not completely disrupt essential physiological processes [145]. Similarly, considering the induced-fit interaction between drugs and targets, PCM allows distinction between different protein conformations and binding modes. When the related targets are similar targets from different species, PCM is able to extrapolate bioactivity data between species and provide intra-species selectivity [146]. Burggraaff et al. [147] recently applied PCM to the identification of inhibitors of sodium-dependent glucose co-transporter 1 (SGLT1), by incorporating ligand- and protein-based information into random forest models. The authors used an in-house collection of natural products and synthetic compounds. Of the 77 compounds identified, 30 were validated in vitro, showing submicromolar activities (Fig. 7).
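How a PCM input row is assembled from ligand, protein, and multiplicative cross-term (MLPD) descriptors can be sketched in a few lines. The descriptor values and function names are hypothetical; the cross-terms are taken here as all pairwise products of one ligand descriptor with one protein descriptor, which is the straightforward reading of "multiplication of ligand and protein descriptors".

```python
def mlpd_cross_terms(ligand_desc, protein_desc):
    """All pairwise products of a ligand descriptor and a protein descriptor."""
    return [l_value * p_value for l_value in ligand_desc for p_value in protein_desc]


def pcm_feature_vector(ligand_desc, protein_desc):
    """Concatenate ligand, protein, and cross-term descriptors into one row."""
    return (
        list(ligand_desc)
        + list(protein_desc)
        + mlpd_cross_terms(ligand_desc, protein_desc)
    )


# Hypothetical descriptors for one ligand-target pair.
ligand = [1.0, 0.0, 2.5]   # e.g. three scaled physicochemical properties
protein = [0.5, 2.0]       # e.g. two scaled binding-site properties

cross = mlpd_cross_terms(ligand, protein)
row = pcm_feature_vector(ligand, protein)
print(cross)
print(len(row))
```

One such row is built for every ligand-target pair with a measured bioactivity, and the resulting table is what a model such as a random forest or SVM is trained on. The number of cross-terms grows as the product of the two descriptor counts, which illustrates why their individual significance is hard to evaluate.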

Fig. 7

Application of PCM to identify inhibitors of SGLT1 [147]. The original figure was published under a Creative Commons License.

Summary and outlook

This review presents the current advances and challenges of state-of-the-art approaches to DTI prediction in small-molecule drug discovery from a computational point of view, with a special focus on NPs, which have been and will continue to be an indispensable source of drugs. Although the rate of approved new molecular entities (NMEs) of NP origin has recently dropped, there is still a largely untapped reservoir of hitherto unexplored NPs that could fill the gap. Computational DTI prediction speeds up the rather expensive drug discovery and development process and reduces its cost. The various in silico approaches for DTI prediction each have their specific field of applicability. The method of choice in each drug discovery campaign will depend on the type of target protein under consideration, the availability of the protein’s macromolecular structure, the number of known active ligands and the availability of annotated DTIs in databases. The main caveat of ligand-based pharmacophore screening and similarity searching is the decrease in their predictive reliability when there are few (or no) known active ligands for a target of interest. In addition, there exist activity cliffs: molecules with high structural similarity but dissimilar biological activities for the same target. Regarding target-based approaches, the absence of the 3D macromolecular structure of the target protein, the lack of good scoring functions and the high computational costs are the main drawbacks. As for target-ligand-based approaches, which mostly rely on machine learning algorithms, the quality of the curated drug-target annotations stored in chemogenomic databases is a matter of great concern. Also, there is a risk of chance correlation or overfitting because of the large number of descriptors. The hierarchical combination of several DTI prediction approaches has been shown to provide superior predictions compared with the use of a single approach.
These computational methods have yet to reveal their full potential; the completion of the Human Genome Project (HGP), improvements in cryo-EM for the determination of protein macromolecular structure and dynamics, and advances in scoring algorithms and computing power could be potential game changers.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
