
Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome-Inhibitor Interaction Landscapes.

Antonius P A Janssen1, Sebastian H Grimm1, Ruud H M Wijdeven2, Eelke B Lenselink3, Jacques Neefjes2, Constant A A van Boeckel4, Gerard J P van Westen3, Mario van der Stelt1.   

Abstract

The interpretation of high-dimensional structure-activity data sets in drug discovery to predict ligand-protein interaction landscapes is a challenging task. Here we present Drug Discovery Maps (DDM), a machine learning model that maps the activity profile of compounds across an entire protein family, as illustrated here for the kinase family. DDM is based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm to generate a visualization of molecular and biological similarity. DDM maps chemical and target space and predicts the activities of novel kinase inhibitors across the kinome. The model was validated using independent data sets and in a prospective experimental setting, where DDM predicted new inhibitors for FMS-like tyrosine kinase 3 (FLT3), a therapeutic target for the treatment of acute myeloid leukemia. Compounds were resynthesized, yielding highly potent, cellularly active FLT3 inhibitors. Biochemical assays confirmed most of the predicted off-targets. DDM is further unique in that it is completely open-source and available as a ready-to-use executable to facilitate broad and easy adoption.

Year:  2018        PMID: 30372617      PMCID: PMC6437696          DOI: 10.1021/acs.jcim.8b00640

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


Introduction

Chemical space is vast and can only be explored to a small extent with experimental methods to find suitable hits for drug discovery programs.[1,2] The search for new chemical starting points to modulate therapeutic targets is essential for the development of novel drugs. It has been postulated that the best way to find a new drug is to start with an old drug.[3] This is in line with the central paradigm in medicinal chemistry that similar structures exert similar biological activities.[4] Protein kinases are an important class of drug targets because of their key role in intracellular signal transduction processes involved in cancer, autoimmune diseases, and (neuro)inflammation.[5,6] The therapeutic value of the protein kinase family is demonstrated by the 38 kinase inhibitors (KIs) currently approved by the FDA and a plethora of molecules being tested in clinical trials for this enzyme family.[7] It is anticipated that these clinically approved KIs may serve as starting points to identify novel drug candidates for other kinases. Most KIs interact with a structurally and functionally conserved ATP-binding site that is present in all 518 human protein kinases. It is well established that KIs bind multiple members of the kinase family and that this may affect their efficacy and toxicity.[8] Detailed investigation of the target interaction landscape of KIs is therefore important to understand their molecular mode of action and offers the opportunity to identify new starting points for other therapeutically interesting kinases. 
Many complex, high-dimensional data sets with structure–activity relationships (SARs) of KIs over a broad selection of kinases have become available (Table S1).[9−14] These empirical data sets may serve as guides to explore chemical space around this drug target family and predict (off-)target activity using advanced computational chemistry methods, such as quantitative SAR (QSAR) models, the similarity ensemble approach (SEA), support vector machines, k-nearest neighbor, random forest, naïve Bayes, (deep learning) neural networks (NNs), and principal component analysis (PCA).[15−18] Advanced machine learning models promise to revolutionize the field of drug discovery. Employing high-dimensional data sets, these models are used to predict a wider range of biological activities for a compound compared with traditional drug design methods (e.g., molecular modeling, docking, and early QSAR models such as Hansch and Free–Wilson analyses[19]). However, advanced machine learning models are hampered in their applicability by a lack of clear interpretation and a tendency to overfit high-dimensional data. Many of the best-performing machine learning models are black boxes in which it is unclear how the data are used to generate novel hypotheses. They also require in-depth knowledge of advanced cheminformatics and highly specialized or purpose-built software. These technical requirements slow the implementation of the tools in the daily practice of drug discovery and consequently prevent the research community from taking full advantage of the wealth of data becoming available. Therefore, there is a clear need for better tools to interpret and visualize complex, high-dimensional SAR data sets in an easy and intuitive manner and to predict the biological activity profiles of novel hits for drug discovery programs. Here we present Drug Discovery Maps (DDM), a machine learning tool that allows the visualization and prediction of target–ligand interaction landscapes.

Results

t-SNE Maps the Molecular Similarity of Experimental Drugs in Chemical Space

On the basis of the principle that the chemical structure of a compound determines its biological and chemical properties, a machine learning algorithm that predicts target–ligand interaction landscapes should be able to recognize molecular similarity between different molecules. Traditionally, chemical similarity is measured by the Tanimoto coefficient (Tc).[20] A molecular fingerprint, which is a high-dimensional bit vector that captures the presence or absence of chemical groups in a molecule, is used by the Tc to calculate the similarity between compounds. As a similarity metric the Tc has its limitations, predominantly because it averages differences over all bits, thereby losing information.[21] Thus, we envisioned that the data contained in the molecular fingerprint could be used more efficiently by a machine learning algorithm to determine molecular similarity. In recent years, the t-distributed stochastic neighbor embedding (t-SNE) algorithm has been shown to be a powerful tool to visualize complex high-dimensional data sets in diverse experimental settings.[22−26] This state-of-the-art unsupervised machine learning technique is especially powerful in preserving local data structures in high-dimensional data. It can be readily applied to bit strings of any length and as such is easily applicable to chemical structures represented by molecular fingerprints. We aimed to use t-SNE at the core of our prediction model, where the algorithm is used to find and cluster the most similar molecules in a large data set and visualize that similarity clustering in two-dimensional space. We decided to apply the t-SNE algorithm to visualize the molecular similarity of molecules from the Drug Repurposing Hub, an online repository containing compounds that have been clinically tested in humans.[27] We selected only the launched drugs (2274) and manually classified them into 27 chemotypes. 
Morgan fingerprints (RDKit, 4096 bits, radius = 2) were generated for each of these 2274 clinical compounds using KNIME, an open-source software package.[28,29] The fingerprints were fed into the Python implementation of the Barnes–Hut t-SNE algorithm to generate a map of the drug-like chemical space.[30] The resulting map (Figure 1A) shows remarkable colocalization of most of the chemotypes. As an example, the family of penicillin-like structures at the far right of the plot (cyan) is completely separated from all other chemical matter. Some unannotated molecules (in gray) are visible in the cluster, but upon detailed inspection they all constitute β-lactams in which the sulfur is either substituted or omitted. In addition, many other highly dense clusters are visible at the boundaries of the map, corresponding to highly defined chemotypes such as the rapamycin, conazole, and oxytocin analogues. It is noteworthy that even in the apparently less defined center of the map, clear colocalization of similar molecules can be observed, for example, a cluster of aspirin-like molecules (orange, near the origin). Thus, t-SNE is able to map the chemical space of approved drugs following a chemist’s intuition and recognizes molecular similarity in a broad set of diverse drug-like molecules.
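For reference, the Tanimoto coefficient that the t-SNE approach is contrasted with can be computed directly from two fingerprints. The snippet below is a minimal sketch, not the authors' code: it assumes each fingerprint is represented as the set of indices of its "on" bits, and the bit indices used are hypothetical toy values rather than real Morgan bits.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


# Toy fingerprints (hypothetical bit indices, not real Morgan bits).
fp_1 = {3, 17, 42, 256, 1024}
fp_2 = {3, 17, 42, 300, 2048}
fp_3 = {5, 99}

print(tanimoto(fp_1, fp_2))  # 3 shared bits out of 7 distinct bits
print(tanimoto(fp_1, fp_3))  # no shared bits
```

The single number returned per pair is what the text means by information loss: two pairs of molecules can share the same Tc while differing in entirely different bits.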
Figure 1

t-SNE visualization of chemical space. (a) t-SNE embedding of the “launched” drugs in the Drug Repurposing Hub. Embedding is based on the 4096-bit Morgan fingerprint. t-SNE settings: perplexity = 25, learning rate = 50, iterations = 10 000. Markers are colored according to 27 manually attributed chemotypes. An animation of the process of embedding is included in the supporting video. (b) t-SNE embedding of the Published Kinase Inhibitor Set. Embedding is based on the 4096-bit Morgan fingerprint. t-SNE settings: perplexity = 50, learning rate = 50, iterations = 10 000. Markers are colored according to 31 manually attributed chemotypes.

Next, we wanted to test whether t-SNE is still able to recognize molecular similarity within a smaller set of drug-like molecules that is more homogeneous and has higher molecular similarity. To this end, we performed t-SNE-mediated clustering of the molecules from the Published Kinase Inhibitor Set (PKIS).[31] The PKIS is a 364-member library of molecules assembled by GSK that are all classified as inhibitors of protein kinases. The PKIS represents 31 chemotypes, and their activities have been measured on 200 kinases.[13] The resulting map of chemical space representing the KIs (Figure 1B) again shows clear colocalization of specific chemotypes. A more in-depth analysis (see the Supporting Information and Figure S1) confirms the initial visual inspection and shows high statistical correlation between the autonomously derived clustering and the human annotation. Of the 31 chemotypes annotated, 23 were fully collected in one computationally assigned cluster. For example, the orange and gold clusters on the left of the map are completely isolated and comprise all of the compounds of those chemotypes (Figure S1). 
This illustrates how t-SNE is capable of recognizing and clustering molecular entities in a highly specific manner and allows the visual inspection of high-dimensional chemical structural data, or chemical space, in an easy and intuitive way.

t-SNE Map of the Target Space of Kinases Recapitulates Phylogenetic Information

On the basis of the observation that binding sites in closely related proteins bind similar endogenous molecules and (experimental) drugs, we wanted to determine whether the t-SNE algorithm is capable of clustering proteins on the basis of the chemical similarity of their amino acids in the binding pocket. Conceptually, this approach is analogous to proteochemometric modeling.[32] To this end, we chose the protein kinase family as the drug target class because this is a large family of over 500 members that all use ATP in their active site and often show cross-reactivity toward (experimental) drugs. To quantify the similarity of kinases, we aligned the amino acid sequences of the whole kinase domains containing the ATP-binding pocket and used a fingerprint based on physicochemical properties of the amino acids.[33] The fingerprints were used to create a two-dimensional map of the target space by the t-SNE algorithm. The resulting map (Figure 2A) is striking, as it almost seamlessly recreates the phylogenetic tree published by Manning et al. in 2002.[34] To assign the kinases to clusters, the coordinates of the t-SNE embedding were fed into the unsupervised clustering algorithm DBSCAN (see the Supporting Information for details).[35] All 10 assigned clusters were significantly (P < 0.0001, hypergeometric test) enriched for a specific kinase group as assigned by Manning et al. (Figure 2A). Closer inspection of some of the kinases unassigned by DBSCAN reveals that they belong to distinct branches of the phylogenetic tree, corresponding to their separation from the main clusters. As an example, the four TK kinases at the far right of the embedding (burgundy) all belong to the JAK family (JAK1, -2, and -3 and Tyk2) but only represent their second kinase domain. The first kinase domain is more closely associated with the rest of the TK group and lies just outside the DBSCAN-assigned cluster. 
The close association of the second kinase domains with the RGC cluster (colored brown) is especially striking, as these domains, just like the RGC kinases, are considered to be pseudokinases. The same holds true for MLKL, IRAK2, and IRAK3. Intriguingly, the IRAK family of TKL kinases has four members, of which IRAK1 and IRAK4 are catalytically active whereas IRAK2 and IRAK3 are not.[36] In the t-SNE embedding, the former are located in the major TKL cluster (orange), whereas the latter are actually assigned to the RGC-dominated cluster. MLKL has also been shown to lack catalytic activity in at least one report.[37]
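The enrichment of a cluster for one kinase group can be checked with a one-sided hypergeometric test, as named in the text. The following is a minimal sketch using only the standard library; the function name and the group and cluster sizes in the example are hypothetical and are not taken from the paper (only the total of 535 kinase domains comes from the Figure 2 caption).

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: all items (e.g., all kinase domains in the embedding)
    successes:  items carrying the label of interest (e.g., one kinase group)
    draws:      items in the cluster being tested
    observed:   labelled items actually found in that cluster
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical example: a cluster of 60 domains, 55 of which belong to a
# kinase group that has 90 members among 535 domains in total.
p_value = hypergeom_enrichment_p(population=535, successes=90, draws=60, observed=55)
print(f"enrichment p-value: {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the floating-point underflow that can occur when very small tail probabilities are summed term by term.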
Figure 2

t-SNE visualization of kinase domains reveals phylogenetic information. (a) t-SNE embedding of physicochemical fingerprints of 535 human kinase domains. t-SNE settings: perplexity = 50, learning rate = 50, iterations = 25 000. Arbitrary t-SNE coordinates are rotated to match the dendrogram orientation of Manning et al.[34] Markers are colored according to the 12 groups defined by Manning et al., and the background is colored on the basis of the DBSCAN-generated clustering, colored by the dominant kinase group in that cluster (blanks are unclustered kinases). (b) Kinome dendrogram manually curated by Manning et al., overlaid with circles colored according to the background coloring of the t-SNE map in (a), which is based on the unsupervised DBSCAN clustering.[39]

Another interesting feature is the separation of a group (left of the plot) of TKL kinases from the major cluster. This subset features all but one of the STKR family of cell-surface-bound receptor kinases. Upon closer inspection, even the subfamilies of STKR1 and -2 are discernible. Strikingly, the MISR2 (AMHR2) kinase receptor is located with kinases categorized as “Other”. This receptor kinase has an atypical DFG motif (DLG) and as such can indeed be classified as a pseudokinase, although phosphorylation activity has experimentally been shown.[38] The other members of the STKR family all share the conserved DFG motif. Finally, on the lower side of the t-SNE plot, several AGC-colored kinases have been clustered with the CAMK kinases. These actually represent the second kinase domains of the RSK family, which were also attributed to the CAMK group by Manning et al.[34] In summary, this analysis of the target space of protein kinase domains assured us that the embedding recognizes overall similarity and also detects subtle differences between the binding domains targeted by most kinase inhibitors.

DDM Can Predict Target–Ligand Interaction Landscapes

On the basis of chemical and target space maps of kinases and their inhibitors, we envisioned that these could provide a workflow to predict the activity of novel compounds for the entire kinome. We dubbed this approach Drug Discovery Maps (DDM). The bioactivity data measured by Elkins et al.[13] for the PKIS were used as the training set, as the PKIS contains the most unique interactions of all open data sets (Table S1). The optimization of the workflow with all of the parameters is described in more detail in the Supporting Information. The final architecture of the algorithm is depicted in Figure 3 and illustrated for the EGFR inhibitor erlotinib. First, a t-SNE embedding is generated in which erlotinib is mapped onto the chemical space of the PKIS (top left). This information is used to find the nine most similar molecules (top right). Of these, the inhibition data measured by Elkins et al. are averaged, and all of the kinases above a threshold value C are considered targets (bottom right). A view of the inhibition profiles for this process is included in Figure S5. These kinases are then looked up in the target space map (Figure 2), and the most similar kinases are appended (bottom left) to yield the final prediction (center). As the molecular t-SNE embedding is slightly stochastic, the described process is repeated several times (R), and the number of times a kinase is predicted is tracked. Our DDM model was validated using an independent data set generated by Karaman et al.[9] The resulting prediction statistics for each of the 38 compounds in this test set are summarized in Table S2. The average positive predictive value (PPV) was 40% with a Matthews correlation coefficient (MCC) of 0.21. We compared these statistics with previously published methods and found that DDM was better than QSAR models and equal in performance to random-forest-based proteochemometric models (Figure S2). 
A receiver operating characteristic (ROC) analysis of the performance of DDM on this test set showed an area under the curve (AUC) of 0.76 (Figure S3). Taken together, these results show that we have developed and validated a novel machine learning model to predict kinome inhibitor landscapes.
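The two summary statistics quoted for the validation, PPV and MCC, follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical and are not the per-compound values of Table S2.

```python
from math import sqrt


def ppv(tp, fp):
    """Positive predictive value: fraction of predicted targets that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention for this sketch: return 0 when any marginal total is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for one compound profiled on 200 kinases.
tp, fp, tn, fn = 8, 12, 160, 20
print(f"PPV = {ppv(tp, fp):.2f}")
print(f"MCC = {mcc(tp, fp, tn, fn):.2f}")
```

Unlike PPV, the MCC also accounts for false negatives and true negatives, which is why a model can have a reasonable PPV and still a modest MCC when many true targets are missed.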
Figure 3

Schematic overview of the DDM workflow. In this example, the targets of erlotinib are predicted. On the basis of a t-SNE embedding (top left), the PKIS inhibitors nearest to erlotinib are found (top right). For these, the inhibition data as measured by Elkins et al.[13] are averaged and used as an initial prediction (bottom right). These targeted kinases are then looked up in the t-SNE embedding (bottom left), where the most similar kinases are added to yield the final prediction (center).
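The workflow in this scheme can be sketched in a few lines once the embeddings are available. The code below is an illustrative sketch under stated assumptions, not the published DDM implementation: it takes precomputed 2D coordinates as input instead of running t-SNE, and the function names, the `n_similar` parameter, and all toy data are hypothetical. The neighbor count `k`, the cutoff C, and the repetition over several embeddings follow the description in the text.

```python
from collections import Counter
from math import dist


def predict_once(query_xy, compound_xy, inhibition, kinase_xy,
                 k=9, cutoff=40.0, n_similar=1):
    """One prediction run on a single chemical-space embedding."""
    # 1. Find the k training compounds nearest to the query in the embedding.
    nearest = sorted(compound_xy, key=lambda c: dist(query_xy, compound_xy[c]))[:k]
    # 2. Average their measured inhibition per kinase.
    kinases = inhibition[nearest[0]].keys()
    mean_inh = {kin: sum(inhibition[c][kin] for c in nearest) / len(nearest)
                for kin in kinases}
    # 3. Kinases above the cutoff C are the initial predicted targets.
    targets = {kin for kin, value in mean_inh.items() if value > cutoff}
    # 4. Append the most similar kinases from the target-space embedding.
    predicted = set(targets)
    for kin in targets:
        others = sorted((o for o in kinase_xy if o != kin),
                        key=lambda o: dist(kinase_xy[kin], kinase_xy[o]))
        predicted.update(others[:n_similar])
    return predicted


def predict(query_runs, compound_runs, inhibition, kinase_xy, **kwargs):
    """Repeat over several embeddings; return fraction of runs per kinase."""
    counts = Counter()
    for query_xy, compound_xy in zip(query_runs, compound_runs):
        counts.update(predict_once(query_xy, compound_xy, inhibition, kinase_xy, **kwargs))
    return {kin: n / len(query_runs) for kin, n in counts.items()}


# Toy data (hypothetical): four training compounds, four kinases, two runs.
inhibition = {
    "A": {"EGFR": 90, "ERBB2": 20, "FLT3": 5, "KIT": 0},
    "B": {"EGFR": 80, "ERBB2": 30, "FLT3": 10, "KIT": 5},
    "C": {"EGFR": 0, "ERBB2": 0, "FLT3": 95, "KIT": 60},
    "D": {"EGFR": 5, "ERBB2": 0, "FLT3": 85, "KIT": 70},
}
kinase_xy = {"EGFR": (0, 0), "ERBB2": (0.5, 0), "FLT3": (10, 10), "KIT": (10.5, 10)}
compound_runs = [
    {"A": (0, 0), "B": (1, 0), "C": (10, 10), "D": (11, 10)},
    {"A": (5, 5), "B": (5, 6), "C": (-5, -5), "D": (-6, -5)},
]
query_runs = [(0.5, 0.2), (5.2, 5.4)]

result = predict(query_runs, compound_runs, inhibition, kinase_xy,
                 k=2, cutoff=40.0, n_similar=1)
print(result)
```

Reporting the fraction of runs in which a kinase is predicted makes the stochasticity of the embedding explicit, and a minimum fraction can then be applied as a second filter alongside C.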


Discovery of Novel FLT3 Inhibitors Using DDM

To investigate the utility of the model in early drug development, it was applied for the identification of new inhibitors for FMS-like tyrosine kinase 3 (FLT3). FLT3 is implicated in acute myeloid leukemia (AML), where approximately 30% of patients carry an internal tandem duplication (ITD) in their FLT3 gene that activates the kinase and acts as a driver mutation.[40] Recently, midostaurin has been approved by the FDA for the treatment of AML patients, and several other inhibitors are currently being tested in clinical trials. However, fast adaptive mutations in the FLT3 gene quickly result in drug-induced resistance of the AML, warranting the search for novel chemotypes to inhibit this kinase. To this end, the DDM model was used to predict the kinome–ligand interaction landscape of a small kinase-focused library of 1152 molecules. They were analyzed using various values for the activity cutoff C and were ultimately filtered with C = 40% and a prediction count of at least nine out of 10 runs in order to have a balanced number of molecules to be tested. These stringent cutoffs yielded a set of 44 compounds predicted to be active at FLT3. To validate our virtual DDM screen, we performed a time-resolved fluorescence resonance energy transfer (FRET)-based biochemical assay with all 1152 compounds against FLT3 at an initial concentration of 10 μM. This screen yielded 184 actives with >50% loss of activity (16% of all compounds). The pEC50 values of these compounds were measured, resulting in 135 compounds with pEC50 > 5, with a mean of 6.7 ± 0.9. Eighteen of the 184 actives were among the 44 compounds selected by our DDM screen, giving a PPV (or hit rate) of 41% (Figure 4A; P < 0.0001, hypergeometric test), about 2.6-fold higher than the 16% hit rate of the biochemical assay. 
Interestingly, 15 of the predicted compounds demonstrated EC50 values of <2 μM (34%, P < 0.0001 (hypergeometric test)) with an average pEC50 of 7.3 ± 1.1; this group included the most active compound found in the screen, crenolanib (pEC50 = 9.0). The hit rate was nearly identical to the validation statistics for the test set (Figure S2), where an overall PPV of 40% was achieved. The same holds for the negative predictive value (89%) and the sensitivity (11%). The successful application of our model for the FLT3 screen may partially be attributed to the high coverage for the TK family of kinases. It should be noted that the relatively low sensitivity (11%) is a balanced choice between minimizing the number of compounds to screen and finding more actual hits. This can easily be tuned by varying the cutoff parameter.
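The hit rates and potency thresholds in this section can be reproduced with a few lines of arithmetic. The snippet below is a small worked example using only the counts stated in the text (1152 screened, 184 actives, 44 predicted, 18 predicted actives); the variable and function names are our own, and pEC50 is taken as the negative base-10 logarithm of the EC50 in mol/L.

```python
from math import log10


def pec50_from_molar(ec50_molar):
    """Convert an EC50 in mol/L to pEC50 (negative base-10 logarithm)."""
    return -log10(ec50_molar)


# Counts stated in the text.
screened, actives = 1152, 184
predicted, predicted_actives = 44, 18

base_rate = actives / screened          # hit rate of the full biochemical screen
ppv = predicted_actives / predicted     # hit rate among DDM-selected compounds
fold = ppv / base_rate                  # enrichment of DDM selection over screening

print(f"screen hit rate: {base_rate:.1%}")
print(f"DDM hit rate:    {ppv:.1%}")
print(f"enrichment:      {fold:.2f}-fold")

# Potency thresholds mentioned in the text, expressed on the pEC50 scale.
print(f"EC50 = 2 uM  -> pEC50 = {pec50_from_molar(2e-6):.2f}")
print(f"EC50 = 1 nM  -> pEC50 = {pec50_from_molar(1e-9):.2f}")
```

On this scale, the EC50 < 2 μM criterion corresponds to a pEC50 above roughly 5.7, and a pEC50 of 9.0 corresponds to an EC50 of 1 nM.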
Figure 4

Discovery of novel FLT3 inhibitors using DDM. (a) Scatter plot of all compounds and their inhibitory effects at 10 μM as measured in the high-throughput screen. DDM-predicted molecules are marked red. (b) Structures and syntheses of the two compounds resynthesized and tested in situ against MV4:11 cells. Reagents and conditions: (i) cyanamide, nitric acid, ethanol, 78 °C, 76%; (ii) dimethylformamide diethyl acetal, toluene, 80 °C, 80%; (iii) K2CO3, ethanol, 78 °C, 31%; (iv) 4-aminophenol, NaOH, DMSO, 100 °C, 65%; (v) triphosgene, DCM, 40 °C; (vi) 1,4-dioxane, 110 °C, 44% over two steps. (c) Dose–response curves for compounds 1 and 2 against recombinant FLT3 in a FRET-based activity assay. Markers denote mean ± SD (N = 4). Dotted lines denote the 95% confidence intervals of the EC50 fits. (d) Dose–response curves of compounds 1 and 2 against MV4:11 leukemia cells. Markers denote mean ± SD (N = 3). Dotted lines denote the 95% confidence intervals of the EC50 fits. (e) Docking poses of 1 and 2 in the 3D models of FLT3 and the corresponding 2D interaction plots.

Two of the predicted compounds, 1 and 2 (Figure 4B), were selected on the basis of their chemical properties, novelty regarding FLT3 inhibition, and predicted interaction profiles (vide infra). These compounds were resynthesized using established methods (see Figure 4B and the Supporting Information). The activity of the compounds was confirmed in a FRET assay using recombinant human FLT3 (Figure 4C). Compounds 1 and 2 showed a concentration-dependent activity with pEC50 values of 7.3 ± 0.1 and 8.8 ± 0.1, respectively. To determine the cellular activities of these two compounds, a cell proliferation assay using the FLT3-dependent AML cell line MV4:11 was performed. Both 1 and 2 showed clear cellular activity with pEC50 values of 6.3 ± 0.1 and 8.5 ± 0.1, respectively (Figure 4D). In summary, the experimental validation of the hits illustrates the power of our DDM workflow for compound selection in the lab. 
Finally, to explain the potential binding mode of compounds 1 and 2, these compounds were docked using a DFG-in model for 1 and a DFG-out structure (PDB entry 4RT7) for 2 (Figure 4E). Compound 1 binds to the hinge region with the aminopyrimidine moiety in a fashion typical for type 1 kinase inhibitors. Compound 2 binds in the DFG-out conformation much like RIPK2 (PDB entry 5AR7) by forming hydrogen bonds to the DFG motif using the urea functionality and to the hinge region using the pyridine nitrogen.[41]

Kinome Activity Spectrum Prediction Using DDM

To reduce potential toxic side effects, kinase cross-reactivity is ideally minimized. DDM enables rapid assessment of the predicted cross-reactivity because by default DDM predicts the interactions with the entire kinome. Thus far, however, only the FLT3 prediction has been taken into account. As a final validation, we tested the activities of the two inhibitors on the predicted off-targets in biochemical assays. In addition to FLT3, compounds 1 and 2 were predicted to be active against 35 and 33 kinases, respectively (C = 40%, R > 0.5). The off-targets were validated using KinaseProfiler by Eurofins at 10 μM. The inhibition data per compound are shown in Table S3. For compound 1 the predictions were 69% accurate: 24 of the 35 off-targets were confirmed (<50% remaining activity), and two additional off-targets fell in the low 50% residual activity range. For compound 2 the prediction was even more accurate, as 26 of the 33 targets (79%) were indeed inhibited by >50%. To conclude, DDM was able to predict the kinome–inhibitor interaction landscape with a relatively high accuracy.

Discussion

Drug discovery is still largely an empirical process that is challenging and time-consuming.[42] Multiparameter optimization of chemical structures, which is needed to balance the activity and selectivity of a drug candidate, requires the understanding of high-dimensional data sets. Machine learning algorithms have been employed to analyze and predict compound activity using large data sets with varying success.[15−17] Some of the major drawbacks of most computational models are the complexity of the algorithm and the “black box” nature of the systems. Implementation and interpretation of such systems are not trivial, and consequently, they have not been widely adopted by the drug discovery community. Here we present DDM, which is an intuitive, data-driven (bio)molecule similarity clustering procedure using state-of-the-art machine learning techniques. The model is based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm to generate a visualization of molecular similarity in two dimensions.[43,44] Color is used as a third dimension to interactively visualize the biological activity or compound class (chemotype). DDM combines two different maps. The first map depicts the chemical space, in which compounds are clustered on the basis of their molecular similarity, whereas in the second map protein targets are clustered on the basis of the chemical similarity of the amino acids making up the kinase domain. By combining the two maps, DDM is able to predict bioactivities of small molecules across a protein family. We applied DDM to visualize the chemical space of currently available drugs, the Published Kinase Inhibitor Set (PKIS), and the target space of the protein kinase family (kinome). DDM was able to predict the kinome activity profile of another independent set of kinase inhibitors with comparable or better scores than the currently available machine learning techniques. 
We applied DDM to identify new hits for the oncogene FMS-like tyrosine kinase 3 (FLT3), a validated therapeutic target for the treatment of acute myeloid leukemia.[45] The hits were resynthesized, and their biological activities were validated in biochemical and cellular assays. Finally, the off-target profiles of the hits as predicted by DDM were validated in a panel of kinase assays. Although our model performs as well as or better than the current computational drug discovery tools, it is envisioned that our model can be further improved when more comprehensive data sets become available in the public domain. In the PKIS training set, 364 inhibitors were tested at only two concentrations on approximately 200 unique wild-type kinases. A more expansive data set of a broader set of more diverse compounds tested on a larger number of kinases in a concentration–response fashion would inherently improve the predictions generated over the entire kinome. The added value of direct knowledge of the off-targets of these compounds enables prioritization in medicinal chemistry efforts, as demonstrated by the KinaseProfiler screen of predicted off-targets. This allows medicinal chemists to rank scaffolds on the basis of acceptable off-targets, which in turn depends on biological questions or medical indications. The information obtained from the docking poses of these molecules can also be used for structure-based design, directly incorporating the knowledge derived from the clinically relevant mutations into the hit-optimization project. The DDM concept presented here can easily be adapted to work with any data set available. Because all data, algorithms, and data processing tools used are in the public domain or open-source, it is highly adaptable and extensible. 
Concrete examples include different druggable protein classes, such as G-protein-coupled receptors, ion channels, or nuclear hormone receptors, or training on a different type of molecular data altogether, e.g., solubility, membrane permeability, metabolic stability, pharmacokinetics, or toxicological data. To aid in the implementation of our tool as it is presented here, a Python-based executable including a graphical user interface (Figure 5) has been made available online via GitHub.[46] The unpackaged Python script with a list of dependencies is also available. Also included is a fully annotated KNIME workflow to allow step-by-step execution and analysis. This set of tools should enable the integration of this data-driven approach into any project without any up-front investment.
Figure 5

Graphical user interface (left) and generated output (right) of the Python implementation of the DDM algorithm presented here. Only a SMILES string is required as input, and the output is provided as depicted on the right. The packaged executable as well as the original Python script have been made available online.[46]

To conclude, the machine learning algorithm Barnes–Hut t-SNE was successfully implemented in a drug discovery setting to predict ligand–protein interaction landscapes. The concept of DDM is applicable to a multitude of drug discovery challenges, which, given the proper data set, can be used to design a small molecule with a balanced set of physicochemical and biological properties as required for drug candidates. It is envisioned that DDM may make the drug discovery process more efficient.