Tao Huang, Hong Mi, Cheng-Yuan Lin, Ling Zhao, Linda L D Zhong, Feng-Bin Liu, Ge Zhang, Ai-Ping Lu, Zhao-Xiang Bian.
Abstract
BACKGROUND: Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of the most-similar ligand has been oversimplified or even neglected in these studies, which has impaired their predictive power.
Keywords: Explicit bioactivity; False discovery rate; Logistic regression; Mechanism-of-action target; Most-similar ligand; Target prediction
Year: 2017 PMID: 28284192 PMCID: PMC5346209 DOI: 10.1186/s12859-017-1586-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
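The most-similar-ligand idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient (Tc) as the similarity measure; the function names, ligand identifiers, and toy fingerprints are hypothetical, and the actual feature construction used by MOST is not given in this record.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar_ligand(query_fp, ligands):
    """Return (ligand_id, Tc, pKi) for the known ligand most similar to the query.

    `ligands` is an iterable of (ligand_id, fingerprint, pKi) tuples.
    The pKi is returned alongside Tc so that the explicit bioactivity of the
    most-similar ligand stays available to a downstream model (an assumption
    about how such a feature pair might be used, not the paper's code).
    """
    best = max(ligands, key=lambda rec: tanimoto(query_fp, rec[1]))
    return best[0], tanimoto(query_fp, best[1]), best[2]


# Toy data: hypothetical ligands of a single target.
known_ligands = [
    ("lig_a", {1, 2, 3, 5}, 7.2),
    ("lig_b", {1, 9}, 5.1),
]
hit = most_similar_ligand({1, 2, 3, 4}, known_ligands)
```

In practice the fingerprints would come from a cheminformatics toolkit (the tables below refer to Morgan and FP2 fingerprints); plain sets are used here only to keep the sketch self-contained.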
Fig. 1 Workflow of MOST for target prediction
Overall performance of MOST in sevenfold cross-validation
| Active data defined by | Metric | Ki type | Naïve Bayes (Morgan) | Naïve Bayes (FP2) | Logistic Regression (Morgan) | Logistic Regression (FP2) | Random Forests (Morgan) | Random Forests (FP2) |
|---|---|---|---|---|---|---|---|---|
| pKi ≥ 5 | Accuracy | Explicit Ki | 0.927 ± 0.003 | 0.929 ± 0.003 | 0.948 ± 0.003 | 0.949 ± 0.002 | 0.948 ± 0.001 | 0.946 ± 0.004 |
| pKi ≥ 5 | Accuracy | Implicit Ki | 0.939 ± 0.002 | 0.937 ± 0.003 | 0.948 ± 0.003 | 0.949 ± 0.003 | 0.950 ± 0.002 | 0.949 ± 0.003 |
| pKi ≥ 5 | MCC | Explicit Ki | 0.352 ± 0.013 | 0.345 ± 0.017 | 0.495 ± 0.010 | 0.484 ± 0.025 | 0.530 ± 0.017 | 0.504 ± 0.032 |
| pKi ≥ 5 | MCC | Implicit Ki | 0.554 ± 0.008 | 0.543 ± 0.022 | 0.558 ± 0.013 | 0.585 ± 0.023 | 0.585 ± 0.014 | 0.560 ± 0.023 |
| pKi ≥ 6 | Accuracy | Explicit Ki | 0.848 ± 0.004 | 0.842 ± 0.002 | 0.866 ± 0.002 | 0.862 ± 0.002 | 0.861 ± 0.004 | 0.853 ± 0.004 |
| pKi ≥ 6 | Accuracy | Implicit Ki | 0.860 ± 0.004 | 0.855 ± 0.004 | 0.867 ± 0.003 | 0.863 ± 0.003 | 0.867 ± 0.004 | 0.862 ± 0.004 |
| pKi ≥ 6 | MCC | Explicit Ki | 0.561 ± 0.014 | 0.540 ± 0.009 | 0.617 ± 0.009 | 0.602 ± 0.007 | 0.609 ± 0.013 | 0.581 ± 0.013 |
| pKi ≥ 6 | MCC | Implicit Ki | 0.624 ± 0.011 | 0.610 ± 0.612 | 0.632 ± 0.012 | 0.618 ± 0.011 | 0.632 ± 0.013 | 0.618 ± 0.012 |
Overall performance of MOST in temporal validation
| Active data defined by | Metric | Ki type | Naïve Bayes (Morgan) | Naïve Bayes (FP2) | Logistic Regression (Morgan) | Logistic Regression (FP2) | Random Forests (Morgan) | Random Forests (FP2) |
|---|---|---|---|---|---|---|---|---|
| pKi ≥ 5 | Accuracy | Explicit | 0.750 | 0.670 | 0.905 | 0.901 | 0.893 | 0.871 |
| pKi ≥ 5 | Accuracy | Implicit | 0.741 | 0.696 | 0.896 | 0.894 | 0.896 | 0.897 |
| pKi ≥ 5 | MCC | Explicit | 0.275 | 0.110 | 0.272 | 0.184 | 0.283 | 0.136 |
| pKi ≥ 5 | MCC | Implicit | 0.267 | 0.138 | 0.292 | 0.213 | 0.256 | 0.192 |
| pKi ≥ 6 | Accuracy | Explicit | 0.633 | 0.554 | 0.755 | 0.736 | 0.724 | 0.709 |
| pKi ≥ 6 | Accuracy | Implicit | 0.632 | 0.556 | 0.761 | 0.737 | 0.759 | 0.726 |
| pKi ≥ 6 | MCC | Explicit | 0.357 | 0.225 | 0.382 | 0.321 | 0.319 | 0.273 |
| pKi ≥ 6 | MCC | Implicit | 0.334 | 0.212 | 0.370 | 0.307 | 0.381 | 0.300 |
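The two metrics reported in the tables above, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a confusion matrix. The sketch below uses the standard definitions; the counts in the example are hypothetical and are chosen only to show how accuracy can stay high on imbalanced data while MCC is noticeably lower.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical, imbalanced example: 8 actives among 100 compound-target pairs.
example = {"tp": 5, "tn": 90, "fp": 2, "fn": 3}
example_accuracy = accuracy(**example)
example_mcc = mcc(**example)
```

Because MCC uses all four cells of the confusion matrix, it is less flattered by a large pool of true negatives than accuracy is, which is one plausible reading of why the MCC values in the tables are much lower than the corresponding accuracies.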
Fig. 2 Prediction results of MOST on one dataset from the sevenfold cross-validation, using the Logistic Regression method and the Morgan fingerprint. a and b, predicted results under the two definitions of “active” data, pKi ≥ 5 and pKi ≥ 6. Results obtained with the explicit and the implicit Ki of the most-similar ligand in model training are compared. Left panels, predicted results in a Tc vs pKi scatter plot. Middle panels, fraction of data as the Tc threshold increases. Right panels, fraction of data as the p value threshold decreases. The difference between fTP and fFP is plotted as a black dashed line. In all panels, true positives are colored red, true negatives blue, false positives cyan, and false negatives orange
Fig. 3 Predicting novel targets for the drug fluanisone by MOST with FDR control. a, scheme for integrating MOST with the FDR control procedure. b, structure of fluanisone. c, distribution of the p values of predicted targets, generated by searching fluanisone against 1,439 human targets via MOST. d, top 5 hits of target prediction for fluanisone. Two novel targets of fluanisone, adrenoceptor alpha 1B (ADRA1B) and adrenoceptor alpha 1D (ADRA1D), were characterized in the literature (Keiser et al. [7]) but are not recorded in the ChEMBL database. The adjusted p values were calculated with the Benjamini-Hochberg algorithm. e, inference of the novel targets of fluanisone by MOST. Fluanisone was found to be similar (Tc = 0.70) to compound CHEMBL8618, which acts potently on ADRA1B and ADRA1D. These targets were assigned small p values by MOST
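The Benjamini-Hochberg adjustment named in the Fig. 3 caption can be sketched as follows. This is a generic implementation of the standard step-up procedure, not the authors' code; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four candidate targets of one query compound.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```

When one compound is screened against many targets (1,439 in the caption), `m` is the number of targets tested, so a raw p value has to be quite small for its adjusted value to remain below a chosen FDR threshold.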
Fig. 4 Predicting and validating the mechanism-of-action target that mediates the laxative effect of aloe-emodin, a natural product from CTM. a, aloe-emodin was predicted by MOST to act on acetylcholinesterase (ACHE) via the most-similar ligand, CHEMBL3233826. The IC50 of ACHE inhibition by aloe-emodin was reported to be 26.8 μM (Wang et al. [14]). Inhibition of ACHE elevates the level of acetylcholine, which activates muscarinic receptors (M2 and M3) and enhances gastrointestinal motility. b, aloe-emodin dose-dependently increased fecal pellet output in mice. c, the stimulatory effect of aloe-emodin was abolished by the muscarinic receptor antagonist atropine. For each group, the relative number of fecal pellets in 2 h was compared with that of the control group and tested by unpaired t-test in Prism 6 (n = 10; ****, p < 0.0001; *, p < 0.05). All data in b and c are presented as mean ± S.E.M.