Sean Ekins, Richard Pottorf, Robert C Reynolds, Antony J Williams, Alex M Clark, Joel S Freundlich.
Abstract
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) built from a data set of 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing in other diseases.
Year: 2014 PMID: 24665947 PMCID: PMC4004261 DOI: 10.1021/ci500077v
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Test Set of in Vivo Active Compounds Not in the TB in Vivo Models
| name (number or abbreviation relates to original nomenclature) | Forest | Single Tree | Bayesian score | Bayesian class | closest distance | SVM | ref |
|---|---|---|---|---|---|---|---|
| 1070 – anisaldehyde, thiosemicarbazone | 0 | 0 | –3.02 | 0 | 0.35 | 0 | ( |
| 1493 − 1-( | 0 | 0 | –1.19 | 1 | 0.46 | 0 | ( |
| 2403 − | 0 | 0 | –3.70 | 0 | 0.44 | 0 | ( |
| 2406 − | 0 | 0 | 1.54 | 1 | 0.53 | 1 | ( |
| 2875 – nicotinamide | 0 | 1 | –1.06 | 1 | 0.40 | 1 | ( |
| viomycin | 0 | 1 | 10.95 | 1 | 0.27 | 1 | ( |
| neomycin | 1 | 0 | 10.11 | 1 | 0.01 | 1 | ( |
| PCIH − 2-pyridylcarboxaldehyde | 1 | 1 | –1.53 | 1 | 0.39 | 1 | ( |
| Cpd 3 – N-(2-fluoroethyl)-1-((6-methoxy-5-methylpyrimidin-4-yl)-methyl)-1H-pyrrolo[3,2-b]pyridine-3-carboxamide | 1 | 1 | –0.44 | 1 | 0.56 | 1 | ( |
| Cpd 4 – N-(cyclopropylmethyl)-1-((6-methoxy-5-methylpyrimidin-4-yl)methyl)-1H-pyrrolo[3,2-b]pyridine-3-carboxamide | 0 | 0 | –2.06 | 1 | 0.53 | 0 | ( |
| indoleamide 3 | 0 | 0 | –6.20 | 0 | 0.31 | 1 | ( |
Prediction scores 1 = active, 0 = inactive.
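The table above can be summarized by counting, for each model, how many of the 11 external in vivo actives were predicted active. The following is a minimal sketch of that tally, not the authors' workflow. The 0/1 calls are copied from the table; the shortened compound labels, the dictionary layout, and the function name `count_hits` are illustrative assumptions.

```python
# Per-model calls for the 11 external in vivo actives, copied from the table
# above (1 = predicted active, 0 = predicted inactive).
# Compound labels are shortened here for readability (an assumption of this sketch).
predictions = {
    "1070":          {"forest": 0, "single_tree": 0, "bayesian": 0, "svm": 0},
    "1493":          {"forest": 0, "single_tree": 0, "bayesian": 1, "svm": 0},
    "2403":          {"forest": 0, "single_tree": 0, "bayesian": 0, "svm": 0},
    "2406":          {"forest": 0, "single_tree": 0, "bayesian": 1, "svm": 1},
    "2875":          {"forest": 0, "single_tree": 1, "bayesian": 1, "svm": 1},
    "viomycin":      {"forest": 0, "single_tree": 1, "bayesian": 1, "svm": 1},
    "neomycin":      {"forest": 1, "single_tree": 0, "bayesian": 1, "svm": 1},
    "PCIH":          {"forest": 1, "single_tree": 1, "bayesian": 1, "svm": 1},
    "Cpd 3":         {"forest": 1, "single_tree": 1, "bayesian": 1, "svm": 1},
    "Cpd 4":         {"forest": 0, "single_tree": 0, "bayesian": 1, "svm": 0},
    "indoleamide 3": {"forest": 0, "single_tree": 0, "bayesian": 0, "svm": 1},
}


def count_hits(rows):
    """Return, per model, how many compounds were predicted active."""
    totals = {}
    for calls in rows.values():
        for model, call in calls.items():
            totals[model] = totals.get(model, 0) + call
    return totals


hits = count_hits(predictions)
for model, n in hits.items():
    print(f"{model}: {n} of {len(predictions)} actives recovered")
```

Because every compound in this test set is an in vivo active, each count is the number of true positives for that model. The Bayesian column gives 8 of 11, matching the figure quoted in the abstract.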
Mean (Standard Deviation) of Molecular Descriptors for the N = 773 in Vivo Mtb Data Set, Comparing Actives and Inactives
| | MW | AlogP | HBD | HBA | Num Rings | Num Arom Rings | FPSA | RBN |
|---|---|---|---|---|---|---|---|---|
| active ( | 417.25 ± 454.39 | 3.11 ± 2.71 | 1.49 ± 2.17 | 6.68 ± 8.33 | 2.96 ± 2.09 | 1.72 ± 1.46 | 0.29 ± 0.13 | 7.84 ± 22.48 |
| inactive ( | 386.95 ± 440.40 | 3.89 ± 4.88 | 1.39 ± 1.86 | 5.75 ± 6.20 | 2.51 ± 2.70 | 1.90 ± 2.37 | 0.31 ± 0.14 | 8.09 ± 16.05 |
MW = molecular weight, HBD = hydrogen bond donor, HBA = hydrogen bond acceptor, Num Rings = number of rings, Num Arom Rings = number of aromatic rings, FPSA = fractional polar surface area, and RBN = rotatable bond number. Fractional polar surface area (FPSA) = total partially positively charged molecular surface area divided by the total molecular surface area.
p < 0.05.
Figure 1. Coverage of Mtb in vivo molecule property space: (A) N = 773 compounds showing how some actives (yellow) are outside the major cluster and represent more diverse molecules. 3 PCs describe 87% of variance. (B) Highlighting known first and second line TB drugs and others used against the disease (bedaquiline, moxifloxacin, ofloxacin, sparfloxacin, imipenem, gatifloxacin, rifampin, pyrazinamide, rifalazil, rifapentine, rifabutin, levofloxacin, clarithromycin, amikacin, kanamycin, streptomycin, capreomycin IA, ethambutol, ethionamide, isoniazid, and meropenem). Most Mtb drugs (yellow) are hidden in the large blue cluster; top left-hand cluster is amikacin, capreomycin IA, kanamycin, and streptomycin.
Figure 2. Coverage of Mtb target molecule property space: (A) 745 TB Mobile molecules (blue) with annotated targets and 773-member TB in vivo training set (yellow) PCA; 3 PCs explain 88% of variance. (B) Comparison of TB target molecule property space using data from TB Mobile (blue) and 1770 Mtb metabolites (yellow) using data from BioCyc.[51] 3 PCs explain 89% of the variance. (C) Comparison of 1770 Mtb metabolites (blue) and 773-member TB in vivo data set (yellow); 3 PCs explain 87% of the variance.
Figure 3. Coverage of Mtb in vitro growth inhibitor chemistry property space. (A) 1429 TB in vitro actives (blue) and 773-molecule TB in vivo data set (yellow) PCA; 3 PCs explain 83.7% of variance. Aminoglycosides are shown toward the top of the plot. (B) Highlighting the TB in vivo active compounds only (yellow).
Figure 4. (A) Triazine Markush structure for analogs of TCMDC-125802 (R1 = R2 = NHPh; R3 = H). (B) Matrix correlation plot showing cells with Mtb in vitro (left) and in vivo (right) data. (C) Solid cells show assayed compounds, and colored dots show activity estimates for hypothetical compounds from internally generated predictions. Green is favorable, red is unfavorable, and yellow is intermediate.
Mean (± sd) Leave-One-Out and Leave-Out 50% × 100 Cross Validation of Bayesian Models
| leave-one-out ROC | leave-out 50% × 100 external ROC score | leave-out 50% × 100 internal ROC score | leave-out 50% × 100 concordance | leave-out 50% × 100 specificity | leave-out 50% × 100 sensitivity |
|---|---|---|---|---|---|
| 0.77 | 0.72 ± 0.02 | 0.74 ± 0.02 | 66.91 ± 2.24 | 74.23 ± 8.96 | 58.46 ± 9.19 |
ROC = receiver operator characteristic. Best split −2.195.
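One way to read the best split value is as a cutoff on the Bayesian score: compounds scoring above it are called active. The sketch below applies that reading to the Bayesian scores in the external test-set table earlier in this record and checks the result against the reported Bayesian class column. Treating the best split as a strict "greater than" threshold is an assumption of this sketch, as are the shortened compound labels and the function name `classify`; the record itself does not state how the cutoff is applied.

```python
BEST_SPLIT = -2.195  # best split value quoted in the footnote above

# (Bayesian score, reported Bayesian class) for the 11 external test compounds,
# copied from the test-set table. Labels are shortened for readability.
test_set = {
    "1070":          (-3.02, 0),
    "1493":          (-1.19, 1),
    "2403":          (-3.70, 0),
    "2406":          (1.54, 1),
    "2875":          (-1.06, 1),
    "viomycin":      (10.95, 1),
    "neomycin":      (10.11, 1),
    "PCIH":          (-1.53, 1),
    "Cpd 3":         (-0.44, 1),
    "Cpd 4":         (-2.06, 1),
    "indoleamide 3": (-6.20, 0),
}


def classify(score, cutoff=BEST_SPLIT):
    """Assumed rule: a score strictly above the cutoff is called active (1)."""
    return 1 if score > cutoff else 0


predicted = {name: classify(score) for name, (score, _) in test_set.items()}

# Compounds where the assumed rule disagrees with the reported class column.
mismatches = [
    name for name, (_, reported) in test_set.items()
    if predicted[name] != reported
]

print("predicted active:", sum(predicted.values()), "of", len(test_set))
print("mismatches with reported class:", mismatches)
```

Under this assumed rule, every predicted class agrees with the reported class column, which suggests the cutoff reading is at least consistent with the table.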
Individual Machine Learning Model Cross Validation Receiver Operator Curve Statistics for 773 Molecules Tested in the Mouse in Vivo Model for Mtb
| RP Forest (out of bag ROC) | RP Single Tree (with five-fold cross validation ROC) | SVM (with five-fold cross validation ROC) | Bayesian (with five-fold cross validation ROC) |
|---|---|---|---|
| 0.75 | 0.71 | 0.77 | 0.73 |
Bayesian five-fold cross validation has sensitivity = 66.3%, specificity = 90.3%, and concordance = 79.0%.
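The three statistics quoted above are linked through the class balance of the data set. The sketch below uses the standard confusion-matrix definitions and then inverts the relationship to estimate the fraction of actives implied by the rounded percentages. It is a back-of-the-envelope consistency check, not a figure reported in this record. The function names and the example confusion-matrix counts are hypothetical.

```python
def classification_stats(tp, fn, tn, fp):
    """Standard definitions from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # actives correctly called active
    specificity = tn / (tn + fp)                   # inactives correctly called inactive
    concordance = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    return sensitivity, specificity, concordance


def implied_active_fraction(sensitivity, specificity, concordance):
    """Solve concordance = p * sensitivity + (1 - p) * specificity for p."""
    return (specificity - concordance) / (specificity - sensitivity)


# Hypothetical counts, used only to show the definitions are self-consistent.
sens, spec, conc = classification_stats(tp=50, fn=50, tn=90, fp=10)
print(sens, spec, conc, implied_active_fraction(sens, spec, conc))

# Rounded five-fold values quoted above for the Bayesian model.
p = implied_active_fraction(0.663, 0.903, 0.790)
print(f"implied fraction of actives: {p:.2f}")
```

With the quoted values the implied fraction of actives comes out at roughly 0.47. Because the inputs are rounded to one decimal place, this is only an approximate indication of class balance.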