| Literature DB >> 27335215 |
Sean Ekins, Alexander L. Perryman, Alex M. Clark, Robert C. Reynolds, Joel S. Freundlich.
Abstract
The renewed urgency to develop new treatments for Mycobacterium tuberculosis (Mtb) infection has resulted in large-scale phenotypic screening and thousands of new active compounds in vitro. The next challenge is to identify candidates to pursue in a mouse in vivo efficacy model as a step to predicting clinical efficacy. We previously analyzed over 70 years of this mouse in vivo efficacy data, which we used to generate and validate machine learning models. Curation of 60 additional small molecules with in vivo data published in 2014 and 2015 was undertaken to further test these models. This represents a much larger test set than for the previous models. Several computational approaches have now been applied to analyze these molecules and compare their molecular properties beyond those attempted previously. Our previous machine learning models have been updated, and a novel aspect has been added in the form of mouse liver microsomal half-life (MLM t1/2) and in vitro-based Mtb models incorporating cytotoxicity data that were used to predict in vivo activity for comparison. Our best Mtb in vivo models possess fivefold cross-validation ROC AUC values > 0.7, sensitivity > 80%, and concordance > 60%, while the best specificity value is > 40%. Use of an MLM t1/2 Bayesian model affords comparable results for scoring the 60 compounds tested. Combining MLM stability and in vitro Mtb models in a novel consensus workflow in the best cases has a positive predicted value (hit rate) > 77%. Our results indicate that Bayesian models constructed with literature in vivo Mtb data generated by different laboratories in various mouse models can have predictive value and may be used alongside MLM t1/2 and in vitro-based Mtb models to assist in selecting antitubercular compounds with desirable in vivo efficacy. We demonstrate for the first time that consensus models of any kind can be used to predict in vivo activity for Mtb.
In addition, we describe a new clustering method for data visualization and apply this to the in vivo training and test data, ultimately making the method accessible in a mobile app.
Year: 2016 PMID: 27335215 PMCID: PMC4962118 DOI: 10.1021/acs.jcim.6b00004
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Figure 1. Global TB pipeline using data from the TB Alliance and the Working Group on New TB Drugs Drug Pipeline.
Means and Standard Deviations of Molecular Descriptors for the New in Vivo Mtb Dataset (N = 60), Comparing Actives and Inactives
| | MW | AlogP | HBD | HBA | Num Rings | Num Arom Rings | FPSA | RBN |
|---|---|---|---|---|---|---|---|---|
| active (n = 41) | 493.88 ± 219.81 | 3.65 ± 3.00 | 2.07 ± 2.70 | 7.22 ± 2.95 | 3.63 ± 0.83 | 2.15 ± 0.96 | 0.26 ± 0.08 | 7.10 ± 3.18 |
| inactive (n = 19) | 427.83 ± 72.78 | 3.63 ± 1.91 | 1.05 ± 0.97 | 6.21 ± 2.17 | 3.68 ± 1.06 | 2.53 ± 0.90 | 0.24 ± 0.10 | 6.47 ± 1.87 |
MW = molecular weight; HBD = number of hydrogen-bond donors; HBA = number of hydrogen-bond acceptors; Num Rings = number of rings; Num Arom Rings = number of aromatic rings; FPSA = fractional polar surface area (sum of areas of the polar regions of the molecular surface divided by the total molecular surface area); RBN = number of rotatable bonds.
p < 0.05.
Figure 2. (A) Principal component analysis (PCA) of the updated training set for the in vivo model (blue) and compounds tested in vivo in 2014 and 2015 (yellow). Three principal components explain 86.9% of the variance. (B) PCA of TB Mobile 2 compounds (N = 805, blue) and compounds tested in vivo in 2014 and 2015 (yellow). Three principal components explain 87.5% of the variance.
Fivefold Cross-Validation ROC AUC Values for the Updated (N = 784) in Vivo Machine Learning Models
| Bayesian | SVM | Single Tree | Forest |
|---|---|---|---|
| 0.733 | 0.77 | 0.72 | 0.74 |
Bayesian leave-one-out cross-validation ROC AUC = 0.772.
External Statistics for the in Vivo TB Machine Learning Models Tested on the New in Vivo Mouse TB Data
| machine learning model | sensitivity (%) | specificity (%) | concordance (%) |
|---|---|---|---|
| TB | 70.7 | 36.8 | 60.0 |
| TB | 78.0 | 10.5 | 56.7 |
| RP Forest TB | 85.4 | 26.3 | 66.7 |
| Best RP Tree TB | 78.0 | 21.1 | 60.0 |
| TB | 68.3 | 42.1 | 60.0 |
| TB | 82.9 | 15.8 | 61.7 |
External statistics were calculated from the results of the Bayesian modeling tool on Collaborative Drug Discovery using a cutoff score of >0.65, which produced an internal sensitivity of 0.7 and an internal specificity of 0.67.
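The external statistics reported in these tables follow the standard confusion-matrix definitions. As a minimal illustration, the sketch below back-calculates the RP Forest TB row from the reported percentages, assuming the test set's 41 in vivo actives and 19 inactives (the counts TP = 35, FN = 6, TN = 5, FP = 14 are our inference, not stated in the table):

```python
def external_stats(tp, fn, tn, fp):
    """Return (sensitivity, specificity, concordance) as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)                    # actives correctly predicted
    specificity = 100.0 * tn / (tn + fp)                    # inactives correctly predicted
    concordance = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # overall agreement
    return sensitivity, specificity, concordance

# RP Forest TB model: inferred TP=35, FN=6, TN=5, FP=14 reproduces
# the reported 85.4% / 26.3% / 66.7%
sens, spec, conc = external_stats(tp=35, fn=6, tn=5, fp=14)
print(round(sens, 1), round(spec, 1), round(conc, 1))  # 85.4 26.3 66.7
```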
External Statistics for the Dual-Event Bayesian (in Vitro Mtb Efficacy and Non-cytotoxicity in Vero Cells) and Mouse Liver Microsomal Stability Bayesian Models Tested on the New in Vivo Mouse TB Data
| machine learning model | sensitivity (%) | specificity (%) | concordance (%) |
|---|---|---|---|
| full | 78.0 | 26.3 | 61.7 |
| pruned | 78.0 | 26.3 | 61.7 |
| TAACF-CB2 dual-event Bayesian | 26.8 | 78.9 | 43.3 |
| combined TB dual-event Bayesian | 34.1 | 78.9 | 48.3 |
| MLSMR dual-event Bayesian | 34.1 | 57.9 | 50.0 |
| consensus: TAACF-CB2 dual-event Bayesian + full | 7 true positives when predictions agree | 3 true negatives when predictions agree | 58.8 |
| consensus: combined TB dual-event Bayesian + full | 11 true positives when predictions agree | 4 true negatives when predictions agree | 62.5 |
| consensus: MLSMR dual-event Bayesian + full | 17 true positives when predictions agree | 3 true negatives when predictions agree | 60.6 |
| modified consensus: TAACF-CB2 dual-event Bayesian + full | 7 true positives when predictions agree (17.1%) | 17 true negatives (89.5%) | 40.0 |
| modified consensus: combined TB dual-event Bayesian + full | 11 true positives when predictions agree (26.8%) | 16 true negatives (84.2%) | 45.0 |
| modified consensus: MLSMR dual-event Bayesian + full | 17 true positives when predictions agree (41.5%) | 13 true negatives (68.4%) | 50.0 |
For the initial consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive or false positive. Similarly, both models had to classify a compound as bad/inactive for it to be considered as a true negative or false negative. Since the combination of the models agreed on the classification only for a subset of the test set, the overall sensitivity and overall specificity are not applicable. However, the overall concordance is still relevant and was calculated as (number of true positives + number of true negatives)/(number of compounds on which both models agreed on the good or bad classification).
For the modified consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive. However, if either model classified a compound as bad/inactive, it was defined as a true negative or false negative (depending on its experimental value). Thus, the modified consensus approaches made predictions for all of the test compounds.
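The two consensus rules described above can be sketched as follows (a minimal illustration; the function names are ours, and each model's prediction is reduced to True for good/active or False for bad/inactive):

```python
def consensus(pred_a, pred_b):
    """Initial consensus: classify only when both models agree; otherwise abstain."""
    if pred_a == pred_b:
        return pred_a
    return None  # no prediction for this compound

def modified_consensus(pred_a, pred_b):
    """Modified consensus: active only if both models agree; any 'inactive' vote wins."""
    return pred_a and pred_b

def concordance(predictions, actuals):
    """(TP + TN) / (number of compounds that actually received a prediction), in %."""
    scored = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    correct = sum(1 for p, a in scored if p == a)
    return 100.0 * correct / len(scored)
```

Because `consensus` abstains on disagreements, its concordance denominator is only the agreed-upon subset, whereas `modified_consensus` scores all 60 test compounds.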
Confusion Matrices Produced When the Machine Learning Models Were Tested on the New in Vivo Mouse TB Data
| Legend | |
|---|---|
| true positives | false positives |
| false negatives | true negatives |
External statistics were calculated using the Bayesian modeling tool on Collaborative Drug Discovery with a cutoff score of >0.65, which produced an internal sensitivity of 0.7 and an internal specificity of 0.67.
For the consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive or false positive (depending on the experimental value of the compound). Similarly, both models had to classify a compound as bad/inactive for it to be considered as a true negative or false negative.
For the modified consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive. However, if either model classified a compound as bad/inactive, it was defined as a true negative or false negative (depending on its experimental value).
Coverage = 17/60 = 28%.
Coverage = 24/60 = 40%.
Coverage = 33/60 = 55%.
Coverage = 60/60 = 100%.
External Enrichment Factors in Hit Rates for the Machine Learning Models Tested on the New in Vivo Mouse TB Data
| enrichment factors | ||||
|---|---|---|---|---|
| machine learning model | overall hit
rate (PPV) | for top 10% | for top 10 compounds | for top 20% |
| TB | 29/41 (70.7%) | 1.22 | 1.17 | 0.98 |
| TB | 32/49 (65.3%) | 1.22 | 1.17 | 0.98 |
| RP Forest TB | 35/49 (71.4%) | 0.98 | 1.02 | 0.98 |
| TB | 28/39 (71.8%) | N/A | N/A | N/A |
| TB | 34/50 (68.0%) | 1.22 | 1.02 | 1.10 |
| full | 32/46 (69.6%) | 0.98 | 0.88 | 0.98 |
| pruned | 32/46 (69.6%) | 0.73 | 0.88 | 0.98 |
| TAACF-CB2 dual-event Bayesian | 11/15 (73.3%) | 1.22 | 1.17 | 1.10 |
| combined TB dual-event Bayesian | 14/18 (77.8%) | 1.22 | 1.32 | 1.22 |
| MLSMR dual-event Bayesian | 19/27 (70.4%) | 0.73 | 0.88 | 0.98 |
| consensus: TAACF-CB2 dual-event Bayesian + full | 7/9 (77.8%); overall EF = 1.14 | N/A | N/A | N/A |
| consensus: combined TB dual-event Bayesian + full | 11/14 (78.6%); overall EF = 1.15 | N/A | N/A | N/A |
| consensus: MLSMR dual-event Bayesian + full | 17/23 (73.9%); overall EF = 1.08 | N/A | N/A | N/A |
The hit rate (positive predicted value = PPV) was calculated as (number of true positives)/(number of true positives + number of false positives).
The enrichment factor was calculated as (hit rate in %)/(% of in vivo active compounds in the external test set). Since 41 of the 60 compounds in this external test set (68.3%) were active, the maximum enrichment factor that a perfect model could achieve would be 100%/68.3% = 1.46.
Since each original consensus model and the corresponding “modified consensus” model have the same number of true positives and false positives, their hit rates and enrichment factors are equivalent.
External statistics were calculated from the results of the Bayesian modeling tool on Collaborative Drug Discovery using a cutoff score of >0.65, which produced an internal sensitivity of 0.7 and an internal specificity of 0.67.
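The hit-rate and enrichment-factor arithmetic from the footnotes above can be checked with a short sketch (the 11/14 figure is the combined TB dual-event Bayesian + full consensus row from the table; 41 actives out of 60 is the stated background rate):

```python
def hit_rate(tp, fp):
    """Positive predicted value, in %: TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)

def enrichment_factor(tp, fp, n_active=41, n_total=60):
    """Hit rate divided by the background active rate of the external test set."""
    background = 100.0 * n_active / n_total  # 68.3% for this 60-compound set
    return hit_rate(tp, fp) / background

# Combined TB dual-event Bayesian + full consensus: 11 hits out of 14 predictions
print(round(hit_rate(11, 3), 1))           # 78.6
print(round(enrichment_factor(11, 3), 2))  # 1.15
```

The maximum achievable enrichment factor, `enrichment_factor(41, 0)`, is 100/68.3 ≈ 1.46, matching the footnote.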
Additional External Statistics for the Machine Learning Models Tested on the New in Vivo Mouse TB Data
| machine learning model | κ | MCC | ||
|---|---|---|---|---|
| TB | 0.08 | 0.08 | 0.08 | 0.71 |
| TB | –0.13 | –0.14 | –0.11 | 0.71 |
| RP Forest TB | 0.13 | 0.14 | ||
| TB | 0.10 | 0.10 | 0.10 | 0.70 |
| TB | –0.01 | –0.02 | –0.01 | |
| full | 0.05 | 0.05 | 0.04 | 0.74 |
| pruned | 0.05 | 0.05 | 0.04 | 0.74 |
| TAACF-CB2 dual-event Bayesian | 0.04 | 0.06 | 0.06 | 0.39 |
| combined TB dual-event Bayesian | 0.10 | 0.13 | 0.47 | |
| MLSMR dual-event Bayesian | 0.04 | 0.04 | 0.04 | 0.56 |
| consensus: | 0.13 | N/A | 0.67 | |
| consensus: | N/A | 0.71 | ||
| consensus: | 0.04 | N/A | 0.72 | |
| modified consensus: | 0.05 | 0.09 | 0.07 | 0.28 |
| modified consensus: | 0.08 | 0.12 | 0.11 | 0.40 |
| modified consensus: | 0.08 | 0.09 | 0.10 | 0.53 |
The top two scores for each particular type of external statistic are shown in bold.
For the initial consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive or false positive. Similarly, both models had to classify a compound as bad/inactive for it to be considered as a true negative or false negative. Consequently, these workflows made active/inactive classifications on only a subset of the test set.
For the modified consensus approaches, both types of Bayesian models had to classify a compound as good/active for it to be considered as a true positive. However, if either model classified a compound as bad/inactive, it was defined as a true negative or false negative (depending on its experimental value). Thus, the modified consensus approaches made predictions for all of the test compounds.
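The κ and MCC columns above follow the standard definitions of Cohen's kappa and the Matthews correlation coefficient. As a hedged check, the sketch below uses the RP Forest counts back-calculated earlier from its reported sensitivity and specificity (TP = 35, FN = 6, TN = 5, FP = 14, an inference assuming 41 actives and 19 inactives), which reproduces that row's 0.13 / 0.14:

```python
import math

def cohens_kappa(tp, fn, tn, fp):
    """Agreement beyond chance between predicted and experimental classes."""
    n = tp + fn + tn + fp
    p_obs = (tp + tn) / n
    # Chance agreement expected from the row/column marginals
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

# RP Forest TB model (inferred TP=35, FN=6, TN=5, FP=14)
print(round(cohens_kappa(35, 6, 5, 14), 2))  # 0.13
print(round(mcc(35, 6, 5, 14), 2))           # 0.14
```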
Figure 3. Honeycomb clustering of TB in vivo data from 2014 and 2015. Yellow hexagons highlight the compounds from 2014 and 2015, and green outlines signify in vivo active compounds. (A) Complete map of compounds in the training and testing sets. (B) Enlarged view of the section marked with the black circle in (A), highlighting cyclogriselimycin and ecumicin.