Dipankar Roy, Vijaya Kumar Hinge, Andriy Kovalenko.
Abstract
Predicting the ability of chemical species to cross the blood-brain barrier (BBB) is an active field of research for development and mechanistic understanding in the pharmaceutical industry. Here, we report the BBB permeability of a large data set of compounds by incorporating molecular solvation energy descriptors computed by the 3D-RISM-KH molecular solvation theory. We have been able to show, for the first time, that the computed excess chemical potential in different solvents can be successfully used to predict permeability of compounds in a binary manner (yes/no) via a minimum-descriptor-based model. Our findings successfully combine the molecular solvation theory with the machine learning approach to address one of the most daunting challenges in predictive structure-activity relationship modeling. The workflow presented in this work is simple enough to be used by nonexperts with ease.
Year: 2019 PMID: 31646222 PMCID: PMC6796930 DOI: 10.1021/acsomega.9b01512
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Summary of statistical analysis: (a) important variables selected via the random forest (green circle, eight descriptors) and gradient boosting machine (blue circle, five descriptors) algorithms; (b) variable importance as calculated by the random forest algorithm; (c) relative importance of variables predicted by the gradient boosting machine method. The overlapping zone of the two circles contains the descriptors common to both machine learning approaches. MeanDecreaseGini is the mean of a variable’s total decrease in node impurity weighted by the proportion of samples reaching that specific node in each individual decision tree in a random-forest-based classification. The larger this value, the larger the contribution of the corresponding descriptor.
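The MeanDecreaseGini definition in the caption can be illustrated with a minimal sketch. It assumes binary splits and takes each tree as a plain list of split records; the descriptor names `desc_a` and `desc_b`, the toy labels, and the helper functions are hypothetical and are not taken from the paper or from any particular random forest package, whose normalization details may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# One split record: (descriptor name, labels at the parent node,
# labels sent left, labels sent right, number of samples in the tree).
Split = Tuple[str, Sequence[int], Sequence[int], Sequence[int], int]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity, 1 - sum_k p_k^2, of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini_decrease(parent: Sequence[int], left: Sequence[int],
                           right: Sequence[int], n_total: int) -> float:
    """Impurity decrease of one split, weighted by the fraction of
    samples that reach the split node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def mean_decrease_gini(trees: List[List[Split]]) -> Dict[str, float]:
    """Sum each descriptor's weighted decreases within every tree,
    then average those totals over all trees."""
    totals: Dict[str, float] = defaultdict(float)
    for splits in trees:
        for name, parent, left, right, n_total in splits:
            totals[name] += weighted_gini_decrease(parent, left, right, n_total)
    return {name: total / len(trees) for name, total in totals.items()}


# Toy forest of two single-split trees (hypothetical labels: 1 = BBB+, 0 = BBB-).
toy_forest: List[List[Split]] = [
    [("desc_a", [1, 1, 1, 0, 0, 0], [1, 1, 1], [0, 0, 0], 6)],  # perfect split
    [("desc_b", [1, 1, 0, 0], [1, 1, 0], [0], 4)],              # imperfect split
]
importance = mean_decrease_gini(toy_forest)
```

In this toy forest the descriptor that separates the classes perfectly receives the larger value, which matches the caption's reading that a larger MeanDecreaseGini indicates a larger contribution.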
Performance Indices of Different Classification Schemes Based on Five Prediction Models with 3D-RISM-KH Calculated Excess Chemical Potentials for the Test Set of Compounds
| model | accuracy | precision | sensitivity | specificity | F1-score |
|---|---|---|---|---|---|
| GBM method | | | | | |
| model A | 0.89 | 0.90 | 0.96 | 0.65 | 0.93 |
| model B | 0.90 | 0.91 | 0.96 | 0.69 | 0.93 |
| model C | 0.88 | 0.90 | 0.94 | 0.66 | 0.92 |
| model D | 0.90 | 0.91 | 0.97 | 0.67 | 0.93 |
| model E | 0.88 | 0.89 | 0.96 | 0.62 | 0.92 |
| GLM method | | | | | |
| model A | 0.70 | 0.82 | 0.79 | 0.41 | 0.81 |
| model B | 0.80 | 0.80 | 0.93 | 0.36 | 0.88 |
| model C | 0.84 | 0.85 | 0.96 | 0.42 | 0.90 |
| model D | 0.80 | 0.82 | 0.94 | 0.33 | 0.88 |
| model E | 0.84 | 0.84 | 0.99 | 0.34 | 0.91 |
| SVM method | | | | | |
| model A | 0.99 | 0.99 | 0.99 | 0.95 | 0.99 |
| model B | 0.97 | 0.97 | 0.99 | 0.90 | 0.98 |
| model C | 0.95 | 0.96 | 0.98 | 0.86 | 0.97 |
| model D | 0.94 | 0.95 | 0.98 | 0.82 | 0.96 |
| model E | 0.92 | 0.93 | 0.98 | 0.75 | 0.95 |
| weighted kNN method | | | | | |
| model A | 0.90 | 0.93 | 0.93 | 0.77 | 0.93 |
| model B | 0.89 | 0.90 | 0.96 | 0.63 | 0.98 |
| model C | 0.88 | 0.90 | 0.95 | 0.63 | 0.92 |
| model D | 0.88 | 0.89 | 0.96 | 0.62 | 0.92 |
| model E | 0.88 | 0.89 | 0.96 | 0.61 | 0.92 |
To express the indices as percentages, multiply the individual values by 100. The performance indices are calculated as follows: accuracy = (TP + TN)/(TP + TN + FP + FN); precision = TP/(TP + FP); sensitivity = TP/(TP + FN); specificity = TN/(TN + FP); F1-score = 2 × (precision × sensitivity)/(precision + sensitivity). TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.
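The definitions in the footnote above translate directly into code. The following is a minimal sketch, assuming the four confusion-matrix counts are already known and that no denominator is zero; the `ConfusionCounts` class, the `performance_indices` function, and the example counts are hypothetical and are not the counts behind the tabulated values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def performance_indices(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the five indices from the footnote as fractions in [0, 1].

    Assumes every denominator is nonzero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    precision = c.tp / (c.tp + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": c.tn / (c.tn + c.fp),
        "f1": 2 * (precision * sensitivity) / (precision + sensitivity),
    }


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=90, tn=30, fp=10, fn=5)
indices = performance_indices(counts)

# Percentages are obtained by multiplying each index by 100.
percentages = {name: 100 * value for name, value in indices.items()}
```

With an imbalanced test set, a classifier can reach high accuracy and sensitivity while its specificity stays low, which is why the tables report all five indices rather than accuracy alone.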
Performance Indices of Different Classification Schemes Based on Model A with CPCM and SMD Calculated Solvation Energies for the Test Set of Compounds
| model | accuracy | precision | sensitivity | specificity | F1-score |
|---|---|---|---|---|---|
| GBM method | | | | | |
| CPCM | 0.88 | 0.90 | 0.96 | 0.63 | 0.92 |
| SMD | 0.88 | 0.90 | 0.96 | 0.63 | 0.91 |
| GLM method | | | | | |
| CPCM | 0.81 | 0.83 | 0.95 | 0.34 | 0.89 |
| SMD | 0.81 | 0.82 | 0.95 | 0.33 | 0.89 |
| SVM method | | | | | |
| CPCM | 0.99 | 0.99 | 0.99 | 0.95 | 0.99 |
| SMD | 0.99 | 0.99 | 1.0 | 0.95 | 0.99 |
| weighted kNN method | | | | | |
| CPCM | 0.90 | 0.90 | 0.97 | 0.65 | 0.93 |
| SMD | 0.89 | 0.90 | 0.96 | 0.65 | 0.93 |
To express the indices as percentages, multiply the individual values by 100. See the footnote to the first performance table above for the definitions of the performance indices used.
Performance Indices of Different Classification Schemes Based on a Model with Only Eight 2D-Molecular Descriptors
| statistical method | accuracy | precision | sensitivity | specificity | F1-score |
|---|---|---|---|---|---|
| GBM | 0.87 | 0.89 | 0.96 | 0.60 | 0.92 |
| GLM | 0.81 | 0.83 | 0.95 | 0.33 | 0.88 |
| SVM | 0.96 | 0.96 | 0.99 | 0.87 | 0.97 |
| weighted kNN | 0.87 | 0.89 | 0.95 | 0.59 | 0.92 |
To express the indices as percentages, multiply the individual values by 100. See the footnote to the first performance table above for the definitions of the performance indices used.
Figure 2. Performance indices of different models used for classification.