Norman C Ledonne, Kevin Rissolo, James Bulgarelli, Leonard Tini.
Abstract
BACKGROUND: Standard approaches to assessing predictive models rely on common statistical measures computed over the entire data set. These give an overview of a model's average performance but little insight into how applicable the model is across the prediction space. Guha and Van Drie recently proposed structure-activity landscape index (SALI) curves, summarized by the SALI curve integral (SCI), as a means to map the predictive power of computational models within that space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that used by medicinal chemists.
Year: 2011 PMID: 21299868 PMCID: PMC3045354 DOI: 10.1186/1758-2946-3-7
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
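The pairwise evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the index, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), and scores each compound pair as +1 when the model orders the pair correctly, -1 when it reverses it, and 0 when it predicts a tie. The function names, the `eps` guard for identical structures, and the choice to express the cutoff X as a fraction of the largest SALI value are assumptions made for illustration, not details taken from this record. The sketch stops at the curve S(X) and does not attempt the SCI normalization.

```python
from itertools import combinations


def sali(act_i, act_j, sim_ij, eps=1e-6):
    """SALI for one compound pair: |A_i - A_j| / (1 - similarity).

    eps is an assumed guard against division by zero when sim_ij == 1.
    """
    return abs(act_i - act_j) / max(1.0 - sim_ij, eps)


def sali_curve(observed, predicted, similarity, n_points=11):
    """Return [(X, S(X))] for X evenly spaced in [0, 1].

    X is taken (by assumption) as a fraction of the maximum SALI value.
    S(X) is the mean ordering score over pairs whose SALI is at least
    X * max_sali, so S(0) uses every pair with distinct observed values.
    """
    pairs = []
    for i, j in combinations(range(len(observed)), 2):
        if observed[i] == observed[j]:
            continue  # no experimental ordering to reproduce
        s = sali(observed[i], observed[j], similarity[i][j])
        obs_sign = 1 if observed[i] > observed[j] else -1
        diff = predicted[i] - predicted[j]
        pred_sign = (diff > 0) - (diff < 0)  # +1, -1, or 0 for a tie
        pairs.append((s, obs_sign * pred_sign))

    max_sali = max(s for s, _ in pairs)
    curve = []
    for k in range(n_points):
        x = k / (n_points - 1)
        kept = [score for s, score in pairs if s >= x * max_sali]
        curve.append((x, sum(kept) / len(kept)))
    return curve


if __name__ == "__main__":
    # Toy data, not values from the study.
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 1.0, 3.0]
    similarity = [[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.9],
                  [0.0, 0.9, 1.0]]
    for x, s in sali_curve(observed, predicted, similarity):
        print(f"X={x:.1f}  S(X)={s:+.2f}")
```

Under these assumptions S(0) is a concordance measure over all pairs, which is consistent with the S(0) and Kendall τ rows in the tables below being nearly equal, while S(1) reflects only the pair or pairs at the steepest activity cliff.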
Figure 1. Distribution of Efflux Values used for Training Prediction Models. Count of efflux values per bin versus efflux value.
Figure 2. Distribution of SALI Values Calculated for 2D Training Sets. Count of SALI values per bin versus SALI value.
Summary of 2D model performance
| Data set | Metric | ANNE | SVM | MLR | KPLS | RF | PLS | ANNE AZ | ANNE Random |
|---|---|---|---|---|---|---|---|---|---|
| Training | MAE | 0.19 | 0.22 | 0.20 | 0.19 | 0.08 | 0.22 | 0.20 | 0.19 |
| Training | Kendall τ | 0.63 | 0.58 | 0.61 | 0.63 | 0.86 | 0.60 | 0.60 | 0.62 |
| Training | SCI | 0.12 | 0.20 | 0.90 | 0.48 | 0.94 | 0.12 | 0.17 | -0.13 |
| Training | S(0) | 0.63 | 0.57 | 0.60 | 0.62 | 0.86 | 0.58 | 0.59 | 0.62 |
| Training | S(1) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | -1.00 |
| Test | MAE | 0.22 | 0.25 | 0.23 | 0.24 | 0.22 | 0.25 | 0.21 | 0.22 |
| Test | Kendall τ | 0.51 | 0.45 | 0.48 | 0.48 | 0.51 | 0.45 | 0.53 | 0.56 |
| Test | SCI | 0.83 | 0.93 | 0.93 | 0.94 | 0.94 | 0.75 | 0.96 | -0.67 |
| Test | S(0) | 0.52 | 0.46 | 0.50 | 0.50 | 0.52 | 0.46 | 0.54 | 0.56 |
| Test | S(1) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | -1.00 |
| Prospective | MAE | 0.36 | 0.32 | 0.52 | 0.19 | * | 0.33 | 0.36 | 0.35 |
| Prospective | Kendall τ | 0.34 | 0.36 | 0.14 | 0.37 | * | 0.32 | 0.36 | 0.33 |
| Prospective | SCI | 0.98 | 0.98 | 0.72 | 0.78 | * | 0.97 | 0.77 | 0.98 |
| Prospective | S(0) | 0.35 | 0.37 | 0.15 | 0.38 | * | 0.33 | 0.37 | 0.35 |
| Prospective | S(1) | 1.00 | 1.00 | 1.00 | 1.00 | * | 1.00 | 1.00 | 1.00 |
Models using ADMET 2D Predictor descriptors and Kohonen map: ANNE, ADMET Predictor neural net; SVM, ADMET Predictor support vector machine; MLR, ADMET Predictor multiple linear regression; KPLS, ADMET Predictor kernel partial least squares; RF, Pipeline Pilot random forest; PLS, SIMCA-P+ partial least squares; ANNE AZ, ADMET Predictor neural net with AZ descriptors; ANNE Random, ADMET Predictor neural net with randomized choice of training/test sets. The performance properties of the models were calculated as described in CALCULATIONS AND STATISTICS. Asterisks (*) mark properties that were not calculated for RF, since prediction outliers could not be identified.
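The MAE and Kendall τ rows of the table can be illustrated with a short sketch. The record does not say which τ variant or software was used, so the tie-uncorrected τ-a below is an assumption, as are the function names; the CALCULATIONS AND STATISTICS section of the paper remains the authoritative description.

```python
from itertools import combinations


def mean_absolute_error(observed, predicted):
    """Mean of |observed - predicted| over all compounds."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def kendall_tau_a(observed, predicted):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Assumed variant: ties contribute 0 and no tie correction is applied.
    """
    concordant = discordant = n_pairs = 0
    for (o1, p1), (o2, p2) in combinations(zip(observed, predicted), 2):
        n_pairs += 1
        product = (o1 - o2) * (p1 - p2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy data, not values from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 1.0, 3.5, 4.5]
    print("MAE:", mean_absolute_error(observed, predicted))
    print("tau-a:", kendall_tau_a(observed, predicted))
```

MAE summarizes the size of the errors, whereas τ depends only on whether compound pairs are ranked in the right order, so a model can score well on one and poorly on the other.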
Figure 3. SALI Curves for Efflux Prediction Data Generated using 2D Descriptors. 1a, Training Set; 1b, Test Set; 1c, Prospective Set.
Figure 4. Box Plots of Daylight FP Scores of Training, Test and Prospective Sets. Solid line, median Tanimoto score; dashed line, average Tanimoto score. Box shows Q1-Q3. Dots are outliers (greater than upper quartile + 1.5 times interquartile range).
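The outlier rule in the Figure 4 caption (score greater than Q3 + 1.5 × IQR) can be written out as a small sketch. Fingerprints are represented here as plain sets of on-bit indices rather than actual Daylight fingerprints, and the quartiles use the default method of Python's `statistics.quantiles`; both are assumptions for illustration, and the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import quantiles


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def upper_outliers(scores):
    """Scores above the upper fence Q3 + 1.5 * IQR, as in the caption."""
    q1, _median, q3 = quantiles(scores, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [s for s in scores if s > fence]


if __name__ == "__main__":
    # Toy values, not scores from the study.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(upper_outliers([0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.4, 0.4, 0.9]))
```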
Summary of 3D model performance
| Data set | Metric | ANNE | SVM | MLR | KPLS | RF | PLS |
|---|---|---|---|---|---|---|---|
| Training | MAE | 0.19 | 0.21 | 0.20 | 0.22 | 0.10 | 0.22 |
| Training | Kendall τ | 0.64 | 0.62 | 0.62 | 0.58 | 0.86 | 0.60 |
| Training | SCI | 0.67 | 0.73 | 0.87 | 0.86 | 0.99 | 0.13 |
| Training | S(0) | 0.64 | 0.62 | 0.62 | 0.58 | 0.86 | 0.59 |
| Training | S(1) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | -1.00 |
| Test | MAE | 0.20 | 0.22 | 0.23 | 0.25 | 0.20 | 0.23 |
| Test | Kendall τ | 0.55 | 0.52 | 0.49 | 0.43 | 0.56 | 0.48 |
| Test | SCI | 0.93 | 0.83 | 0.83 | -0.74 | -0.66 | 0.73 |
| Test | S(0) | 0.56 | 0.52 | 0.50 | 0.44 | 0.57 | 0.49 |
| Test | S(1) | 1.00 | 1.00 | -1.00 | -1.00 | -1.00 | 1.00 |
| Prospective | MAE | 0.35 | 0.34 | 0.74 | 0.34 | * | 0.32 |
| Prospective | Kendall τ | 0.34 | 0.35 | -0.09 | 0.38 | * | 0.34 |
| Prospective | SCI | -0.65 | -0.49 | -0.90 | 0.80 | * | -0.69 |
| Prospective | S(0) | 0.35 | 0.36 | -0.07 | 0.39 | * | 0.35 |
| Prospective | S(1) | -1.00 | -1.00 | -1.00 | 1.00 | * | -1.00 |
Models using ADMET 3D Predictor descriptors and Kohonen map: ANNE, ADMET Predictor neural net; SVM, ADMET Predictor support vector machine; MLR, ADMET Predictor multiple linear regression; KPLS, ADMET Predictor kernel partial least squares; RF, Pipeline Pilot random forest; PLS, SIMCA-P+ partial least squares. The performance properties of the models were calculated as described in CALCULATIONS AND STATISTICS. Asterisks (*) mark properties that were not calculated for RF, since prediction outliers could not be identified.
Figure 5. SALI Curves for Efflux Prediction Data Generated using 3D Descriptors. 1a, Training Set; 1b, Test Set; 1c, Prospective Set.