Faizan Sahigara, Davide Ballabio, Roberto Todeschini, Viviana Consonni.
Abstract
BACKGROUND: With the growing use of QSAR predictions for regulatory purposes, predictive models are now required to be strictly validated, and an essential part of that validation is a clearly defined Applicability Domain (AD) for the model. Although several different approaches have been proposed in recent years to address this goal, no optimal approach for defining a model's AD has yet been recognized.
Year: 2013 PMID: 23721648 PMCID: PMC3679843 DOI: 10.1186/1758-2946-5-27
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Scatter plot of the simulated dataset.
Figure 2. Simulated data set. Plot of thresholds t vs. number of training neighbours K (k = 12).
Figure 3. Simulated data set. Contour plot showing how the AD was characterised. Metric used: Euclidean distance; k = 12.
Figure 4. Simulated data set. Box-and-whisker plot of test samples (%) retained within the AD for different k values during k-optimization.
Summary of model statistics for the case study
| Model | Training samples | R² | RMSE | Test samples | Q² | RMSEP |
|---|---|---|---|---|---|---|
| CAESAR Model 2 | 378 | 0.804 | 0.591 | 95 | 0.797 | 0.600 |
R²: determination coefficient; RMSE: root-mean-square error; Q²: predictive squared correlation coefficient; RMSEP: root-mean-square error of prediction.
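The statistics named in the legend above can be illustrated with a minimal sketch. The function names are hypothetical, and the source does not state which formulation of the predictive squared correlation coefficient was used; the version below, which measures test-set deviations from the training-set mean response, is only one common choice and is an assumption here. RMSEP is taken to be the RMSE evaluated on the test set.

```python
import math


def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - RSS / TSS, with TSS around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - rss / tss


def rmse(y_true, y_pred):
    """Root-mean-square error; applied to test data it gives the RMSEP."""
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(rss / len(y_true))


def q_squared_external(y_test, y_pred, y_train_mean):
    """Predictive squared correlation coefficient on an external test set.

    ASSUMPTION: deviations in the denominator are taken from the training-set
    mean response. Other formulations exist and may give different values.
    """
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_ref = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss_ref
```

Restricting `y_test` and `y_pred` to the samples that an AD approach keeps inside the domain is one way the predictive coefficient could be recomputed per approach, again assuming this formulation.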
Figure 5. CAESAR BCF model. Box-and-whisker plot of test samples (%) retained within the AD for different k values during k-optimization.
Figure 6. CAESAR BCF model. Absolute standardized error of test samples plotted against their K values.
Comparison of AD methods applied to the test set of CAESAR BCF model
| AD approach | Test samples inside the AD | Q² | Test samples outside the AD |
|---|---|---|---|
| All samples inside (no AD approach) | 95 | 0.797 | None |
| Proposed approach (Euclidean dist., | 91 | 0.803 | 33 61 82 83 |
| Bounding box | 95 | 0.797 | None |
| PCA bounding box | 93 | 0.804 | 33 40 |
| Convex hull | 73 | 0.789 | 3 7 9 13 18 33 34 36 37 38 39 40 41 43 51 56 61 72 79 91 92 94 |
| Euclidean dist (95 percentile) | 88 | 0.802 | 3 33 36 37 40 42 61 |
| Mahalanobis dist (95 percentile) | 89 | 0.791 | 18 43 54 61 83 91 |
| Classical kNN (Euclidean dist., | 87 | 0.797 | 3 33 34 40 61 82 83 94 |
| Fixed Gaussian kernel | 85 | 0.794 | 3 24 33 34 40 61 82 83 91 94 |
| Optimized Gaussian kernel | 66 | 0.831 | 3 9 12 22 24 33 34 38 40 45 47 51 53 54 56 61 68 69 75 76 80 82 83 87 89 91 93 94 95 |
| Variable Gaussian kernel ( | 81 | 0.790 | 3 24 33 34 40 43 61 80 82 83 89 91 94 95 |
| Adaptive Gaussian kernel | 88 | 0.801 | 3 33 43 61 82 83 91 |
| Fixed Epanechnikov kernel | 87 | 0.799 | 3 33 40 43 61 83 91 94 |
| Nearest neighbour density estimator ( | 91 | 0.806 | 3 33 61 91 |
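
Two of the simpler approaches listed in the comparison, the bounding box and the Euclidean distance with a 95th percentile threshold, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not define either method, so the sketch assumes that the bounding box is the per-descriptor range of the training set, that distances are measured from the training-set centroid, and that the threshold is the nearest-rank percentile of the training distances. All function names are hypothetical.

```python
import math


def bounding_box_inside(train, sample):
    """True if every descriptor of `sample` lies within the training min/max range."""
    for j in range(len(train[0])):
        column = [row[j] for row in train]
        if not (min(column) <= sample[j] <= max(column)):
            return False
    return True


def centroid(train):
    """Mean of each descriptor over the training samples."""
    n = len(train)
    return [sum(row[j] for row in train) / n for j in range(len(train[0]))]


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def percentile_threshold(train, pct=95.0):
    """Nearest-rank percentile of training distances from the centroid.

    ASSUMPTION: the threshold is derived from training distances only.
    """
    c = centroid(train)
    dists = sorted(euclidean(row, c) for row in train)
    rank = max(1, math.ceil(pct / 100.0 * len(dists)))
    return dists[rank - 1]


def distance_inside(train, sample, pct=95.0):
    """True if the sample's distance from the centroid does not exceed the threshold."""
    return euclidean(sample, centroid(train)) <= percentile_threshold(train, pct)
```

A test sample is counted as outside the AD when the chosen function returns `False`. Descriptors would normally be scaled before distances are computed; scaling is omitted here to keep the sketch short.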