Ming Hao, Yan Li, Yonghua Wang, Shuwei Zhang.
Abstract
This work predicts the activity of a series of 208 structurally diverse PKCθ inhibitors using Random Forest (RF) with Mold2 molecular descriptors. The RF model proved to be a robust predictor of the experimental pIC50 values, giving an external R2pred of 0.72 and a standard error of prediction (SEP) of 0.45 for an external prediction set of 51 inhibitors that were not used in developing the QSAR models. Using the RF built-in measure of relative descriptor importance, one predictor, the number of group donor atoms for H-bonds (with N and O), was identified as playing a crucial role in PKCθ inhibitory activity. We hope that the developed RF model will be helpful in screening for novel PKCθ inhibitors and predicting their activity.
Keywords: Partial Least Square; Random Forest; Support Vector Machine; protein kinase C θ
Year: 2010 PMID: 20957104 PMCID: PMC2956104 DOI: 10.3390/ijms11093413
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Statistical performance of the QSAR models for PKCθ inhibitors.
| Parameter | RF (training) | RF (test) | SVM (training) | SVM (test) | PLS (training) | PLS (test) |
|---|---|---|---|---|---|---|
| Size | 157 | 51 | 157 | 51 | 157 | 51 |
| R2 | 0.96 | 0.76 | 0.99 | 0.61 | 0.57 | 0.42 |
| Q2 | 0.54 | - | 0.57 | - | 0.36 | - |
| R2pred | - | 0.72 | - | 0.59 | - | 0.39 |
| SEE | 0.25 | - | 0.08 | - | 0.59 | - |
| SEP | - | 0.45 | - | 0.55 | - | 0.67 |
R2, coefficient of determination; Q2, cross-validated R2: Q2 based on OOB, 10-fold cross-validation and leave-one-out for RF, SVM and PLS, respectively; R2pred, predictive correlation coefficient for the test set; SEE, standard error of estimate; SEP, standard error of prediction; -, not applicable or available.
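The footnote names the test-set statistics, but this record gives no formulas for them. As a minimal sketch, assuming conventions that are common in QSAR work (R2pred referenced to the training-set mean, SEP taken as the root-mean-square error over the test set with divisor n), they could be computed as follows. The function names and the toy values are illustrative and do not come from the paper.

```python
import math


def r2_pred(y_test, y_hat, y_train_mean):
    """Predictive R2 for an external test set.

    Assumed convention: the reference sum of squares is taken
    around the mean of the training-set activities.
    """
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_hat))
    ss_ref = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss_ref


def sep(y_test, y_hat):
    """Standard error of prediction (assumed: RMSE with divisor n)."""
    n = len(y_test)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_test, y_hat)) / n)


# Toy pIC50 values, made up for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
train_mean = 6.5

toy_r2_pred = r2_pred(observed, predicted, train_mean)
toy_sep = sep(observed, predicted)
```

Other definitions exist (for example, a divisor of n - 1 for SEP, or the test-set mean in R2pred), so values computed this way may differ slightly from those reported in the table.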
Figure 1. Scatter plots of the predicted versus observed pIC50 values for (A) the RF model, (B) the SVM model and (C) the PLS model.
Statistical performance of the QSAR models over 100 repetitions of hold-out testing with 51 held-out chemicals each (mean ± standard deviation) for PKCθ inhibitors.
| Parameter | RF (training) | RF (test) | SVM (training) | SVM (test) | PLS (training) | PLS (test) |
|---|---|---|---|---|---|---|
| Size | 157 | 51 | 157 | 51 | 157 | 51 |
| R2 | 0.95 ± 0.003 | 0.58 ± 0.09 | 0.82 ± 0.01 | 0.49 ± 0.10 | 0.64 ± 0.13 | 0.41 ± 0.13 |
| Q2 | 0.57 ± 0.03 | - | 0.59 ± 0.02 | - | 0.39 ± 0.11 | - |
| R2pred | - | 0.56 ± 0.09 | - | 0.45 ± 0.10 | - | 0.10 ± 0.84 |
| SEE | 0.24 ± 0.01 | - | 0.39 ± 0.01 | - | 0.53 ± 0.09 | - |
| SEP | - | 0.59 ± 0.06 | - | 0.63 ± 0.05 | - | 0.79 ± 0.25 |
R2, coefficient of determination; Q2, cross-validated R2: Q2 based on OOB, 10-fold cross-validation and leave-one-out for RF, SVM and PLS, respectively; R2pred, predictive correlation coefficient for the test set; SEE, standard error of estimate; SEP, standard error of prediction; -, not applicable or available.
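The repeated hold-out protocol behind this table (random 157/51 splits, repeated 100 times, then summarized as mean and standard deviation) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `fit_predict` is a hypothetical callable standing in for the RF, SVM or PLS model, `one_descriptor_ols` is a deliberately simple placeholder model, the data are synthetic, and R2pred uses the training-mean convention.

```python
import random
import statistics


def repeated_holdout(x, y, fit_predict, n_test=51, n_repeats=100, seed=0):
    """Return (mean, SD) of the test-set R2pred over random splits.

    fit_predict(x_train, y_train, x_test) -> list of predictions.
    It is a hypothetical stand-in for any of the three models.
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    scores = []
    for _ in range(n_repeats):
        held_out = set(rng.sample(indices, n_test))
        train = [i for i in indices if i not in held_out]
        test = [i for i in indices if i in held_out]
        y_train = [y[i] for i in train]
        y_hat = fit_predict([x[i] for i in train], y_train,
                            [x[i] for i in test])
        mean_train = statistics.fmean(y_train)
        press = sum((y[i] - p) ** 2 for i, p in zip(test, y_hat))
        ss_ref = sum((y[i] - mean_train) ** 2 for i in test)
        scores.append(1.0 - press / ss_ref)
    return statistics.fmean(scores), statistics.stdev(scores)


def one_descriptor_ols(x_train, y_train, x_test):
    """Placeholder model: least-squares line on a single descriptor."""
    mx, my = statistics.fmean(x_train), statistics.fmean(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sxx
    return [my + slope * (a - mx) for a in x_test]


# Synthetic data: 208 "compounds", so each split is 157 training + 51 test.
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(208)]
y = [5.0 + 0.3 * a + data_rng.gauss(0, 0.5) for a in x]

mean_r2, sd_r2 = repeated_holdout(x, y, one_descriptor_ols)
```

Reporting the spread across splits, as the table does, shows how much a single 157/51 partition can flatter or penalize a model; the large standard deviation of the PLS R2pred is an example of that sensitivity.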
Figure 2. Residual plot for the training and test sets in the RF model.
Figure 3. Boxplots of 30 replications of 5-fold cross-validation correlation at various values of mtry for the PKCθ data set. Horizontal lines inside the boxes mark the median correlation.
Figure 4. Comparison of the training, out-of-bag, and external test set MSEs for random forest on the PKCθ data set as the number of trees increases.
Figure 5. Ordered variable importance scores from RF. The three most important descriptors are outlined with a blue frame.
Representative chemical structures and inhibitory activity of the PKCθ inhibitor dataset.
| No. | Scaffold | Substituents | pIC50 | Ref |
|---|---|---|---|---|
| 1 | A | R1 = OMe; R2 = OMe; R3 = 3-Bromophenyl | 5.337 | [ |
| 2 | A | R1 = OMe; R2 = OMe; R3 = Phenyl | 5.796 | [ |
| 3 | A | R1 = OMe; R2 = OMe; R3 = 3-Chlorophenyl | 5.409 | [ |
| 17 | B | X = Pyrrolidine | 8.420 | [ |
| 23 | B | X = H2N | 7.921 | [ |
| 27 | B | X = PhNH | 6.959 | [ |
| 77 | C | Ar = Phenyl; R = 4-CH2-NMe2 | 7.854 | [ |
| 80 | C | Ar = 3-Pyridine; R = 5-CH2-NMe2 | 7.076 | [ |
| 85 | C | Ar = Phenyl; R = 2-OMe,3-CH2-NMe2 | 7.921 | [ |
| 37 | D | X = 1 | 7.456 | [ |
| 41 | D | X = 2 | 7.469 | [ |
| 137 | E | NR’R = Morpholine | 8.108 | [ |
| 140 | E | NR’R = Pyrrolidine | 7.456 | [ |
| 153 | F | NR’R = Morpholine | 7.886 | [ |
| 157 | F | NR’R = NHCH2CH(OH)CH2OH | 8.824 | [ |
Test set;
from the corresponding reference.
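For readers more used to concentrations than to pIC50 values, the sketch below converts between the two. It assumes the standard definition pIC50 = -log10(IC50 in mol/L); this record does not restate the definition, and the function names are illustrative.

```python
import math


def pic50_to_ic50_nm(pic50):
    """Convert pIC50 to IC50 in nM, assuming pIC50 = -log10(IC50 / M)."""
    return 10.0 ** (9.0 - pic50)


def ic50_nm_to_pic50(ic50_nm):
    """Inverse conversion: IC50 in nM back to pIC50."""
    return 9.0 - math.log10(ic50_nm)


# Highest and lowest pIC50 values among the representative compounds listed.
ic50_compound_157 = pic50_to_ic50_nm(8.824)  # roughly 1.5 nM
ic50_compound_1 = pic50_to_ic50_nm(5.337)    # roughly 4600 nM (4.6 µM)
```

On this scale the reported SEP of 0.45 log units corresponds to a factor of about 10^0.45 ≈ 2.8 in IC50.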