Swapnil Chavan, Ahmed Abdelaziz, Jesper G Wiklander, Ian A Nicholls.
Abstract
A series of 172 molecular structures that block the hERG K⁺ channel were used to develop a classification model where, initially, eight types of PaDEL fingerprints were used for k-nearest neighbor model development. A consensus model constructed using Extended-CDK, PubChem and Substructure count fingerprint-based models was found to be a robust predictor of hERG activity. This consensus model demonstrated sensitivity and specificity values of 0.78 and 0.61 for the internal dataset compounds and 0.63 and 0.54 for the external (PubChem) dataset compounds, respectively. This model has identified the highest number of true positives (i.e. 140) from the PubChem dataset so far, as compared to other published models, and can potentially serve as a basis for the prediction of hERG active compounds. Validating this model against FDA-withdrawn substances indicated that it may even be useful for differentiating between mechanisms underlying QT prolongation.
Keywords: Classification model; Ikr; KCNH2; Toxicity; hERG blockers; k-nearest neighbor (k-NN)
Year: 2016 PMID: 26860111 PMCID: PMC4802000 DOI: 10.1007/s10822-016-9898-z
Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686
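The abstract describes single-fingerprint k-NN models combined into a three-model consensus. A minimal sketch of that idea follows, under stated assumptions: the record does not say which similarity measure or voting rule was used, so Tanimoto similarity on sets of "on" bits and a simple majority vote are assumed here. The function names and the toy fingerprints are hypothetical, and count-based fingerprints are simplified to bit sets.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query, train_fps, train_labels, k=1):
    """Label the query by majority vote among its k most similar training compounds."""
    ranked = sorted(
        range(len(train_fps)),
        key=lambda i: tanimoto(query, train_fps[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def consensus_predict(per_model_predictions):
    """Majority vote across an odd number of single-fingerprint models (assumed rule)."""
    return Counter(per_model_predictions).most_common(1)[0][0]


# Toy, made-up fingerprints: two training compounds, three fingerprint types.
train = {
    "ECDK": [frozenset({1, 2, 3}), frozenset({7, 8})],
    "PubChem": [frozenset({10, 11}), frozenset({20, 21})],
    "SSC": [frozenset({30}), frozenset({40, 41})],
}
labels = ["active", "inactive"]

query = {
    "ECDK": frozenset({1, 2}),
    "PubChem": frozenset({20}),
    "SSC": frozenset({30, 31}),
}

# One k-NN prediction per fingerprint type, then a vote over the three.
votes = [knn_predict(query[name], train[name], labels, k=1) for name in train]
prediction = consensus_predict(votes)
```

With three binary classifiers the vote can never tie, which is one practical reason to combine an odd number of models.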
Classification of training and test set compounds
| Dataset | Class 1 (hERG active) | Class 2 (hERG inactive) | Total |
|---|---|---|---|
| Training | 93 | 79 | 172 |
| Test | 221 | 1574 | 1795 |
Summary of statistical parameters for the k-NN classification models
| Entry | Fingerprints | Evaluation | NER | k | Sensitivity (Class 1) | Sensitivity (Class 2) | Specificity (Class 1) | Specificity (Class 2) |
|---|---|---|---|---|---|---|---|---|
| 1 | | Fitting | 0.68 | 1 | 0.72 | 0.65 | 0.65 | 0.72 |
| | | CV | 0.66 | 1 | 0.72 | 0.61 | 0.61 | 0.72 |
| | | External | 0.54 | 1 | 0.52 | 0.57 | 0.57 | 0.52 |
| 2 | | Fitting | 0.68 | 1 | 0.73 | 0.62 | 0.62 | 0.73 |
| | | CV | 0.66 | 1 | 0.72 | 0.61 | 0.61 | 0.72 |
| | | External | 0.53 | 1 | 0.49 | 0.57 | 0.57 | 0.49 |
| 3 | | Fitting | 0.67 | 1 | 0.70 | 0.63 | 0.63 | 0.70 |
| | | CV | 0.65 | 1 | 0.70 | 0.61 | 0.61 | 0.70 |
| | | External | 0.56 | 1 | 0.56 | 0.57 | 0.57 | 0.56 |
| 4 | | Fitting | 0.64 | 1 | 0.69 | 0.59 | 0.59 | 0.69 |
| | | CV | 0.64 | 1 | 0.70 | 0.58 | 0.58 | 0.70 |
| | | External | 0.55 | 1 | 0.52 | 0.57 | 0.57 | 0.52 |
| 5 | | Fitting | 0.68 | 6 | 0.76 | 0.59 | 0.59 | 0.76 |
| | | CV | 0.67 | 6 | 0.76 | 0.57 | 0.57 | 0.76 |
| | | External | 0.55 | 6 | 0.54 | 0.55 | 0.55 | 0.54 |
| 6 | | Fitting | 0.60 | 3 | 0.69 | 0.52 | 0.52 | 0.69 |
| | | CV | 0.60 | 3 | 0.71 | 0.49 | 0.49 | 0.71 |
| | | External | 0.57 | 3 | 0.62 | 0.52 | 0.52 | 0.62 |
| 7 | | Fitting | 0.68 | 1 | 0.70 | 0.67 | 0.67 | 0.70 |
| | | CV | 0.67 | 1 | 0.69 | 0.66 | 0.66 | 0.69 |
| | | External | 0.57 | 1 | 0.54 | 0.59 | 0.59 | 0.54 |
| 8 | | Fitting | 0.67 | 1 | 0.74 | 0.61 | 0.61 | 0.74 |
| | | CV | 0.68 | 1 | 0.72 | 0.65 | 0.65 | 0.72 |
| | | External | 0.58 | 1 | 0.61 | 0.56 | 0.56 | 0.61 |
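NER is not expanded in this record. Assuming it denotes the non-error rate, i.e. the mean of the two class sensitivities, the tabulated values can be reproduced to within rounding. The symbols Sn and Sp below are shorthand introduced here for sensitivity and specificity.

```latex
\[
\mathrm{NER} = \frac{\mathrm{Sn}_{1} + \mathrm{Sn}_{2}}{2},
\qquad
\mathrm{Sp}_{1} = \mathrm{Sn}_{2},
\qquad
\mathrm{Sp}_{2} = \mathrm{Sn}_{1}
\]
% Entry 1, fitting: (0.72 + 0.65) / 2 = 0.685, tabulated as 0.68
% Entry 6, external: (0.62 + 0.52) / 2 = 0.57, tabulated as 0.57
```

The second and third identities hold for any two-class problem, which is why the specificity columns mirror the sensitivity columns in every row.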
Statistical parameters for the consensus models
| Model | Dataset | TP | FP | TN | FN | TP + TN | Total | Q | Sens. | Spec. | Prec. | G-mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Training | 72 | 25 | 54 | 21 | 126 | 172 | 0.73 | 0.77 | 0.68 | 0.74 | 0.73 |
| 1 | Validation | 130 | 654 | 920 | 91 | 1050 | 1795 | 0.58 | 0.59 | 0.58 | 0.17 | 0.59 |
| 2 | Training | 73 | 31 | 48 | 20 | 121 | 172 | 0.70 | 0.78 | 0.61 | 0.70 | 0.69 |
| 2 | Validation | 140 | 723 | 851 | 81 | 991 | 1795 | 0.55 | 0.63 | 0.54 | 0.16 | 0.59 |
| 3 | Training | 71 | 31 | 48 | 22 | 119 | 172 | 0.69 | 0.76 | 0.61 | 0.70 | 0.68 |
| 3 | Validation | 135 | 707 | 867 | 86 | 1002 | 1795 | 0.56 | 0.61 | 0.55 | 0.16 | 0.58 |
| 4 | Training | 74 | 32 | 47 | 19 | 121 | 172 | 0.70 | 0.80 | 0.59 | 0.70 | 0.69 |
| 4 | Validation | 128 | 718 | 856 | 93 | 984 | 1795 | 0.55 | 0.58 | 0.54 | 0.15 | 0.56 |
| 5 | Training | 73 | 29 | 50 | 20 | 123 | 172 | 0.72 | 0.78 | 0.63 | 0.72 | 0.70 |
| 5 | Validation | 132 | 685 | 889 | 89 | 1021 | 1795 | 0.57 | 0.60 | 0.56 | 0.16 | 0.58 |
| 6 | Training | 73 | 28 | 51 | 20 | 124 | 172 | 0.72 | 0.78 | 0.65 | 0.72 | 0.71 |
| 6 | Validation | 131 | 675 | 899 | 90 | 1030 | 1795 | 0.57 | 0.59 | 0.57 | 0.16 | 0.58 |

Model 1 = substructure (SS) + substructure count (SSC) + extended CDK (ECDK); 2 = PubChem (PC) + SSC + ECDK; 3 = PC + SSC + SS; 4 = PC + SSC + MACCS; 5 = PC + SSC + ECDK + SC + MACCS; 6 = PC + SSC + ECDK + SS + MACCS + CDK + CDK Graph. TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; Total = TP + TN + FP + FN; Q, overall accuracy of prediction; Sens., sensitivity; Spec., specificity; Prec., precision; G-mean, geometric mean of sensitivity and specificity.
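The statistics in the consensus table all follow from the four confusion-matrix counts. As a sketch, the function below recomputes them for the validation row of model 2 (TP = 140, FP = 723, TN = 851, FN = 81). The function name is hypothetical, and G-mean is taken here to be the geometric mean of sensitivity and specificity, an assumption that reproduces the tabulated values.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Derive summary statistics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # fraction of actives recovered
    specificity = tn / (tn + fp)   # fraction of inactives recovered
    return {
        "Q": (tp + tn) / total,    # overall accuracy
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "g_mean": sqrt(sensitivity * specificity),  # assumed definition
    }


# Counts taken from the validation row of consensus model 2.
model2_validation = confusion_metrics(tp=140, fp=723, tn=851, fn=81)
rounded = {name: round(value, 2) for name, value in model2_validation.items()}
```

The low precision (0.16) alongside moderate sensitivity reflects the class imbalance of the validation set, where only 221 of 1795 compounds are hERG active.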
Fig. 1 Venn diagram representing the number of training set compounds correctly predicted by all three models (yellow), by any two models (magenta), by only one model (blue) and by none of the models (green). The shaded area represents compounds correctly predicted by the consensus model
Comparison of the k-NN classification model with other models
| Model | Our study | Su et al. | Wang et al. | Su et al. | Li et al. |
|---|---|---|---|---|---|
| Method | k-NN | SVM | Naive Bayesian classifier | PLS transformed into binary QSAR | SVM |
| Descriptors | 2D PaDEL fingerprints | 2D and 3D MOE, 4D fingerprints from MD simulation | Physico-chemical property based and geometry based descriptors, and fingerprints | 2D and 3D MOE descriptors and 4D fingerprints | GRIND descriptors derived from docking |
| Training set | | | | | |
| Cut-off (µM) | 5 | – | 10 | 40 | 40 |
| Total | 172 | 546 | 719 | 250 | 495 |
| True positives | 73 | 188 | 247 | – | 83 |
| True negatives | 48 | 242 | 315 | – | 283 |
| Sensitivity | 0.78 | 0.90 | 0.89 | – | 0.55 |
| Specificity | 0.61 | 0.72 | 0.72 | – | 0.83 |
| Q | 0.70 | 0.79 | 0.78 | – | 0.74 |
| F-measure | 0.74 | 0.76 | 0.76 | – | 0.56 |
| G-mean | 0.69 | 0.80 | 0.80 | – | 0.67 |
| Test set (PubChem) | | | | | |
| Cut-off (%) | 20 | 20 | 20 | 20 | 20 |
| Total | 1795 | 1668 | 1953 | 1668 | 1877 |
| True positives | 140 | 67 | 135 | 121 | 107 |
| True negatives | 851 | 1298 | 1247 | 963 | 1271 |
| Sensitivity | 0.63 | 0.41 | 0.54 | 0.74 | 0.57 |
| Specificity | 0.54 | 0.86 | 0.73 | 0.64 | 0.75 |
| Q | 0.55 | 0.82 | 0.71 | 0.65 | 0.73 |
| F-measure | 0.26 | 0.31 | 0.32 | 0.29 | 0.30 |
| G-mean | 0.59 | 0.60 | 0.63 | 0.69 | 0.66 |

F-measure = 2 × (precision × sensitivity)/(precision + sensitivity). Cut-off (%) is the % hERG blockage.
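The F-measure footnote can be checked against the "Our study" column. The sketch below applies the footnote's formula to the confusion counts reported for consensus model 2 (training: TP = 73, FP = 31, FN = 20; PubChem set: TP = 140, FP = 723, FN = 81). The function name is hypothetical.

```python
def f_measure(tp, fp, fn):
    """F-measure as 2 * (precision * sensitivity) / (precision + sensitivity)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * (precision * sensitivity) / (precision + sensitivity)


# Counts from the consensus model 2 rows (training and PubChem validation).
training_f = f_measure(tp=73, fp=31, fn=20)
external_f = f_measure(tp=140, fp=723, fn=81)
```

Rounded to two decimals these give 0.74 and 0.26, matching the table. The drop on the PubChem set comes mainly from precision, since false positives (723) far outnumber true positives (140) there.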