Mengyi Shan1, Chen Jiang1,2, Jing Chen1,3, Lu-Ping Qin1, Jiang-Jiang Qin4, Gang Cheng1.
Abstract
Compounds with human ether-à-go-go related gene (hERG) blockade activity may cause severe cardiotoxicity. Assessing hERG liability in the early stages of the drug discovery process is important, and in silico methods for predicting hERG channel blockers are being actively pursued. In the present study, the directed message passing neural network (D-MPNN) was applied to construct classification models for identifying hERG blockers based on diverse datasets. Several descriptors and fingerprints were tested along with the D-MPNN model. Among all these combinations, D-MPNN with the moe206 descriptors generated from MOE (D-MPNN + moe206) showed significantly improved performance. The AUC-ROC values of the D-MPNN + moe206 model reached 0.956 ± 0.005 under random split and 0.922 ± 0.015 under scaffold split on Cai's hERG dataset. Moreover, our models were compared with several recently reported machine learning models on various datasets. Our results indicated that the D-MPNN + moe206 model is among the best classification models. Overall, the excellent performance of the D-MPNN + moe206 model achieved in this study highlights its potential application in the discovery of novel and effective hERG blockers.
This journal is © The Royal Society of Chemistry.
Year: 2022 PMID: 35425351 PMCID: PMC8979305 DOI: 10.1039/d1ra07956e
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 3.361
Fig. 1 Comparison of the AUC-ROC values of the D-MPNN and deephERG models across different decoy thresholds on Cai's validation set.
Fig. 2 (A) Performance of D-MPNN with single molecular characterizations and combinations of molecular characterizations (control, ECFP4, ECFP6, FCFP4, MACCS, PubchemFP, RDKit 2D normalized, MOE53, moe206, mol2vec, MOE53 + mol2vec, MOE53 + RDKit 2D normalized) on the validation set under random split. (B) Performance on the validation set under scaffold balanced split.
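The difference between the random split and the scaffold split in Fig. 2 can be illustrated with a minimal sketch. It assumes that a scaffold key (for example a Bemis-Murcko scaffold SMILES) has already been computed for each molecule; the function name `scaffold_split`, its parameters, and the greedy filling order are illustrative assumptions, not the balanced scaffold split actually used in the study.

```python
import random
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_val=0.1, seed=0):
    """Split molecule indices so that no scaffold is shared between subsets.

    `scaffolds` holds one precomputed scaffold key per molecule
    (hypothetical input, e.g. a Bemis-Murcko scaffold SMILES string).
    """
    # Group molecule indices by their scaffold key
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Shuffle reproducibly, then place larger groups first;
    # the stable sort keeps the shuffled order among equally sized groups
    ordered = list(groups.values())
    random.Random(seed).shuffle(ordered)
    ordered.sort(key=len, reverse=True)

    n = len(scaffolds)
    train, val, test = [], [], []
    for group in ordered:
        # A whole scaffold group always goes into exactly one subset
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(val) + len(group) <= frac_val * n:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test
```

Because whole scaffold groups are assigned to a single subset, the validation and test molecules are structurally dissimilar from the training molecules, which is why scores under a scaffold split are typically lower than under a random split.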
Comparison of the performance of the D-MPNN + moe206 model with other best models
| No. | Model | Training set | Test set | AUC-ROC | SE | SP | ACC |
|---|---|---|---|---|---|---|---|
| 1 | SVM + FCFP | D3 training | D3 test | 0.93 | 0.81 | 0.89 | 0.86 |
| 1 | D-MPNN + moe206 | D3 training | D3 test | 0.958 ± 0.005 | 0.900 ± 0.019 | 0.913 ± 0.016 | 0.907 ± 0.010 |
| 1 | D-MPNN | D3 training | D3 test | 0.955 ± 0.005 | 0.881 ± 0.032 | 0.907 ± 0.027 | 0.896 ± 0.002 |
| 2 | Consensus model | Training | Validation | NA | 0.74 | 0.86 | NA |
| 2 | D-MPNN + moe206 | Training | Validation | 0.864 ± 0.021 | 0.808 ± 0.077 | 0.798 ± 0.039 | 0.798 ± 0.033 |
| 2 | D-MPNN | Training | Validation | 0.819 ± 0.012 | 0.638 ± 0.065 | 0.844 ± 0.037 | 0.831 ± 0.031 |
| 3 | Consensus model | Training | FDA-1 | 0.79 | 0.71 | 0.78 | NA |
| 3 | D-MPNN + moe206 | Training | FDA-1 | 0.882 ± 0.013 | 0.613 ± 0.110 | 0.856 ± 0.023 | 0.835 ± 0.018 |
| 3 | D-MPNN | Training | FDA-1 | 0.813 ± 0.032 | 0.413 ± 0.099 | 0.884 ± 0.019 | 0.844 ± 0.013 |
| 4 | SVM | Training I | Test I | 0.842 | 0.907 | 0.652 | 0.821 |
| 4 | D-MPNN + moe206 | Training I | Test I | 0.871 ± 0.010 | 0.916 ± 0.014 | 0.667 ± 0.049 | 0.832 ± 0.010 |
| 4 | D-MPNN | Training I | Test I | 0.776 ± 0.026 | 0.907 ± 0.031 | 0.539 ± 0.065 | 0.783 ± 0.021 |
| 5 | SVM | Training II | Test II | 0.839 | 0.850 | 0.745 | 0.821 |
| 5 | D-MPNN + moe206 | Training II | Test II | 0.876 ± 0.015 | 0.890 ± 0.025 | 0.676 ± 0.039 | 0.830 ± 0.010 |
| 5 | D-MPNN | Training II | Test II | 0.806 ± 0.010 | 0.909 ± 0.037 | 0.553 ± 0.040 | 0.808 ± 0.018 |
| 6 | SVM + 72descriptors + ECFP4 | Training | Test | 0.962 | 0.670 | 0.995 | 0.984 |
| 6 | D-MPNN + moe206 | Training | Test | 0.968 ± 0.001 | 0.656 ± 0.033 | 0.994 ± 0.001 | 0.983 ± 0.001 |
| 6 | D-MPNN | Training | Test | 0.954 ± 0.001 | 0.627 ± 0.038 | 0.992 ± 0.001 | 0.979 ± 0.000 |
Doddareddy's dataset.
Siramshetty's dataset.
NA: not available; this value could not be found in the original literature.
Hou's training set I and test set I.
Hou's training set II and test set II.
Ogura's training and test dataset. For each D-MPNN model, the average of different folds (N = 5) and the corresponding standard deviation are listed.
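The columns of this table (AUC-ROC, SE, SP, ACC) and the "mean ± standard deviation over N = 5 folds" entries can be reproduced along the following lines. This is a minimal sketch: the function names are hypothetical, labels are assumed to be 1 for blockers and 0 for non-blockers, and the use of the sample standard deviation is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, stdev

def se_sp_acc(y_true, y_pred):
    """Sensitivity (SE), specificity (SP) and accuracy (ACC) from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    se = tp / (tp + fn)              # fraction of blockers recovered
    sp = tn / (tn + fp)              # fraction of non-blockers recovered
    acc = (tp + tn) / len(pairs)
    return se, sp, acc

def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random blocker outscores a
    random non-blocker, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(fold_values):
    """Mean and sample standard deviation across folds (estimator assumed)."""
    return mean(fold_values), stdev(fold_values)
```

Applying `summarize` to the five per-fold values of a metric yields the two numbers reported in each "x ± y" cell.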
Comparison of the performance of the D-MPNN + moe206 model with Karim's best model
| No. | Model | Evaluation data | AUC-ROC | MCC | NPV | ACC | PPV | SP | SE | B-ACC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CardioTox | Test set-I | NA | 0.599 | 0.688 | 0.810 | 0.893 | 0.786 | 0.833 | 0.810 |
| 1 | D-MPNN + moe206 | Test set-I | 0.849 ± 0.042 | 0.567 ± 0.061 | 0.656 ± 0.044 | 0.800 ± 0.030 | 0.890 ± 0.023 | 0.786 ± 0.051 | 0.807 ± 0.037 | 0.796 ± 0.031 |
| 2 | CardioTox | Test set-II | NA | 0.452 | 0.947 | 0.755 | 0.455 | 0.600 | 0.909 | 0.754 |
| 2 | D-MPNN + moe206 | Test set-II | 0.810 ± 0.055 | 0.470 ± 0.053 | 0.950 ± 0.032 | 0.698 ± 0.022 | 0.467 ± 0.020 | 0.620 ± 0.030 | 0.909 ± 0.064 | 0.764 ± 0.030 |
| 3 | CardioTox | Test set-III | NA | 0.220 | 0.986 | 0.746 | 0.113 | 0.698 | 0.794 | 0.746 |
| 3 | D-MPNN + moe206 | Test set-III | 0.830 ± 0.010 | 0.214 ± 0.016 | 0.986 ± 0.037 | 0.696 ± 0.030 | 0.110 ± 0.006 | 0.692 ± 0.033 | 0.788 ± 0.064 | 0.740 ± 0.020 |
Karim's dataset.
NA: not available. For each D-MPNN model, the average of different folds (N = 5) and the corresponding standard deviation are listed.
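The additional columns in this table (MCC, NPV, PPV, B-ACC) follow from the same confusion-matrix counts as SE and SP. The sketch below uses the standard textbook definitions; the function name `extended_metrics` is hypothetical, and all four counts are assumed to be large enough that no denominator is zero (the MCC falls back to 0.0 otherwise).

```python
from math import sqrt

def extended_metrics(tp, tn, fp, fn):
    """MCC, NPV, PPV and balanced accuracy from confusion-matrix counts."""
    se = tp / (tp + fn)                 # sensitivity
    sp = tn / (tn + fp)                 # specificity
    ppv = tp / (tp + fp)                # positive predictive value
    npv = tn / (tn + fn)                # negative predictive value
    b_acc = (se + sp) / 2               # balanced accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "NPV": npv, "PPV": ppv, "B-ACC": b_acc}
```

As a consistency check against the table, the CardioTox row for Test set-I reports SP 0.786 and SE 0.833, whose mean is 0.8095, in agreement with the listed B-ACC of 0.810 after rounding.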
Fig. 3 (A) Comparison of the prediction performance of XGBoost and D-MPNN under random split with five-fold cross-validation on Cai's dataset. (B) Relative importance and SHAP values of the 20 highest-ranked molecular descriptors for XGBoost with moe206. Each point on a descriptor line represents one molecule. A high (red) SHAP value for a molecule increases the probability that the model predicts it as a blocker.