Hyang-Mi Lee, Myeong-Sang Yu, Sayada Reemsha Kazmi, Seong Yun Oh, Ki-Hyeong Rhee, Myung-Ae Bae, Byung Ho Lee, Dae-Seop Shin, Kwang-Seok Oh, Hyithaek Ceong, Donghyun Lee, Dokyun Na.
Abstract
BACKGROUND: Drug candidates often cause unwanted blockage of the potassium ion channel encoded by the human ether-a-go-go-related gene (hERG). The blockage leads to long QT syndrome (LQTS), a severe, life-threatening cardiac side effect. A virtual screening method that predicts drug-induced hERG-related cardiotoxicity could therefore facilitate drug discovery by filtering out toxic drug candidates.
RESULTS: In this study, we generated a reliable hERG-related cardiotoxicity dataset of 2130 compounds, all tested under constant experimental conditions. Based on this dataset, we developed a computational model for predicting hERG-related cardiotoxicity. In ten-fold cross-validation, the neural network model achieved an area under the receiver operating characteristic curve (AUC) of 0.764, an accuracy of 90.1%, a Matthews correlation coefficient (MCC) of 0.368, a sensitivity of 0.321, and a specificity of 0.967. The model was further evaluated on ten drug compounds tested on guinea pigs, where it showed an accuracy of 80.0%, an MCC of 0.655, a sensitivity of 0.600, and a specificity of 1.000, outperforming existing hERG-toxicity prediction models.
Keywords: Drug discovery; In silico model; Machine learning; hERG-related cardiotoxicity
Year: 2019 PMID: 31138104 PMCID: PMC6538553 DOI: 10.1186/s12859-019-2814-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Top 20 features with a high correlation
| Descriptor | Coeff. | Description |
|---|---|---|
| nRNR2 | 0.229 | Number of tertiary amines (aliphatic) |
| Wap | 0.215 | All-path Wiener index |
| F02[C-C] | 0.212 | Frequency of C - C at topological distance 2 |
| F03[C-C] | 0.212 | Frequency of C - C at topological distance 3 |
| nC | 0.211 | Number of carbon atoms |
| F04[C-C] | 0.210 | Frequency of C - C at topological distance 4 |
| D/Dtr06 | 0.208 | Distance/detour ring index of order 6 |
| ATSC5v | 0.207 | Centred Broto–Moreau autocorrelation of lag 5 (weighted by van der Waals volume) |
| F01[C-C] | 0.205 | Frequency of C - C at topological distance 1 |
| SpDiam_Dt | 0.205 | Spectral diameter from detour matrix |
| SpAD_Dt | 0.204 | Spectral absolute deviation from detour matrix |
| SpPos_Dt | 0.204 | Spectral positive sum from detour matrix |
| N-068 | 0.203 | Atom-centered fragment: Al3-N |
| Wi_Dt | 0.203 | Wiener-like index from detour matrix (detour index) |
| SpMax_Dt | 0.203 | Leading eigenvalue from detour matrix |
| TI1_L | 0.203 | First Mohar index from Laplace matrix |
| H_Dz(p) | 0.202 | Harary-like index from Barysz matrix (weighted by atomic number) |
| IDET | 0.202 | Total information content on the distance equality |
| F10[C-C] | 0.202 | Frequency of C - C at topological distance 10 |
| nR06 | 0.201 | Number of six-membered rings |
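The table above ranks descriptors by their correlation with the hERG label. The source does not state which correlation coefficient was used, so the following is only a minimal sketch that assumes Pearson correlation against a binary toxic/nontoxic label. The descriptor values and the helper names `pearson` and `rank_by_correlation` are hypothetical and serve illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)

def rank_by_correlation(features, labels, top_k):
    """Return the top_k (name, coefficient) pairs, sorted by |coefficient|."""
    scored = [(name, pearson(col, labels)) for name, col in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_k]

# Toy data (hypothetical values, not taken from the paper's dataset).
labels = [1, 1, 0, 0, 0, 1]            # 1 = hERG blocker, 0 = non-blocker
features = {
    "nRNR2": [2, 1, 0, 0, 1, 2],       # number of aliphatic tertiary amines
    "nC":    [24, 20, 12, 15, 18, 22], # number of carbon atoms
    "const": [1, 1, 1, 1, 1, 1],       # uninformative constant descriptor
}
top = rank_by_correlation(features, labels, top_k=2)
for name, coeff in top:
    print(f"{name}: {coeff:.3f}")
```

Sorting by absolute value keeps strongly negative correlations in the ranking as well; whether the original ranking did so is not stated in the source.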
Fig. 1. AUC with respect to feature number: AUC values of the six models were measured by ten-fold cross-validation as a function of the number of features
Performance (AUC) results of six machine learning methods
| Algorithm | Optimal number of features | AUC |
|---|---|---|
| Linear regression | 40 | 0.747 |
| Logistic regression | 350 | 0.764 |
| Ridge regression | 400 | 0.774 |
| Neural network | 1400 | 0.764 |
| Naïve Bayes | 40 | 0.687 |
| Random forest | 120 | 0.709 |
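For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen toxic compound receives a higher score than a randomly chosen nontoxic one. The sketch below computes it by direct pairwise comparison. The scores are made up, and the function name `auc` is a hypothetical helper, not something taken from the paper's code.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
example = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example)  # 0.75
```

The pairwise form is quadratic in the number of compounds, which is fine for a dataset of a few thousand; rank-based formulas give the same value faster.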
Performance results of the top three models with optimized thresholds
| Algorithm | Threshold | Accuracy | MCC | Sensitivity | Specificity | PPV^a |
|---|---|---|---|---|---|---|
| Logistic regression | 0.57 | 0.814 | 0.307 | 0.557 | 0.844 | 0.292 |
| Neural network | 0.82 | 0.901 | 0.368 | 0.321 | 0.967 | 0.542 |
| Ridge regression | 0.64 | 0.864 | 0.332 | 0.448 | 0.912 | 0.371 |
^a PPV: positive predictive value, defined as true positives / (true positives + false positives)
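The metrics in this table are linked by simple identities, which allows a quick consistency check. Accuracy equals sensitivity × p + specificity × (1 − p), where p is the fraction of toxic compounds, so p can be backed out of each row and used to recompute PPV. The sketch below does this. The resulting prevalence is an estimate derived from rounded table values, not a figure reported in the source, and the function names are hypothetical.

```python
# Rows from the table of optimized thresholds:
# (accuracy, sensitivity, specificity, reported PPV).
rows = {
    "Logistic regression": (0.814, 0.557, 0.844, 0.292),
    "Neural network":      (0.901, 0.321, 0.967, 0.542),
    "Ridge regression":    (0.864, 0.448, 0.912, 0.371),
}

def implied_prevalence(acc, sens, spec):
    """Solve acc = sens*p + spec*(1 - p) for p, the fraction of toxic compounds."""
    return (spec - acc) / (spec - sens)

def ppv_from_rates(sens, spec, p):
    """PPV = TP / (TP + FP), written in terms of rates and prevalence p."""
    tp = sens * p
    fp = (1.0 - spec) * (1.0 - p)
    return tp / (tp + fp)

for name, (acc, sens, spec, ppv) in rows.items():
    p = implied_prevalence(acc, sens, spec)
    recomputed = ppv_from_rates(sens, spec, p)
    print(f"{name}: implied prevalence ~ {p:.3f}, "
          f"recomputed PPV ~ {recomputed:.3f}, reported {ppv}")
```

With the rounded values above, each row implies that roughly a tenth of the compounds are toxic. The recomputed PPV matches the reported one only approximately, because small rounding errors in specificity are amplified when positives are rare.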
Prediction results of ten drug compounds
| Name | In vivo result | Our model | Pred-hERG binary | Pred-hERG multiclass | OCHEM Predictor^a |
|---|---|---|---|---|---|
| Haloperidol | Toxic | Toxic | Toxic | Nontoxic | Nontoxic |
| Cimetidine | Nontoxic | Nontoxic | Toxic | Nontoxic | Nontoxic |
| Disopyramide | Toxic | Toxic | Nontoxic | Nontoxic | Nontoxic |
| Quinidine | Toxic | Nontoxic | Toxic | Nontoxic | Toxic |
| Terazosin | Nontoxic | Nontoxic | Toxic | Nontoxic | Nontoxic |
| Spironolactone | Nontoxic | Nontoxic | Toxic | Nontoxic | Nontoxic |
| Sotalol | Toxic | Nontoxic | Nontoxic | Nontoxic | Nontoxic |
| Cefazolin | Nontoxic | Nontoxic | Toxic | Nontoxic | Nontoxic |
| Chlorpromazine | Toxic | Toxic | Toxic | Toxic | Nontoxic |
| Loratadine | Nontoxic | Nontoxic | Toxic | Nontoxic | Nontoxic |
^a Consensus II in the OCHEM Predictor was used
Performance comparison on the in vivo test dataset
| Models | Accuracy | MCC | Sensitivity | Specificity |
|---|---|---|---|---|
| Our model | 0.800 | 0.655 | 0.600 | 1.000 |
| Pred-hERG binary | 0.300 | −0.500 | 0.600 | 0.000 |
| Pred-hERG multiclass | 0.600 | 0.333 | 0.200 | 1.000 |
| OCHEM Predictor | 0.600 | 0.333 | 0.200 | 1.000 |
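The values in this comparison follow directly from the ten per-compound predictions listed in the preceding table. As a worked check, the sketch below rebuilds each model's confusion matrix from those rows and recomputes accuracy, MCC, sensitivity, and specificity. The string encoding and the function name `confusion_metrics` are illustrative choices, not part of the source.

```python
import math

# In vivo labels and predictions in table order (Haloperidol ... Loratadine).
# T = toxic, N = nontoxic.
in_vivo = "TNTTNNTNTN"
predictions = {
    "Our model":            "TNTNNNNNTN",
    "Pred-hERG binary":     "TTNTTTNTTT",
    "Pred-hERG multiclass": "NNNNNNNNTN",
    "OCHEM Predictor":      "NNNTNNNNNN",
}

def confusion_metrics(truth, pred):
    """Accuracy, MCC, sensitivity, and specificity with 'T' as the positive class."""
    tp = sum(t == "T" and p == "T" for t, p in zip(truth, pred))
    tn = sum(t == "N" and p == "N" for t, p in zip(truth, pred))
    fp = sum(t == "N" and p == "T" for t, p in zip(truth, pred))
    fn = sum(t == "T" and p == "N" for t, p in zip(truth, pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(truth),
        # MCC is undefined when a row or column of the matrix is empty; report 0.0 then.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

results = {name: confusion_metrics(in_vivo, pred) for name, pred in predictions.items()}
for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With only five toxic and five nontoxic compounds, a single changed prediction shifts sensitivity or specificity by 0.2, so differences between models on this set should be read with that granularity in mind.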
Performance comparison on the in vitro dataset
| Models | Accuracy | MCC | Sensitivity | Specificity |
|---|---|---|---|---|
| Our model | 0.901 | 0.368 | 0.321 | 0.967 |
| Pred-hERG binary | 0.15 | −0.034 | 0.912 | 0.061 |
| Pred-hERG multiclass | 0.902 | 0.218 | 0.075 | 0.999 |
| OCHEM Predictor | 0.885 | 0.133 | 0.099 | 0.978 |