Binyou Wang1, Xiaoqiu Tan2,3,4, Jianmin Guo2, Ting Xiao2, Yan Jiao5, Junlin Zhao1, Jianming Wu6,7, Yiwei Wang2,3,4.
Abstract
Drug-induced immune thrombocytopenia (DITP) often occurs in patients receiving many drugs simultaneously, and clinicians are often unable to determine which drug is the likely culprit. Despite significant advances in laboratory-based DITP testing, in vitro assays remain expensive and, in some cases, cannot provide a timely diagnosis. To address these shortcomings, this paper proposes an efficient machine learning-based method for predicting DITP toxicity. A small dataset of 225 molecules was constructed. The molecules were represented by six fingerprints, three descriptors, and their combinations. Seven classical machine learning models were examined to determine an optimal model. The results show that the RDMD + PubChem-k-NN model provides the best prediction performance among all the models, achieving an area under the curve of 76.9% and an overall accuracy of 75.6% on the external validation set. The application domain (AD) analysis demonstrates the prediction reliability of the RDMD + PubChem-k-NN model. Five structural fragments related to DITP toxicity are identified through the information gain (IG) method together with fragment frequency analysis. To the authors' knowledge, this is the first machine learning-based classification model for recognizing chemicals with DITP toxicity, and it can be used as an efficient tool in drug design and clinical therapy.
Keywords: drug-induced immune thrombocytopenia; k-nearest neighbor; machine learning; structural alert
Year: 2022 PMID: 35631529 PMCID: PMC9143325 DOI: 10.3390/pharmaceutics14050943
Source DB: PubMed Journal: Pharmaceutics ISSN: 1999-4923 Impact factor: 6.525
Number of compounds in the training and external validation sets.
| | Training Set | External Validation Set | Sum |
|---|---|---|---|
| Toxicants | 75 | 18 | 93 |
| Non-toxicants | 105 | 27 | 132 |
| Total | 180 | 45 | 225 |
Figure 1. Chemical space distribution of compounds in the training and external validation sets.
Figure 2. Distributions of six molecular properties of DITP toxicants and DITP non-toxicants.
Figure 3. (a) Heat map of the Tanimoto similarity index on the training set. (b) Heat map of the Tanimoto similarity index on the external validation set.
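The Tanimoto similarity index behind these heat maps can be illustrated with a minimal sketch. It assumes each fingerprint is given as a set of on-bit indices; the function names and the toy fingerprints below are hypothetical and are not taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(a & b)
    # shared bits divided by the number of bits set in either fingerprint
    return shared / (len(a) + len(b) - shared)


def similarity_matrix(fingerprints):
    """Pairwise Tanimoto matrix; this is the kind of matrix a heat map displays."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Toy fingerprints (hypothetical bit indices, for illustration only)
fps = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
matrix = similarity_matrix(fps)
```

Values close to 1 on the off-diagonal indicate structurally similar pairs, so a heat map dominated by low values suggests a structurally diverse compound set.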
The five-fold cross-validation results of the top-five classification models.
| Model | Molecular Features | SE (%) | SP (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| k-NN | RDMD + PubChem | 69.0 ± 2.3 | 56.6 ± 2.1 | 62.7 ± 1.0 | 0.261 ± 0.022 | 0.628 ± 0.011 |
| k-NN | 10MD + PubChem | 66.9 ± 2.0 | 57.2 ± 2.4 | 61.8 ± 1.7 | 0.243 ± 0.031 | 0.621 ± 0.016 |
| XGBoost | CCMD + KPFP | 61.9 ± 2.7 | 61.4 ± 3.3 | 61.1 ± 1.2 | 0.233 ± 0.025 | 0.617 ± 0.013 |
| XGBoost | CCMD + MACCS | 64.2 ± 2.7 | 58.3 ± 2.1 | 60.7 ± 0.8 | 0.226 ± 0.023 | 0.613 ± 0.011 |
| XGBoost | CCMD + PubChem | 61.0 ± 1.1 | 61.5 ± 2.1 | 60.8 ± 0.9 | 0.226 ± 0.021 | 0.612 ± 0.010 |
The five-fold cross-validation results of the optimal classification models based on only molecular fingerprints or descriptors.
| Model | SE (%) | SP (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|
| k-NN-10MD | 66.1 ± 1.6 | 56.2 ± 3.0 | 61.0 ± 1.9 | 0.224 ± 0.035 | 0.612 ± 0.018 |
| k-NN-RDMD | 47.6 ± 2.6 | 67.0 ± 2.8 | 56.9 ± 1.5 | 0.150 ± 0.034 | 0.573 ± 0.016 |
| k-NN-PubChem | 66.9 ± 2.5 | 56.0 ± 2.0 | 61.2 ± 1.3 | 0.231 ± 0.027 | 0.614 ± 0.014 |
| XGBoost-CCMD | 60.5 ± 3.2 | 60.1 ± 2.8 | 60.2 ± 2.0 | 0.207 ± 0.039 | 0.603 ± 0.019 |
| XGBoost-PubChem | 57.5 ± 3.7 | 62.1 ± 5.5 | 59.4 ± 1.7 | 0.196 ± 0.031 | 0.598 ± 0.016 |
| XGBoost-KPFP | 57.1 ± 2.5 | 60.8 ± 3.0 | 58.1 ± 2.2 | 0.180 ± 0.046 | 0.589 ± 0.023 |
| XGBoost-MACCS | 53.9 ± 2.5 | 62.1 ± 2.7 | 57.8 ± 1.2 | 0.162 ± 0.031 | 0.580 ± 0.015 |
Performances of the top-five classification models and consensus model on the external validation set.
| Model | Molecular Features | SE (%) | SP (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| k-NN | RDMD + PubChem | 83.3 | 70.4 | 75.6 | 0.526 | 0.769 |
| XGBoost | CCMD + KPFP | 66.7 | 85.2 | 77.8 | 0.531 | 0.759 |
| k-NN | 10MD + PubChem | 83.3 | 63.0 | 71.1 | 0.456 | 0.731 |
| XGBoost | CCMD + PubChem | 61.1 | 85.2 | 75.6 | 0.481 | 0.731 |
| XGBoost | CCMD + MACCS | 61.1 | 81.5 | 73.3 | 0.436 | 0.713 |
| Consensus model | / | 61.1 | 85.2 | 75.6 | 0.481 | 0.731 |
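As a worked check of how SE, SP, ACC, and MCC relate, the sketch below recomputes them from confusion-matrix counts using the standard definitions. The counts are not reported in the source; they are inferred here from the k-NN RDMD + PubChem row and the 18 toxicants and 27 non-toxicants in the external validation set (83.3% of 18 gives 15 true positives, 70.4% of 27 gives 19 true negatives). The function name is hypothetical. AUC is not included because it depends on the ranking of predicted scores and cannot be recovered from counts alone.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    se = tp / (tp + fn)                      # sensitivity: toxicants correctly flagged
    sp = tn / (tn + fp)                      # specificity: non-toxicants correctly cleared
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Counts inferred from the table (an assumption, not reported values)
knn = classification_metrics(tp=15, fn=3, tn=19, fp=8)
summary = {
    "SE (%)": round(100 * knn["SE"], 1),
    "SP (%)": round(100 * knn["SP"], 1),
    "ACC (%)": round(100 * knn["ACC"], 1),
    "MCC": round(knn["MCC"], 3),
}
print(summary)
```

Under these inferred counts the recomputed values agree with the tabulated 83.3, 70.4, 75.6, and 0.526. The same check with 12 true positives and 23 true negatives reproduces the XGBoost CCMD + KPFP row.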
Number of drugs inside and outside of the AD.
| | Inside (P) | Inside (N) | Outside (P) | Outside (N) | AD Coverage (%) |
|---|---|---|---|---|---|
| Training set | 75 | 105 | 0 | 0 | 100 |
| External validation set | 18 | 27 | 4 | 0 | 91.1 |
Figure 4. (a) Structures of the DITP toxicants with the β-lactam ring correctly identified by the CCMD + KPFP-XGBoost and RDMD + PubChem-k-NN models. (b) Structures of DITP toxicants misclassified by the CCMD + KPFP-XGBoost model.
Figure 5. Structures of two false positives and two DITP toxicants.
Figure 6. The information gain value distributions of the KRFP fragments.
Five structural alerts of DITP toxicity and their representative structures.
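| Structure | IG | Freq_P | Freq_N | Representative Structure |
|---|---|---|---|---|
| [structure image] | 0.0170 | 2.12 | 0.21 | [structure image] |
| [structure image] | 0.0259 | 1.84 | 0.41 | [structure image] |
| [structure image] | 0.0123 | 1.73 | 0.49 | [structure image] |
| [structure image] | 0.0138 | 1.66 | 0.54 | [structure image] |
| [structure image] | 0.0073 | 1.61 | 0.57 | [structure image] |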
X+ represents positively charged nitrogen atoms, positively charged sulfur ions, and metal ions.
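The two quantities in this table can be sketched as follows. The source does not give the formulas, so both are assumptions: IG is taken as the usual entropy reduction from splitting compounds on the presence of a fragment (log base 2 is assumed; another base only rescales the value), and the fragment frequency is taken as the enrichment ratio commonly used in structural-alert studies, (fragment-bearing compounds in the class × all compounds) / (fragment-bearing compounds × compounds in the class). Under that definition, Freq_P × 93/225 + Freq_N × 132/225 should equal 1, which each row of the table satisfies to rounding. The function names and the fragment counts (7 toxicants, 1 non-toxicant) are hypothetical.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (bits) of a two-class split; empty groups contribute 0."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(frag_pos, frag_neg, n_pos, n_neg):
    """Entropy reduction from splitting the data set on fragment presence."""
    n = n_pos + n_neg
    with_frag = frag_pos + frag_neg
    without_frag = n - with_frag
    conditional = (with_frag / n) * entropy(frag_pos, frag_neg) \
        + (without_frag / n) * entropy(n_pos - frag_pos, n_neg - frag_neg)
    return entropy(n_pos, n_neg) - conditional


def fragment_frequency(frag_in_class, frag_total, n_class, n_total):
    """Enrichment of a fragment in one class relative to the whole data set (assumed formula)."""
    return (frag_in_class * n_total) / (frag_total * n_class)


# Class sizes from the data set table; fragment counts are hypothetical
N_POS, N_NEG = 93, 132
frag_pos, frag_neg = 7, 1
ig = information_gain(frag_pos, frag_neg, N_POS, N_NEG)
freq_p = fragment_frequency(frag_pos, frag_pos + frag_neg, N_POS, N_POS + N_NEG)
freq_n = fragment_frequency(frag_neg, frag_pos + frag_neg, N_NEG, N_POS + N_NEG)
```

A fragment with Freq_P well above 1 and Freq_N well below 1 occurs disproportionately in toxicants, which is the pattern all five alerts in the table show.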