Shuaibing He, Tianyuan Ye, Ruiying Wang, Chenyang Zhang, Xuelian Zhang, Guibo Sun, Xiaobo Sun.
Abstract
As one of the leading causes of drug failure in clinical trials, drug-induced liver injury (DILI) seriously impedes the development of new drugs. Assessing the DILI risk of drug candidates in advance is considered an effective strategy for reducing attrition in drug discovery. Despite continuous recent attempts, predicting DILI successfully remains a major challenge, and there is an urgent need for a quantitative structure-activity relationship (QSAR) model that predicts DILI with satisfactory performance. In this work, we report a high-quality QSAR model for predicting the DILI risk of xenobiotics that combines eight classifiers with molecular descriptors provided by Marvin. For model development, a large and diverse dataset of 1254 compounds for DILI was built through comprehensive literature retrieval. The optimal model was obtained with an ensemble method that averages the probabilities from the eight classifiers, achieving an accuracy (ACC) of 0.783, sensitivity (SE) of 0.818, specificity (SP) of 0.748, and area under the receiver operating characteristic curve (AUC) of 0.859. For further validation, three external test sets and a large negative dataset were used. Both the internal and external validation indicated that our model outperformed the models from prior studies. The data provided by this study will also be a valuable resource for future modeling and data mining.
Keywords: DILI; hepatotoxicity; in silico; machine learning; molecular descriptors
Year: 2019 PMID: 30999595 PMCID: PMC6515336 DOI: 10.3390/ijms20081897
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Contour graph of the Tanimoto similarity index, showing molecular similarity. The Tanimoto similarity index was calculated from FP2 fingerprints.
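The Tanimoto index used for Figure 1 compares two fingerprints by dividing the number of shared on-bits by the number of distinct on-bits. The following is a minimal sketch of that calculation, assuming each fingerprint is given as a set of on-bit indices. The bit indices are hypothetical and do not come from real FP2 fingerprints, and the value returned for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical on-bit indices standing in for FP2 fingerprints of two molecules.
fp_x = {1, 5, 9, 12}
fp_y = {1, 5, 10, 12, 20}

# 3 shared bits out of 6 distinct bits.
similarity = tanimoto(fp_x, fp_y)
```

Computing this value for every pair of compounds gives the similarity matrix that a contour graph of this kind visualizes.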
Figure 2. Distribution of the training set and external test sets in chemical space, defined by molecular weight (X-axis) and ClogP (Y-axis).
Performance of nine models built with different machine learning algorithms, measured by five indices.
| Classifier (training set) | SE | SP | ACC | BACC | AUC |
|---|---|---|---|---|---|
| NaiveBayes | 0.857 | 0.401 | 0.632 | 0.629 | 0.662 |
| KNN | 0.792 | 0.761 | 0.777 | 0.777 | 0.780 |
| KStar | 0.737 | 0.735 | 0.736 | 0.736 | 0.824 |
| AdaBoostM1 | 0.774 | 0.723 | 0.749 | 0.749 | 0.818 |
| Bagging | 0.764 | 0.754 | 0.759 | 0.759 | 0.820 |
| J48 | 0.662 | 0.672 | 0.667 | 0.667 | 0.682 |
| Randomforest | 0.785 | 0.736 | 0.761 | 0.761 | 0.852 |
| Dl4j | 0.608 | 0.592 | 0.600 | 0.600 | 0.648 |
| Vote | 0.818 | 0.748 | 0.783 | 0.783 | 0.859 |
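The Vote row carries the same figures that the abstract reports for the ensemble, which averages the probabilities from the eight classifiers. A minimal sketch of probability averaging follows. The probabilities are hypothetical, only three classifiers are shown for brevity, and the 0.5 decision threshold is an assumption rather than a value stated in the source.

```python
def average_vote(prob_by_classifier, threshold=0.5):
    """Average per-classifier P(DILI-positive) per compound, then threshold the mean.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    n_models = len(prob_by_classifier)
    n_compounds = len(next(iter(prob_by_classifier.values())))
    mean_probs = [
        sum(probs[i] for probs in prob_by_classifier.values()) / n_models
        for i in range(n_compounds)
    ]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Hypothetical probabilities for three compounds from three of the classifiers.
probs = {
    "KNN": [0.9, 0.4, 0.2],
    "Bagging": [0.7, 0.6, 0.1],
    "Randomforest": [0.8, 0.2, 0.3],
}
mean_probs, labels = average_vote(probs)
```

Averaging probabilities lets a confident classifier outweigh several uncertain ones, which a simple majority vote over hard labels would not allow.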
Figure 3. ROC curves for the training set.
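The area under a ROC curve equals the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one. The sketch below computes AUC through that pairwise definition, with ties counted as half. The labels and scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = DILI-positive) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_from_scores(y_true, y_score)  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of compounds, which is fine for illustration; rank-based formulas give the same value more efficiently.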
Comparison between our model and prior studies.
| Dataset | References | Size of Dataset | SE | SP | ACC | BACC |
|---|---|---|---|---|---|---|
| Training set | Present study | 1254 (636+/618−) | 0.818 | 0.748 | 0.783 | 0.783 |
| Training set | […] | 1241 (683+/558−) | 0.799 | 0.603 | 0.711 | 0.701 |
| Training set | […] | 978 (571+/407−) | 0.948 | 0.585 | 0.797 | 0.767 |
| Training set | […] | 996 (541+/455−) | 0.680 | 0.610 | 0.650 | 0.645 |
| Test set | […] | 83 (66+/17−) | 0.879 (0.909) | 0.647 (0.529) | 0.831 (0.831) | 0.763 (0.719) |
| Test set | […] | 85 (58+/27−) | 0.707 (0.848) | 0.815 (0.345) | 0.741 (0.682) | 0.761 (0.597) |
| Test set | […] | 67 (28+/39−) | 0.786 (0.536) | 0.590 (0.641) | 0.672 (0.597) | 0.688 (0.588) |
| Test set | Present study | 204 (125+/79−) | 0.773 | 0.658 | 0.730 | 0.716 |
| Test set | Present study | 312 (0+/312−) | - | - | 0.689 (0.301) | - |
In the Size of Dataset column, “+” and “−” denote the numbers of DILI-positives and DILI-negatives, respectively. For the test sets, values outside parentheses were obtained with our model and values inside parentheses were reported by the prior studies. We also evaluated our model on the entire external test set, which combines the external test sets provided by Ai et al., Zhang et al., and Kotsampasakou et al.
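The four indices in this comparison all follow from a confusion matrix. The sketch below shows how, and why SE and BACC are undefined when a dataset contains no positives. The counts are inferred from the reported rates and dataset sizes (for example, 58 of 66 positives correct gives SE 0.879); they are an inference for illustration, not figures stated in the source.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return SE, SP, ACC and BACC; rates that are undefined are returned as None."""
    n_pos, n_neg = tp + fn, tn + fp
    se = tp / n_pos if n_pos else None   # sensitivity needs at least one positive
    sp = tn / n_neg if n_neg else None   # specificity needs at least one negative
    acc = (tp + tn) / (n_pos + n_neg)
    bacc = (se + sp) / 2 if se is not None and sp is not None else None
    return {"SE": se, "SP": sp, "ACC": acc, "BACC": bacc}


# Inferred counts for the 83-compound test set (66+/17-): 58 true positives
# and 11 true negatives. These counts are not stated in the source.
m = classification_metrics(tp=58, fn=8, tn=11, fp=6)
rounded = {k: round(v, 3) for k, v in m.items()}

# Inferred counts for the negatives-only set (0+/312-): 215 true negatives.
neg_only = classification_metrics(tp=0, fn=0, tn=215, fp=97)
```

With no positives, accuracy reduces to specificity, which is consistent with the single value reported for the 312-compound negative dataset.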
Datasets of hepatotoxicity from prior studies.
| ID | Source Name | Type of Data | No. of Compounds (Positive/Negative) | DILI Categories |
|---|---|---|---|---|
| 1 | Xu et al., 2008 | Clinical data for hepatotoxicity | 344 (200/144) | Authors' definition |
| 2 | Low et al., 2011 | Animal experiment | 127 (53/74) | Authors' definition |
| 3 | O’Brien et al., 2006 | In vitro cell-based assay | 83 (42/41) | Severely hepatotoxic drugs were considered positives and nontoxic drugs negatives |
| 4 | Rodgers et al., 2010 | Clinical data for hepatotoxicity | 393 (75/318) | Actives were defined as positives and inactives as negatives |
| 5 | Greene et al., 2010 | Literature reviews and medical monographs | 425 (273/152) | HH and NE represented positives and negatives, respectively |
| 6 | Ekins et al., 2010 | Clinical data for hepatotoxicity | 532 (311/221) | Authors' definition |
| 7 | Liew et al., 2011 | Medical monographs | 1274 (759/515) | Authors' definition |
| 8 | Liu et al., 2011 | Drug labeling and clinical case reports | 1294 (724/570) | Authors' definition |
| 9 | Chen et al., 2013 | FDA-approved drug labeling | 387 (176/211) | Authors' definition |
| 10 | Zhu and Kruhlak, 2014 | Postmarket safety data | 282 (177/105) | Authors' definition |
| 11 | Huang et al., 2015 | Scientific literature | 91 (83/8) | Authors' definition |
| 12 | DILIrank | Drug labeling and clinical data | 504 (192/312) | Only vMost-DILI-Concern compounds were considered positives, and vNo-DILI-Concern compounds negatives |
| 13 | LiverTox | Scientific literature and public databases | 343 (119/224) | Category A and Category B were combined as positives, and Category E was considered negative |
| 14 | LTKB | FDA-approved drug labeling | 195 (113/82) | Only vMost-DILI-Concern compounds were considered positives, and vNo-DILI-Concern compounds negatives |
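For DILIrank and LTKB, the DILI Categories column describes a rule that keeps two categories as binary labels and leaves the rest out. A minimal sketch of such a rule follows. The compound names are placeholders, and "vLess-DILI-Concern" is used here only as an example of a category outside the two retained ones.

```python
POSITIVE_CATEGORIES = {"vMost-DILI-Concern"}
NEGATIVE_CATEGORIES = {"vNo-DILI-Concern"}


def to_binary_label(category):
    """Map a DILI category to 1 (positive), 0 (negative), or None (excluded)."""
    if category in POSITIVE_CATEGORIES:
        return 1
    if category in NEGATIVE_CATEGORIES:
        return 0
    return None


# Hypothetical records: (compound name, DILI category).
records = [
    ("compound_a", "vMost-DILI-Concern"),
    ("compound_b", "vLess-DILI-Concern"),  # outside the retained categories
    ("compound_c", "vNo-DILI-Concern"),
]

# Keep only compounds whose category maps to a binary label.
labeled = [
    (name, to_binary_label(category))
    for name, category in records
    if to_binary_label(category) is not None
]
```

Discarding intermediate categories trades dataset size for cleaner labels, which is a common choice when the middle categories are ambiguous.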
Search terms for compounds with potential hepatotoxicity or hepatoprotection.
| Search Terms 1 | Search Terms 2 |
|---|---|
| Herbal | Hepatotoxicity |
| Medicinal plant | Liver Toxicity |
| Traditional Chinese medicine | Liver failure |
| | Liver injury |
| | Liver damage |
| | Hepatitis |
| | Liver cancer |
| | Liver Tumors |
| | Hepatocellular carcinoma |
| | Liver cirrhosis |
| | Hepatomegaly |
| | Liver neoplasms |
| | Fatty liver |
| | Jaundice |
| | Cholestasis |
| | Hepatoma |
| | Liver fibrosis |
| | Liver protection |
| | Hepatoprotective |
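One plausible way to use two lists of search terms is to pair every term from the first list with every term from the second. The sketch below builds such queries. The pairing and the `AND` syntax are assumptions, since this section does not state how the terms were combined or which search engine was used.

```python
from itertools import product

terms_1 = ["Herbal", "Medicinal plant", "Traditional Chinese medicine"]
terms_2 = [
    "Hepatotoxicity", "Liver Toxicity", "Liver failure", "Liver injury",
    "Liver damage", "Hepatitis", "Liver cancer", "Liver Tumors",
    "Hepatocellular carcinoma", "Liver cirrhosis", "Hepatomegaly",
    "Liver neoplasms", "Fatty liver", "Jaundice", "Cholestasis",
    "Hepatoma", "Liver fibrosis", "Liver protection", "Hepatoprotective",
]

# Assumed combination rule: one query per (term 1, term 2) pair.
queries = [f'"{a}" AND "{b}"' for a, b in product(terms_1, terms_2)]
```

Under this assumption the 3 and 19 terms yield 57 queries, the first being `"Herbal" AND "Hepatotoxicity"`.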
Figure 4. Venn diagram of compounds from prior studies and the present study. “+” and “−” denote the numbers of DILI-positives and DILI-negatives, respectively.
Figure 5. Contour graphs of the intercorrelation matrix of molecular descriptors (A) before and (B) after feature selection.
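A common way to reduce descriptor intercorrelation is a greedy filter that drops any descriptor strongly correlated with one already kept. The sketch below illustrates that idea only. The feature selection procedure actually used is not described in this section, and the 0.9 threshold, descriptor names, and values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / sqrt(vx * vy)


def drop_correlated(columns, threshold=0.9):
    """Keep a descriptor only if |r| with every already-kept descriptor is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor values for four compounds.
descriptors = {
    "mol_weight": [180.0, 250.0, 320.0, 410.0],
    "heavy_atoms": [13.0, 18.0, 23.0, 30.0],  # nearly proportional to mol_weight
    "logp": [1.2, -0.5, 3.1, 0.8],
}
kept = drop_correlated(descriptors)
```

The result depends on column order, because the first descriptor of a correlated pair is the one retained.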
Figure 6. Diagram of data processing and model construction. KS algorithm: Kennard–Stone algorithm. “+” and “−” denote the numbers of DILI-positives and DILI-negatives, respectively.
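The Kennard–Stone algorithm named in Figure 6 selects a subset of points that covers the descriptor space as evenly as possible. The sketch below is a generic version of the max-min selection rule with Euclidean distance. How the algorithm is applied within this pipeline is not described in this section, and the points are hypothetical.

```python
from math import dist


def kennard_stone(points, k):
    """Return indices of k points chosen by the Kennard-Stone max-min distance rule."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < k:
        # Add the candidate whose nearest selected point is farthest away.
        best = max(
            remaining,
            key=lambda i: min(dist(points[i], points[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical compounds described by two descriptors each.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
chosen = kennard_stone(pts, 3)
```

Here the two extreme points are picked first and the midpoint next, because it lies farthest from both of them.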