Robert Ancuceanu, Marilena Viorica Hovanet, Adriana Iuliana Anghel, Florentina Furtunescu, Monica Neagu, Carolina Constantin, Mihaela Dinu.
Abstract
Drug-induced liver injury (DILI) remains one of the challenges in the safety profile of both authorized and candidate drugs, and predicting hepatotoxicity from the chemical structure of a substance remains a task worth pursuing. Such an approach is consistent with the current trend toward replacing non-clinical tests with in vitro or in silico alternatives. In 2016, a group of researchers from the FDA published an improved annotated list of drugs with respect to their DILI risk, constituting "the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans" (DILIrank). This paper is one of the few attempting to predict liver toxicity using the DILIrank dataset. Molecular descriptors were computed with the Dragon 7.0 software, and a variety of feature selection and machine learning algorithms were implemented in the R computing environment. Nested (double) cross-validation was used to externally validate the selected models. A total of 78 models with reasonable performance were selected and stacked through several approaches, including the building of multiple meta-models. The performance of the stacked models was slightly superior to that of other published models. The models were applied in a virtual screening exercise on over 100,000 compounds from the ZINC database, and about 20% of them were predicted to be non-hepatotoxic.
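Nested cross-validation separates model selection from performance estimation: an inner loop chooses hyperparameters using only the outer-training data, and the outer loop scores that choice on a fold the inner loop never saw. The Python sketch below illustrates the idea under loudly stated assumptions: a hypothetical one-dimensional k-nearest-neighbour classifier stands in for the authors' R models, plain accuracy stands in for whatever selection metric they used, and the fold counts (5 outer, 3 inner) and all function names are illustrative choices, not taken from the paper.

```python
import random
from statistics import mean

def knn_predict(train_x, train_y, x, k):
    """Majority label (0/1) among the k training points nearest to x (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose kNN prediction matches the true label."""
    preds = [knn_predict(train_x, train_y, x, k) for x in test_x]
    return mean(int(p == y) for p, y in zip(preds, test_y))

def make_folds(indices, n_folds, seed):
    """Shuffle the given indices and deal them into n_folds disjoint folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def subset(xs, ys, train_idx, test_idx):
    """Return (train_x, train_y, test_x, test_y) for the given index lists."""
    return ([xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx])

def nested_cv(xs, ys, grid, outer=5, inner=3, seed=0):
    """Outer loop estimates performance; inner loop picks k on outer-training data only."""
    all_idx = list(range(len(xs)))
    outer_scores, chosen = [], []
    for o, test_idx in enumerate(make_folds(all_idx, outer, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        inner_folds = make_folds(train_idx, inner, seed + o + 1)

        def inner_score(k):
            # Mean validation accuracy of k across the inner folds.
            scores = []
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                scores.append(accuracy(*subset(xs, ys, fit_idx, val_idx), k))
            return mean(scores)

        best_k = max(grid, key=inner_score)
        chosen.append(best_k)
        # Refit with the chosen k on all outer-training data; score on the held-out fold.
        outer_scores.append(accuracy(*subset(xs, ys, train_idx, test_idx), best_k))
    return mean(outer_scores), chosen

# Toy data: two overlapping Gaussian classes (labels 0/1).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(3.0, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20
score, chosen_ks = nested_cv(xs, ys, grid=[1, 3, 5])
print(f"nested-CV accuracy: {score:.2f}; k chosen per outer fold: {chosen_ks}")
```

Because the inner folds are drawn only from the outer-training indices, the held-out fold never influences the choice of k; that separation is what lets the averaged outer score act as an estimate of performance on unseen compounds.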
Keywords: DILI; DILIrank; QSAR; drug hepatotoxicity; in silico; nested cross-validation; virtual screening
Year: 2020 PMID: 32204453 PMCID: PMC7139829 DOI: 10.3390/ijms21062114
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Variability of the dataset illustrated by several simple constitutional descriptors or molecular properties. Blue: compounds of no concern; red: compounds of hepatotoxicity concern. For the Lipinski rule of five, “No” indicates the compounds with no violation of the rule, and “Yes” those violating the rule.
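Figure 1 splits compounds by Lipinski's rule of five, whose four criteria are molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The caption does not say how many criteria must be exceeded for a compound to count as "violating" the rule, so the sketch below counts violations and leaves the cut-off as a parameter. The default of zero allowed violations is an assumption (many applications tolerate one violation and flag only two or more), and the `Ro5Properties` record and property values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ro5Properties:
    """Hypothetical container for the four rule-of-five inputs."""
    mol_weight: float   # molecular weight, Da
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def ro5_violations(p: Ro5Properties) -> int:
    """Number of rule-of-five criteria the compound exceeds (0 to 4)."""
    return sum([
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])

def violates_ro5(p: Ro5Properties, allowed: int = 0) -> bool:
    """True if more than `allowed` criteria are exceeded (the cut-off is an assumption)."""
    return ro5_violations(p) > allowed

# Illustrative, made-up property values (not taken from the DILIrank data).
small = Ro5Properties(mol_weight=300.0, log_p=2.0, h_donors=1, h_acceptors=4)
large = Ro5Properties(mol_weight=620.0, log_p=5.8, h_donors=2, h_acceptors=9)
print(ro5_violations(small), violates_ro5(small))             # 0 False
print(ro5_violations(large), violates_ro5(large, allowed=1))  # 2 True
```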
Figure 2. Dissimilarity matrix (based on Gower distance) summarizing the chemical diversity of the dataset.
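For two compounds i and j described by p variables, Gower's dissimilarity averages per-variable contributions: |x_ik − x_jk| / R_k for a numeric variable whose range over the dataset is R_k, and 0 or 1 (match or mismatch) for a categorical one. The minimal pure-Python sketch below computes such a matrix; the descriptor columns in the example are hypothetical, and letting a constant numeric column contribute zero is a simplifying assumption (R implementations such as `cluster::daisy` may treat edge cases and missing values differently).

```python
def gower_matrix(rows, categorical=frozenset()):
    """Pairwise Gower dissimilarities for rows of mixed numeric/categorical values.

    `categorical` holds the column indices compared by simple match/mismatch.
    """
    n, p = len(rows), len(rows[0])
    # Range of each numeric column over the whole dataset (None for categorical).
    ranges = []
    for k in range(p):
        if k in categorical:
            ranges.append(None)
        else:
            column = [row[k] for row in rows]
            ranges.append(max(column) - min(column))
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                a, b = rows[i][k], rows[j][k]
                if ranges[k] is None:
                    total += 0.0 if a == b else 1.0
                elif ranges[k] > 0:
                    total += abs(a - b) / ranges[k]
                # A constant numeric column (range 0) adds nothing.
            dist[i][j] = dist[j][i] = total / p
    return dist

# Hypothetical compounds: (molecular weight, logP, rule of five violated?)
compounds = [(300.0, 2.0, "No"), (500.0, 4.0, "Yes"), (400.0, 3.0, "No")]
D = gower_matrix(compounds, categorical={2})
for row in D:
    print([round(v, 3) for v in row])
```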
Figure 3. Performance of 165 Quantitative Structure–Activity Relationship (QSAR) models in terms of sensitivity.
Figure 4. Performance of 165 QSAR models in terms of specificity.
Figure 5. Performance of the 165 QSAR models in terms of positive predictive value.
Figure 6. Performance of 165 QSAR models in terms of balanced accuracy.
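Figures 3 to 6 report four standard binary-classification metrics. Treating "hepatotoxicity concern" as the positive class (an assumption), they follow from the confusion-matrix counts: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), positive predictive value = TP/(TP+FP), and balanced accuracy = (sensitivity + specificity)/2. The sketch below computes them in Python; the function names and the label lists in the example are invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn); `positive` marks the hepatotoxicity-concern class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and balanced accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of truly hepatotoxic drugs that are flagged
    specificity = tn / (tn + fp)   # share of no-concern drugs correctly cleared
    ppv = tp / (tp + fp)           # share of flagged drugs that are truly hepatotoxic
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Invented labels: 1 = hepatotoxicity concern, 0 = no concern.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, {k: round(v, 3) for k, v in metrics.items()})
```

Balanced accuracy is useful here because it weights both classes equally, so a model cannot score well simply by predicting the more frequent class.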
The most important molecular descriptors associated with drug-induced liver injury (DILI), as identified by the 17 feature selection algorithms used.
| Descriptor | Interpretation | Descriptor Block | Top-5 Frequency, n (% of 17 algorithms) | Direction of Contribution * |
|---|---|---|---|---|
| Mp | mean atomic polarizability (scaled on Carbon atom) | Constitutional indices | 12 (70.59%) | + |
| H% | percentage of H atoms | Constitutional indices | 12 (70.59%) | − |
| GATS1m | Geary autocorrelation of lag 1 weighted by mass | 2D autocorrelations | 12 (70.59%) | − |
| SpPosA_B(m) | normalized spectral positive sum from Burden matrix weighted by mass | 2D matrix-based descriptors | 10 (58.82%) | + |
| MLOGP | Moriguchi octanol-water partition coeff. (logP) | Molecular properties | 4 (23.53%) | + |
| PCR | ratio of multiple path count over path count | Walk and path counts | 3 (17.65%) | + |
| totalcharge | total charge | Constitutional indices | 2 (11.76%) | − |
| SM1_Dz(m) | spectral moment of order 1 from Barysz matrix weighted by mass | 2D matrix-based descriptors | 2 (11.76%) | + |
| SIC1 | Structural Information Content index (neighborhood symmetry of 1-order) | Information indices | 2 (11.76%) | + |
* (+): higher values are associated with hepatotoxicity; (−): higher values are associated with absence of hepatotoxicity.
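GATS1m, one of the three descriptors most often ranked among the top five, is a Geary autocorrelation of lag 1 on the molecular graph with atomic masses as weights. In its standard form (stated here as an assumption; Dragon's exact conventions, such as its weight scaling and hydrogen handling, are not reproduced), the Geary coefficient of lag d over A atoms with weights w_i is c(d) = (A − 1) Σ_i Σ_j δ(d_ij, d) (w_i − w_j)² / [2 Δ_d Σ_i (w_i − w̄)²], where d_ij is the topological distance (number of bonds) between atoms i and j, δ(d_ij, d) is 1 when d_ij = d and 0 otherwise, and Δ_d = Σ_i Σ_j δ(d_ij, d) counts the ordered atom pairs at that distance. Values below 1 indicate that atoms d bonds apart tend to carry similar weights. The Python sketch below applies this to a hypothetical heavy-atom skeleton of ethanol (C-C-O), using masses divided by the carbon mass as weights.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest-path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist

def geary(adjacency, weights, lag=1):
    """Standard Geary autocorrelation c(lag) of atom weights on a molecular graph."""
    n = len(weights)
    dist = topological_distances(adjacency)
    # Ordered atom pairs (i, j) separated by exactly `lag` bonds.
    pairs = [(i, j) for i in range(n) for j in range(n) if dist[i][j] == lag]
    w_bar = sum(weights) / n
    spread = sum((w - w_bar) ** 2 for w in weights)
    if not pairs or spread == 0:
        raise ValueError("undefined: no atom pairs at this lag or constant weights")
    local = sum((weights[i] - weights[j]) ** 2 for i, j in pairs)
    return (n - 1) * local / (2 * len(pairs) * spread)

# Hypothetical example: heavy-atom skeleton of ethanol, C(0)-C(1)-O(2).
adjacency = [[1], [0, 2], [1]]
weights = [1.0, 1.0, 15.999 / 12.011]  # atomic masses scaled on carbon (assumed weighting)
print(round(geary(adjacency, weights, lag=1), 3))
print(round(geary(adjacency, weights, lag=2), 3))
```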