
Predicting drug-induced liver injury: The importance of data curation.

Eleni Kotsampasakou¹, Floriane Montanari¹, Gerhard F Ecker².

Abstract

Drug-induced liver injury (DILI) is a major issue for both patients and the pharmaceutical industry due to insufficient means of prevention and prediction. In the current work we present a 2-class classification model for DILI, generated with Random Forest and 2D molecular descriptors on a dataset of 966 compounds. In addition, predicted transporter inhibition profiles were included in the models. The initially compiled dataset of 1773 compounds was reduced via a 2-step approach to 966 compounds, resulting in a significant increase (p-value < 0.05) in model performance. The models were validated via 10-fold cross-validation and against three external test sets of 921, 341 and 96 compounds, respectively. The final model showed an accuracy of 64% (AUC 68%) for 10-fold cross-validation (average of 50 iterations) and comparable values for the three test sets (AUC 59%, 71% and 66%, respectively). We also examined whether the predictions of our in-house transporter inhibition models for BSEP, BCRP, P-glycoprotein, and OATP1B1 and 1B3 improved the DILI model. Finally, the model was implemented with open-source 2D RDKit descriptors so that it can be provided to the community as a Python script.

Keywords: 2-class classification; Data curation; Drug-induced liver injury; Liver transporters; Random Forest; Toxicity reports

Year: 2017 PMID: 28652195 PMCID: PMC6422282 DOI: 10.1016/j.tox.2017.06.003

Source DB: PubMed Journal: Toxicology ISSN: 0300-483X Impact factor: 4.221

Introduction

Drug-induced liver injury (DILI) is the term used for liver damage that is caused by drugs, herbal agents or nutritional supplements (Ghabril et al., 2010; Watkins and Seeff 2006). DILI has gained increasing attention in recent years (Raschi and De Ponti, 2015), as it is one of the main causes of attrition during clinical and pre-clinical studies and the main reason for drug withdrawal from the market or for labeling with a black box warning (Ballet 1997; Chen et al., 2011; O’Brien et al., 2006; Regev 2014). Thus, great effort has been invested in elucidating the toxicological processes and mechanisms that result in manifestations of DILI (Vinken, 2015). It is widely accepted that, together with metabolizing enzymes, liver transporters play an important role in maintaining the integrity and proper function of the liver, and also influence the ADMET (absorption, distribution, metabolism, excretion and toxicity) profile of drugs (Faber et al., 2003; Shitara et al., 2013). Indeed, several recent publications suggest that inhibition of liver transporters might result in manifestations of DILI. For cholestasis in particular, strong evidence has been presented for the role of the bile salt export pump (BSEP) (Aleo et al., 2014; Dawson et al., 2011; Padda et al., 2011; Qiu et al., 2016; Vinken 2015; Vinken et al., 2013; Welch et al., 2015). There is also evidence for the involvement of the multidrug resistance-associated protein 2 (MRP2) (Padda et al., 2011; Pauli-Magnus and Meier 2006), breast cancer resistance protein (BCRP) (Padda et al., 2011; Pauli-Magnus and Meier 2006), P-glycoprotein (Padda et al., 2011; Pauli-Magnus and Meier 2006) and multidrug resistance-associated proteins 3 and 4 (MRP3 and MRP4) (Padda et al., 2011; Pauli-Magnus and Meier 2006; Welch et al., 2015).
For hyperbilirubinemia, another possible manifestation of hepatotoxicity, the involvement of organic anion transporting polypeptides 1B1 and 1B3 (OATP1B1 and OATP1B3) (Chang et al., 2013; Sticova and Jirsa 2013), MRP2 (Sticova and Jirsa, 2013) and, to a lesser extent, BCRP (Sticova and Jirsa, 2013) is discussed. Although in vitro predictive methods are efficient for many toxic endpoints, they are time-consuming and expensive (Bowes et al., 2012; Whitebread et al., 2005). In addition, for assessing hepatotoxicity, experimental methods such as in vitro tests and animal models have been shown to have low concordance (< 50%) with human hepatotoxicity (Chen et al., 2011; Liu et al., 2011; Olson et al., 2000). This led to the development of predictive computational methods, which are summarized in two recent reviews by Chen et al. (2014) and Ekins (2014). Although many of these models perform reasonably well, some suffer from low statistical performance, imbalanced sensitivity vs. specificity, or small data sets (Table 1).

Table 1

Classification models for DILI reported in the literature. Acc stands for accuracy, Sen for sensitivity, Spec for specificity, BA for balanced accuracy, AUC for area under the ROC curve, CV for cross-validation, EV for external validation and IV for internal validation.

Reference	Descriptors	Classification algorithm	Data used	Reported performance
Cheng and Dixon (2003)	2D molecular descriptor	Ensemble recursive partitioning	382 drugs for CV	CV: 76% Acc; 76% Sen; 75% Spec
			54 drugs for EV	EV: 81% Acc; 70% Sen; 90% Spec
Cruz-Monteagudo et al. (2008)	Radial distribution function	Linear discriminant analysis	74 drugs for CV	CV: 84% Acc; 78% Sen; 90% Spec
	molecular descriptors		13 drugs for EV	EV: 82% Acc
Matthews et al. (2009)	Molecular descriptors	4 commercial QSAR programs	~1600 drugs for CV	CV: 39% Sen; 87% Spec
			18 drugs for EV	EV: 89% Sen
Rodgers et al. (2010)	topological indices of molecular structures (MolConnZ) and Dragon molecular descriptors	k-nearest neighbor	37 drugs for EV	EV: 84% Acc; 74% Sen; 94% Spec
Fourches et al. (2010)	2D fragments and Dragon	Support vector machine	531 drugs for CV; 18 compounds for EV	CV: 62–68% Accs
	molecular descriptors			EV: 78% Acc
Ekins et al. (2010)	extended connectivity functional	Linear discriminant analysis	295 compounds for CV	CV: 59% Acc; 53% Sen; 65% Spec
	class fingerprints of maximum diameter 6 (ECFC_6)		237 compounds for EV	EV: 60% Acc; 56% Sen; 67% Spec
Liew et al. (2011)	PaDEL molecular descriptor	Ensemble of mixed learning	1087 compounds for CV	CV: 68% Acc; 67% Sen; 70% Spec
			120 compounds for EV	EV: 75% Acc; 82% Sen; 65% Spec
Liu et al. (2011)	functional class	Bayesian models	888 drugs for training; 3 data sets with 40–148 drugs for EV	EV: 60–70% Accs
	fingerprints (FCFP_6)
Chen et al. (2013)	Mold2 chemical descriptor	Decision Forest	197 drugs for CV	CV: 70% Acc
			Three data sets with 190–348 drugs for EV	EV: 62–69% Accs
Liu et al. (2015a)	physicochemical descriptors and fingerprints	Ensemble classifier	677 compounds for CV	81% BA; 66% Sen; 95% Spec
Muller et al. (2015)	physicochemical descriptors and fingerprints	Ensemble classifier	677 compounds for CV	81% BA; 66% Sen; 95% Spec
Muller et al. (2015)	ISIDA fragment descriptors	SVM	424 drugs for CV	66% BA
Xu et al. (2015)	Encoding layers based on SMILES, PaDEL descriptors	Deep Learning	190, 475 & 1065 compounds for CV	CV: 70–88% Accs; 70–90% Sens; 70–87% Specs
			185, 320, 236, 198 & 119 compounds for EV	EV: 62–87% Accs; 62–83% Sens; 62–93% Specs
Mulliner et al. (2016)	2D and 3D physicochemical descriptors	SVM with a genetic algorithm	3712 compounds for training	IV: 75% Acc; 73% AUC
			221 compounds for IV
			269 compounds for EV
Zhang et al. (2016)	FP4 fingerprints	SVM	1317 compounds for training	Training set: 66% Acc; 85% Sen; 34% Spec; 55% AUC
			88 compounds for EV	EV: 75% Acc; 93% Sen; 38% Spec; 61% AUC

In this study we generate in silico classification models for DILI by compiling multiple and diverse datasets from the literature. We carefully curated these data with respect to both the chemotypes and the accuracy of the class labels. In addition, we explore the importance of hepatic transporter inhibition for DILI by using the predictions of a set of in-house in silico classification models as additional descriptors for the DILI model.

Methods

Data compilation

Training set

Searching PubMed, 2017 (http://www.ncbi.nlm.nih.gov/pubmed), Google, 2017 (https://www.google.at) and Scopus, 2017 (https://www.scopus.com/) with the terms “drug-induced liver injury”, “DILI” and “drug-induced hepatotoxicity” identified nine unique datasets for human DILI/hepatotoxicity (Table 2).

Table 2

Description of the sources upon which the training set was built. In the Number of compounds column, “+” denotes the number of DILI-positive compounds and “−” the number of DILI-negative compounds. These numbers correspond to the compounds remaining after data curation, on a source-by-source basis.

Source name	Type of data	Number of compounds	Label choice
O’Brien et al. (2006)	In vitro cell-based assay	132 (100+/32−)	“severely” and “moderately” toxic are considered positives.
Rodgers et al. (2010)	FDA reports database	382 (75+/307−)	Authors classification
Fourches et al. (2010)	Text mining	902 (620+/282−)	Authors classification
Greene et al. (2010)	Compilation of published data	385 (252+/133−)	Authors classification
Ekins et al. (2010)	Clinical data for hepatotoxicity	499 (294+/205−)	Authors classification
Chen et al. (2011)	FDA-approved labels	279 (218+/61−)	“most DILI concern” and “less DILI concern” are considered positives
Liu et al. (2011)	SIDER_2 database	835 (188+/647−)	Authors classification
Zhu and Kruhlak (2014)	Post-marketing safety data	1948 (651+/1297−)	Authors classification, keeping only highest class certainty
Liu et al. (2015b)	LiverTox database	583 (409+/174−)	“hepatotoxic” and “possible hepatotoxic” are considered positives

Marvin (ChemAxon, 2013; http://www.chemaxon.com) was used to visualize the structures and to convert compound names into structures.

External test sets

After compiling the training set and generating the DILI model, we came across one more human DILI dataset that had initially escaped our attention (Liew et al., 2011). Additionally, there were two more datasets published after the model development (Chen et al., 2016; Mulliner et al., 2016) (Table 3).

Table 3

Description of the sources upon which the test set was built. In the Number of compounds column, “+” denotes the number of DILI-positive compounds and “−” the number of DILI-negative compounds. These numbers correspond to the compounds remaining after data curation, on a source-by-source basis.

Source name	Type of data	Number of compounds	Label choice
Liew et al. (2011)	Micromedex reports of adverse reactions	341 (221+/120−)	Authors classification
Mulliner et al. (2016)	Compilation of public data, data from PharmaPendium and Leadscope	921 (519+/402−)	Authors classification
Chen et al. (2016)	Compilation of public data and LiverTox	96 (50+/46−)	“most DILI concern” and “less DILI concern” are considered positives, “verified no DILI concern” as negatives
Merged	The three external datasets were merged, and common compounds with contradictory class labels were removed	996 (541+/455−)	Class labels of the original external test sets retained

All datasets (training set, the three external test sets and the merged test set) are provided in the Supplementary material.

Chemical curation

For each dataset we applied the following chemotype curation. Inorganic compounds were identified with MOE 2014.09 (MOE, 2015) and removed. Using the Standardiser tool created by Francis Atkinson (Atkinson, 2014), all salt parts and any compounds containing metals or rare or special atoms were removed, and the structures were standardized. Duplicates and permanently charged compounds were removed using MOE 2014.09 (MOE, 2015). Note that stereoisomers, even though they can be biologically distinct compounds, were treated as duplicates in our study, since they yield exactly the same descriptor vector. If two (or more) stereoisomers belonged to the same class, only one was kept; if they belonged to different classes, all were removed. 3D structures were generated using CORINA (version 3.4) (Sadowski et al., 1994), and their energy was minimized with MOE 2014.09 (MOE, 2015) using default settings, except for the gradient, which was set to 0.05 RMS kcal/mol/Å². Existing chirality was preserved.
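The stereoisomer rule above (keep one representative if all stereoisomers share a class, remove all of them if they disagree) can be illustrated with a short Python sketch. This is not the MOE-based procedure used in the study: it assumes, for illustration only, that each compound is identified by a standard InChIKey, whose first 14-character block encodes connectivity but not stereochemistry, so stereoisomers share it. The function name `collapse_stereoisomers` and the input format are hypothetical.

```python
from collections import defaultdict


def collapse_stereoisomers(records):
    """Collapse stereoisomers into one entry per molecular skeleton.

    records: list of (inchikey, label) pairs; label 1 = DILI positive, 0 = negative.
    Assumption: stereoisomers are detected via the first InChIKey block
    (connectivity only), not via MOE as in the study.
    """
    groups = defaultdict(list)
    for inchikey, label in records:
        skeleton = inchikey.split("-")[0]  # 14-character connectivity block
        groups[skeleton].append((inchikey, label))

    kept = []
    for members in groups.values():
        if len({label for _, label in members}) == 1:
            kept.append(members[0])  # all same class: keep one representative
        # mixed classes: every stereoisomer of this skeleton is dropped
    return kept
```

Because the grouping key ignores stereochemistry by construction, the sketch reproduces the "same descriptor vector, same compound" reasoning without computing descriptors.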

Class-label curation

Apart from the chemical curation of the data, we also carefully curated the class labels of the compounds. After merging all individual datasets into one database, the majority of the compounds are present in more than one dataset. In case of conflicting class labels, the majority label is assigned to the compound. If the class labels are equally split, the compound is considered “ambiguous” and removed from the dataset. This yields 1773 compounds, 794 positives and 979 negatives. Chart 1 shows the overlap of compounds (positives and negatives) across different numbers of sources. Notably, all compounds that occur in all nine sources are DILI positive, which is in accordance with the fact that negative results are reported less often.
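The majority-vote rule with removal of ties can be sketched as follows. This is an illustrative re-implementation under the assumption that labels are stored per compound as a list with one entry per reporting source; `majority_vote` is a hypothetical helper name, not part of the authors' script.

```python
from collections import Counter


def majority_vote(labels_by_compound):
    """Assign each compound its majority class label across sources.

    labels_by_compound: dict mapping a compound identifier to the list of
    class labels (1 = DILI positive, 0 = negative) reported by each source.
    Returns (curated, ambiguous): curated maps compound -> majority label;
    ambiguous lists compounds whose labels are equally split (removed).
    """
    curated, ambiguous = {}, []
    for compound, labels in labels_by_compound.items():
        counts = Counter(labels).most_common()
        # A tie between the two most frequent labels makes the compound ambiguous
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            ambiguous.append(compound)
        else:
            curated[compound] = counts[0][0]
    return curated, ambiguous
```

For example, a compound labeled positive by two sources and negative by one is kept as positive, while a 1:1 split is discarded.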

Chart 1

Overlap of DILI positives and negatives across different numbers of sources.

However, the first modeling attempts on this dataset gave only moderate results. Re-analysis revealed that the Fourches source (Fourches et al., 2010) labeled several co-occurring compounds as positive even when the majority vote labeled them as DILI negative. The Fourches dataset was compiled via text mining, a sophisticated but error-prone method (Caporaso et al., 2008; Zhu et al., 2013). Therefore, to improve dataset quality, all compounds that came solely from the Fourches dataset (227 compounds) were removed. Subsequently, all compounds reported by only a single source were removed, as their class label cannot be cross-checked against at least one additional source. This step removed a further 584 compounds, yielding the final set of 966 compounds (500 positives and 466 negatives). The differences in model performance after the class-label curation of the datasets are presented in the Supporting information (Table S1).
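The two-step source filter can be expressed compactly if each compound is mapped to the set of sources reporting it. The sketch below assumes that representation; `source_filter` and the source name strings are hypothetical.

```python
def source_filter(sources_by_compound, unreliable_source="Fourches"):
    """Two-step class-label curation by source support.

    sources_by_compound: dict mapping a compound to the set of source names
    that report it. Returns the compound sets after step 1 and after step 2.
    """
    # Step 1: drop compounds reported only by the text-mined source
    step1 = {c for c, srcs in sources_by_compound.items()
             if srcs != {unreliable_source}}
    # Step 2: drop all remaining compounds backed by a single source,
    # since their label cannot be cross-checked
    step2 = {c for c in step1 if len(sources_by_compound[c]) >= 2}
    return step1, step2
```

In this sketch every Fourches-only compound is also a single-source compound, so step 2 alone would reach the same final set; running the steps in sequence is what exposes the intermediate dataset so that it can be modeled and compared separately.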

Generation of statistical models

Algorithms used

The 2-class classification models were built using the software package WEKA (version 3.7.12) (Hall et al., 2009). We evaluated several base classifiers, such as logistic regression, tree methods (Random Forest and J48), Support Vector Machines (SMO in WEKA with polynomial, RBF and Puk kernels), Naïve Bayes and k-nearest neighbors, together with several attribute-selection methods (AttributeSelectedClassifier) and methods for improving statistical performance, such as Bagging (Breiman, 1996) and Boosting (Freund and Schapire, 1996; Friedman et al., 2000). Overall, Random Forest (Breiman, 2001) with 100 trees was identified as the most promising classifier.

Molecular descriptors

For both datasets, several types of molecular descriptors were calculated: all 2D MOE descriptors (192 descriptors in total), the 3D VolSurf descriptors (MOE, 2015), PaDEL descriptors (Yap, 2010) and extended connectivity fingerprints of diameter 6 (ECFP6) using RDKit (Landrum). In general, the 2D MOE descriptors performed best. To investigate the potential influence of transporter inhibition on DILI manifestation, we predicted the transporter inhibition profile of all compounds and used it as additional descriptors (Table S2). In particular, for OATP1B1 and 1B3 inhibition, we used our previously published models based on PaDEL descriptors (Kotsampasakou et al., 2015), as implemented in eTOXlab (Carrio et al., 2015). For BSEP inhibition, we used the float prediction scores obtained from the model’s implementation as a KNIME workflow (Montanari et al., 2016b). Likewise, for P-glycoprotein (Schwarz et al., 2016) and BCRP (Montanari et al., 2016a) inhibition, the respective float prediction scores were used.

DILI model with open-source descriptors

Since MOE is a commercial software package, we also provide a free version of the model that uses exclusively open-source libraries. For this, the final model set-up (all 2D MOE descriptors and Random Forest with 100 trees) was converted as follows: the MOE descriptors were replaced by the 2D descriptors implemented in RDKit (Landrum, 2016) (196 descriptors in total), and the Random Forest was implemented with the scikit-learn machine learning library for Python (Pedregosa et al., 2011). The script for training, cross-validating and applying the model is provided as Supplementary material.

Model validation

For model selection, 10-fold cross-validation was used. Each model was evaluated in terms of accuracy, sensitivity, specificity, area under the ROC curve (AUC) and precision. For the best models, we performed 50 iterations with different cross-validation seeds (used for splitting the data within cross-validation) and then applied a Welch (two-sample) t-test in R (http://www.R-project.org/) to assess whether model performance on the different training datasets (after class-label curation) differs significantly. The same procedure was used to assess whether adding the predicted transporter interaction profiles significantly improves model performance. The best models were further validated externally using the test datasets described above.
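The evaluation loop described above can be sketched in plain Python. This is an illustrative re-implementation, not the WEKA or R code used in the study: `stratified_folds` and `classification_metrics` are hypothetical helper names, the 0.5 decision threshold is an assumption, and AUC is computed through its rank (Mann-Whitney) interpretation, i.e. the probability that a random positive scores higher than a random negative.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with (near-)equal class proportions.

    Changing `seed` reshuffles the split, mimicking repeated cross-validation
    with a different cross-validation seed per iteration.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)  # deal samples round-robin across folds
            slot += 1
    return folds


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, precision and ROC AUC (1 = positive)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Rank-based AUC: fraction of (positive, negative) pairs ordered correctly
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "auc": wins / (len(pos) * len(neg)),
    }
```

Repeating the split with seeds 0 to 49 and averaging the metrics per iteration would give the kind of 50-iteration means and standard deviations reported in Table 4.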

Applicability domain of the models

The applicability domain was assessed in KNIME with the Enalos nodes (Afantitis et al., 2011; Melagraki et al., 2010), which compute the applicability domain on the basis of Euclidean distances (Zhang et al., 2006). Using the same procedure, we additionally assessed to what extent the DILI datasets (both training and external test sets) fall within the applicability domains of the transporter models. The number of compounds within each model’s applicability domain for each DILI dataset is provided in the Supporting information (Table S3).
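A simplified Euclidean-distance applicability domain in the spirit of Zhang et al. (2006) is sketched below. The exact threshold used by the Enalos nodes is not given in the text; here we assume the common form threshold = mean + Z × standard deviation of nearest-neighbour distances within the training set, with Z = 0.5 as an assumed default and descriptors assumed to be already scaled to comparable ranges. Both function names are hypothetical.

```python
import math
import statistics


def ad_threshold(train, z=0.5):
    """Distance threshold from nearest-neighbour distances within the training set.

    Assumed form: mean + z * population standard deviation (z = 0.5 is an assumption).
    """
    nn = [min(math.dist(x, y) for j, y in enumerate(train) if j != i)
          for i, x in enumerate(train)]
    return statistics.mean(nn) + z * statistics.pstdev(nn)


def in_domain(query, train, threshold):
    """A query is inside the domain if its nearest training compound is close enough."""
    return min(math.dist(query, x) for x in train) <= threshold
```

Counting how many test compounds satisfy `in_domain` for each model yields the kind of per-dataset coverage summarized in Table S3.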

Results and discussion

Optimizing the training dataset – the importance of curation

Compiling the DILI dataset from the nine data sources and curating the chemotypes and class labels (the latter by majority vote) initially led to 1773 compounds. However, the first modeling attempts failed to yield models with acceptable performance. Analysis of the dataset revealed that one source (Fourches) was compiled by text mining. Although text mining is a powerful approach for collecting data directly from narrative text, it is more prone to errors than manual extraction (Caporaso et al., 2008; Zhu and Kruhlak, 2014). Two examples are tocopherol and carnitine, which were reported as hepatotoxic only by the Fourches source. According to the literature, these two compounds show a hepatoprotective effect against DILI caused by other drugs (Bohan et al., 2001; Tayal et al., 2007) rather than being hepatotoxic. Therefore, the compounds coming only from the Fourches dataset were removed. This reduction led to a new training set of 1547 compounds and improved the statistical performance of the resulting models (see Table S3 in the Supplementary material). To further improve dataset quality, we also removed all compounds that appear in only one source (581 compounds), since their class label cannot be double-checked, which adds noise to the data. Indeed, the model trained on this dataset shows further improvement (Table S1). To evaluate whether the differences between the models generated on the three datasets are statistically significant, 50 iterations of 10-fold cross-validation were performed with different cross-validation seeds, followed by a two-sample t-test (Table S3). All parameters apart from specificity generally increase with higher data-set quality. In particular, sensitivity, which is of higher importance for a toxicity endpoint, increases markedly, from 46% to 68%.
Notably, the analysis also indicates no difference in model performance whether or not the transporter predictions are used as additional information.
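The statistical comparison can be illustrated with Welch's t statistic and the Welch-Satterthwaite degrees of freedom, applied to two lists of per-iteration scores (e.g. 50 cross-validation AUCs per model). The study used R; in R, `t.test` performs the Welch test by default, and `scipy.stats.ttest_ind(..., equal_var=False)` is the SciPy equivalent. The stdlib sketch below is only an approximation for the p-value: it assumes a normal approximation to the t distribution, which is reasonable for large degrees of freedom but not exact. `welch_t_test` is a hypothetical name.

```python
import math
import statistics


def welch_t_test(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    The p-value uses a normal approximation to the t distribution, which is
    reasonable only for large df (e.g. two samples of 50 CV iterations).
    """
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p_approx
```

Unlike Student's t-test, Welch's version does not assume equal variances in the two groups, which suits comparisons between models trained on differently sized datasets.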

DILI 2-class classification models

For the final training dataset of 966 compounds, the best models are obtained using all 2D MOE descriptors. However, this restricts broader usage, as applying the model requires a license for the software that calculates the descriptors. To offer the model to the scientific community in open-source form, we rebuilt it using all 2D RDKit descriptors (196 descriptors in total; Table 4) and provide the respective Python script.

Table 4

Statistical performance of the final Random Forest (100 trees) models: A) using all 2D MOE descriptors and transporter predictions (DILI_MOE_transp_RF100), B) using only the 2D MOE descriptors (DILI_MOE_RF100), and C) the open-source model (DILI_RDKit_RF100).

	Accuracy	Sensitivity	Specificity	AUC	Precision
A) DILI_MOE_transp_RF100
10-fold CV (average +/− standard deviation for 50 iterations)	0.65 ± 0.01	0.68 ± 0.01	0.61 ± 0.01	0.69 ± 0.01	0.65 ± 0.01
Mulliner 921 cpds	0.57	0.63	0.50	0.59	0.62
Liew 341 cpds	0.67	0.72	0.56	0.71	0.75
Chen 96 cpds	0.59	0.54	0.65	0.61	0.63
Merged test set 996 cpds	0.59	0.68	0.50	0.62	0.62
B) DILI_MOE_RF100
10-fold CV (average +/− standard deviation for 50 iterations)	0.65 ± 0.01	0.68 ± 0.01	0.61 ± 0.01	0.69 ± 0.01	0.65 ± 0.01
Mulliner 921 cpds	0.58	0.60	0.55	0.59	0.63
Liew 341 cpds	0.68	0.68	0.67	0.71	0.79
Chen 96 cpds	0.63	0.56	0.70	0.66	0.67
Merged test set 996 cpds	0.60	0.64	0.56	0.62	0.63
C) DILI_RDKit_RF100
10-fold CV (average +/− standard deviation for 50 iterations)	0.64 ± 0.01	0.70 ± 0.01	0.57 ± 0.01	0.69 ± 0.01	0.63 ± 0.01
Mulliner 921 cpds	0.60	0.64	0.54	0.62	0.64
Liew 332 cpds	0.67	0.72	0.56	0.71	0.72
Chen 95 cpds	0.64	0.64	0.64	0.73	0.64
Merged test set 996 cpds	0.60	0.67	0.52	0.64	0.63

Note: The numbers of compounds in the external datasets differ slightly for model C because, for some compounds (peptides), certain descriptor values computed by RDKit were too large to be handled by the machine learning algorithm.
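The exclusion described in the note is consistent with scikit-learn's tree-based estimators, which work on 32-bit floats internally and reject non-finite values. Assuming that is the cause, a pre-filter could separate descriptor rows that fit into float32 from those that do not before training or prediction. The sketch below is a stdlib illustration; `split_finite_rows` is a hypothetical helper, not part of the authors' script.

```python
import math

# Largest finite value representable as an IEEE 754 single-precision float
FLOAT32_MAX = 3.4028234663852886e38


def split_finite_rows(X, ids):
    """Separate descriptor rows that fit in float32 from those that do not.

    X: list of descriptor rows (lists of floats); ids: matching compound ids.
    Returns (usable, dropped): usable holds (id, row) pairs, dropped holds ids.
    """
    usable, dropped = [], []
    for row, cid in zip(X, ids):
        if all(math.isfinite(v) and abs(v) <= FLOAT32_MAX for v in row):
            usable.append((cid, row))
        else:
            dropped.append(cid)  # NaN, infinity, or float32 overflow
    return usable, dropped
```

Reporting the dropped identifiers, rather than silently discarding them, makes it easy to state how many test compounds were excluded, as in the table above.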

As Table 4 shows, the performance of the models in cross-validation is stable and satisfactory. There is no substantial difference between the model that uses transporter predictions as additional descriptors (model A) and the one built with only the 2D MOE descriptors (model B), which is further confirmed by statistical testing (p-values > 0.05). Furthermore, the open-source model and the model built with proprietary descriptors can be considered equivalent, despite some minor differences in 10-fold cross-validation and external validation. The model also remains robust in external validation, with statistics quite similar to those obtained by cross-validation. However, it has to be taken into account that the DILI dataset is based on toxicity reports. Thus, despite our extensive curation workflow, mislabeled compounds may remain owing to the drawbacks of adverse event reporting systems. These include 1) under-reporting (Palleria et al., 2013; Rodgers et al., 2010; Zhu and Kruhlak, 2014) due to the voluntary character of the system (Chen et al., 2008; Hauben 2004; Zhu and Kruhlak, 2014), 2) difficulty in finding human toxicity data (often proprietary, and post-marketing data are difficult to obtain) (Rodgers et al., 2010), and 3) the fact that causality does not need to be established for a report (Zhu and Kruhlak, 2014). The latter is quite serious in the contemporary era of polypharmacy, where many people, especially the elderly, receive several different medications. An indication of these drawbacks is the comparison of the class labels of overlapping compounds between the training and test sets, as well as among the test sets themselves (when forming the merged external test set), which revealed contradictory class labels for up to 20% of the compounds.
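The label comparison used to form the merged external test set, and to quantify contradictions between datasets, can be sketched as below. The text does not specify how the contradiction rate was computed; this sketch assumes it is the fraction of compounds present in at least two datasets whose labels disagree. `merge_label_sets` is a hypothetical helper.

```python
def merge_label_sets(*datasets):
    """Merge compound -> label dicts, dropping compounds with contradictory labels.

    Returns the merged dict and the fraction of overlapping compounds
    (present in at least two datasets) whose labels disagree.
    """
    seen = {}
    for data in datasets:
        for cid, label in data.items():
            seen.setdefault(cid, []).append(label)
    overlapping = [c for c, labels in seen.items() if len(labels) > 1]
    conflicting = {c for c in overlapping if len(set(seen[c])) > 1}
    # Keep every compound whose labels agree (or that occurs only once)
    merged = {c: labels[0] for c, labels in seen.items() if c not in conflicting}
    rate = len(conflicting) / len(overlapping) if overlapping else 0.0
    return merged, rate
```

Applied to the Mulliner, Liew and Chen sets, a function of this kind would yield a merged set together with the share of overlapping compounds that had to be discarded.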

Association of transporter inhibition profiles and DILI

There is ample evidence in the literature for an association between selected liver transporters and DILI. This especially concerns BSEP (Aleo et al., 2014; Dawson et al., 2011; Padda et al., 2011; Qiu et al., 2016; Vinken 2015; Vinken et al., 2013; Welch et al., 2015), BCRP (Padda et al., 2011; Pauli-Magnus and Meier 2006), P-glycoprotein (Padda et al., 2011; Pauli-Magnus and Meier 2006), and OATP1B1/1B3 (Chang et al., 2013; Sticova and Jirsa 2013). This prompted us to introduce predicted inhibition profiles of these transporters into the feature matrix used for predicting DILI. However, the models built with and without the transporter inhibition profiles performed the same (Table 5, p-values > 0.05). A possible reason is that the transporter inhibition profiles are based on predictions rather than on experimental data. Even though the transporter inhibition models are reliable (AUC values in Table S2) and most of the compounds of the DILI training set fall within the respective applicability domains (Table S3), mispredictions cannot be ruled out, and these in turn add noise to the feature matrix. However, given the accuracy of the transporter models and our experience from the data curation task, the noise added by wrong predictions is not expected to be much larger than the noise already present in the DILI class labels. Furthermore, liver transporters have overlapping substrate and inhibitor profiles (Giacomini et al., 2010; Homolya et al., 2003; König et al., 2013; Shugarts and Benet 2009). In addition, hepatic homeostasis can compensate for the inhibition of one transporter through overexpression of another (e.g. OATP1B1/OATP1B3) (Cui et al., 2009; Kalliokoski and Niemi 2009). Thus, inhibition of a single transporter might not have a large impact on the proper function of the hepatocyte.
This is also reflected in the data: training compounds predicted to inhibit at most three transporters are not particularly enriched in DILI positives (451 DILI-positives and 443 DILI-negatives), while compounds predicted to inhibit at least four transporters are more likely to be DILI-positive (49 DILI-positives and 23 DILI-negatives, p-value < 0.01). Furthermore, liver transporters other than those included in this study may also play a role in DILI: the multidrug resistance-associated protein 2 (MRP2) (Nicolaou et al., 2012; Padda et al., 2011; Pauli-Magnus and Meier 2006), the multidrug resistance protein 3 (MDR3) (Chan and Vandeberg, 2012; Pauli-Magnus and Meier, 2006) and MRP3 and MRP4 (Padda et al., 2011; Pauli-Magnus and Meier 2006; Welch et al., 2015). Unfortunately, owing to the lack of experimental data, it was not possible to develop and validate in silico models for these transporters and include them in the study. Finally, the complexity of the DILI endpoint itself might preclude a strong association between liver transporter inhibition and DILI. Indeed, several other mechanisms can produce hepatotoxicity (Vinken, 2015): formation of reactive metabolites by cytochrome P450 enzymes (Corsini and Bortolini 2013; Schadt et al., 2015; Utkarsh et al., 2015), formation of glutathione adducts (Schadt et al., 2015) and mitochondrial toxicity (Aleo et al., 2014; Schadt et al., 2015) are examples of DILI mechanisms that are not specifically addressed in this study.
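The text does not state which test produced the p-value for the enrichment of DILI positives among compounds predicted to inhibit at least four transporters. One suitable option for such a 2×2 table is Fisher's exact test; the sketch below implements its two-sided form with the standard library. The choice of test and the table layout (rows: at least four vs. at most three predicted inhibited transporters; columns: DILI positive vs. negative) are assumptions, and `fisher_exact_two_sided` is a hypothetical name. `scipy.stats.fisher_exact` offers the same test if SciPy is available.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = >= 4 vs <= 3 predicted inhibited transporters,
    columns = DILI positive vs DILI negative.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def pmf(k):
        # Hypergeometric probability of k positives in row 1, margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables at most as probable as the observed one (small tolerance)
    return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Counts reported in the text for the DILI training set
p = fisher_exact_two_sided(49, 23, 451, 443)
print(f"two-sided Fisher p = {p:.4g}")
```

Python's integer arithmetic keeps the binomial coefficients exact, so the only rounding happens in the final division for each table probability.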

Conclusions

Drug-induced liver injury is a major issue for patients and, therefore, also for the drug discovery process. Within the last decade, several attempts have been made to predict DILI from the chemical structure of a compound. In a more mechanism-based approach, one could also consider predicting DILI on the basis of, e.g., biological fingerprints. As these are usually not available for larger compound sets (at least not in the public domain), we included predicted liver-transporter interaction profiles as additional information. The liver transporter models were developed in the course of the eTOX project and are available in eTOXsys, the integrated data mining and computational model environment established within the project. Surprisingly, although the role of liver transporters in hepatotoxicity has clearly been demonstrated, this additional information did not significantly improve model performance. Potential reasons are outlined above; most probably, the biological fingerprint needs to be substantially broadened by including additional transporters and enzymes to have a significant effect on model performance. The predictivity of computational models heavily depends on the quality of the respective training data set and the domain it covers. In this work, we compiled the DILI datasets available in the literature and carefully curated them with respect to both the chemical structures and the class labels (DILI positive, DILI negative). This reduced the number of compounds available for classification models from 1773 to 966, but in return markedly increased the quality of the models developed. While larger datasets are generally preferred for machine learning approaches, the current work once more stresses the significance of data quality. However, some mislabeled compounds may still remain, as the conflicting class labels of overlapping compounds in the training and test sets show.
This further underlines the pressing need for industry-driven collaborative efforts such as the eTOX project to share data and make them publicly available for mining and exploitation. Only large sets of high-quality data will allow the derivation of predictive in silico models covering a broad chemical space.

Supplementary Material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.tox.2017.06.003.

References (10 of 69 listed)

1. Concordance of the toxicity of pharmaceuticals in humans and in animals.

Authors: H Olson; G Betton; D Robinson; K Thomas; A Monro; G Kolaja; P Lilly; J Sanders; G Sipes; W Bracken; M Dorato; K Van Deun; P Smith; B Berger; A Heller
Journal: Regul Toxicol Pharmacol Date: 2000-08

2. Drug transport proteins in the liver.

Authors: Klaas Nico Faber; Michael Müller; Peter L M Jansen
Journal: Adv Drug Deliv Rev Date: 2003-01-21

3. Keynote review: in vitro safety pharmacology profiling: an essential tool for successful drug development.

Authors: Steven Whitebread; Jacques Hamon; Dejan Bojanic; Laszlo Urban
Journal: Drug Discov Today Date: 2005-11-01

4. A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models.

Authors: Shuxing Zhang; Alexander Golbraikh; Scott Oloff; Harold Kohn; Alexander Tropsha
Journal: J Chem Inf Model Date: 2006 Sep-Oct

5. Effect of L-carnitine treatment for valproate-induced hepatotoxicity.

Authors: T P Bohan; E Helton; I McDonald; S König; S Gazitt; T Sugimoto; D Scheffner; L Cusmano; S Li; G Koch
Journal: Neurology Date: 2001-05-22

6. High concordance of drug-induced human hepatotoxicity with in vitro cytotoxicity measured in a novel cell-based model using high content screening.

Authors: P J O'Brien; W Irwin; D Diaz; E Howard-Cofield; C M Krejsa; M R Slaughter; B Gao; N Kaludercic; A Angeline; P Bernardi; P Brain; C Hougham
Journal: Arch Toxicol Date: 2006-04-06

7. Multidrug resistance-associated proteins: Export pumps for conjugates with glutathione, glucuronate or sulfate.

Authors: László Homolya; András Váradi; Balázs Sarkadi
Journal: Biofactors Date: 2003

8. Drug-induced liver injury: summary of a single topic clinical research conference.

Authors: Paul B Watkins; Leonard B Seeff
Journal: Hepatology Date: 2006-03

9. In silico models for the prediction of dose-dependent human hepatotoxicity.

Authors: Ailan Cheng; Steven L Dixon
Journal: J Comput Aided Mol Des Date: 2003-12

10. Early postmarketing drug safety surveillance: data mining points to consider.

Authors: Manfred Hauben
Journal: Ann Pharmacother Date: 2004-08-10
