Marina Garcia de Lomana, Andrea Morger, Ulf Norinder, Roland Buesen, Robert Landsiedel, Andrea Volkamer, Johannes Kirchmair, Miriam Mathea.
Abstract
Computational methods such as machine learning approaches have a strong track record of success in predicting the outcomes of in vitro assays. In contrast, their ability to predict in vivo endpoints is more limited due to the high number of parameters and processes that may influence the outcome. Recent studies have shown that the combination of chemical and biological data can yield better models for in vivo endpoints. The ChemBioSim approach presented in this work aims to enhance the performance of conformal prediction models for in vivo endpoints by combining chemical information with (predicted) bioactivity assay outcomes. Three in vivo toxicological endpoints, capturing genotoxic (MNT), hepatic (DILI), and cardiological (DICC) issues, were selected for this study due to their high relevance for the registration and authorization of new compounds. Since the sparsity of available biological assay data is challenging for predictive modeling, predicted bioactivity descriptors were introduced instead. Thus, a machine learning model for each of the 373 collected biological assays was trained and applied to the compounds of the in vivo toxicity data sets. Besides the chemical descriptors (molecular fingerprints and physicochemical properties), these predicted bioactivities served as descriptors for the models of the three in vivo endpoints. For this study, a workflow based on a conformal prediction framework (a method for confidence estimation) built on random forest models was developed. Furthermore, the most relevant chemical and bioactivity descriptors for each in vivo endpoint were preselected with lasso models. The incorporation of bioactivity descriptors increased the mean F1 score of the MNT model from 0.61 to 0.70 and that of the DICC model from 0.72 to 0.82, while the mean efficiencies increased by roughly 0.10 for both endpoints. In contrast, for the DILI endpoint, no significant improvement in model performance was observed.
Besides pure performance improvements, an analysis of the most important bioactivity features allowed detection of novel and less intuitive relationships between the predicted biological assay outcomes used as descriptors and the in vivo endpoints. This study shows how the prediction of in vivo toxicity endpoints can be improved by incorporating biological information (which is not necessarily captured by chemical descriptors) in an automated workflow. Because predicted outcomes of bioactivity assays were used, no additional experimental workload is needed to generate the bioactivity descriptors. All bioactivity CP models for deriving the predicted bioactivities, as well as the in vivo toxicity CP models, can be freely downloaded from https://doi.org/10.5281/zenodo.4761225.
Year: 2021 PMID: 34153183 PMCID: PMC8317154 DOI: 10.1021/acs.jcim.1c00451
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Overview of Collected Assay Data
| database/endpoint | description | source |
|---|---|---|
| ToxCast database | • 222 high-throughput screening assays, including endpoints related to cell cycle and morphology control, steroid hormone homeostasis, DNA-binding proteins, and other protein families (e.g., kinases, cytochromes, and transporters) | ToxCast database version 3.3 |
| eMolTox database | • 136 in vitro assays, including endpoints related to mutagenicity, cytotoxicity, hormone homeostasis, neurotransmitters, and several protein families (e.g., nuclear receptors, cytochromes, and cell surface receptors) | Ji et al. |
| genotoxicity | • AMES mutagenicity assay | AMES assay: eChemPortal |
| | • chromosome aberration (CA) assay | |
| | • mammalian mutagenicity (MM) assay | CA and MM assays: eChemPortal, Benigni et al. |
| bioavailability | • human oral bioavailability assay | Falcón-Cano et al. |
| permeability | • Caco-2 assay | Wang et al. |
| thyroid hormone homeostasis | • deiodinases 1, 2, and 3 inhibition assays | Garcia de Lomana et al. |
| | • thyroid peroxidase inhibition assay | |
| | • sodium iodide symporter inhibition assay | |
| | • thyroid hormone receptor antagonism assay | |
| | • thyrotropin-releasing hormone receptor antagonism assay | |
| | • thyroid stimulating hormone receptor agonism and antagonism assays | |
| P-glycoprotein inhibition | • P-glycoprotein (ABCB1) inhibition assay | Broccatelli et al. |
Overview of the Data Sets for the in Vivo Endpoints
| endpoint | number of active compounds | number of inactive compounds | ratio |
|---|---|---|---|
| MNT | 316 | 1475 | 1:5 |
| DILI | 445 | 247 | 2:1 |
| DICC | 988 | 2268 | 1:2 |
Figure 1. Workflow for the derivation of the bioactivity descriptors for the in vivo toxicity CP models. For each biological assay, a conformal prediction model is built and used to predict the p-values of the compounds in the three in vivo endpoint data sets. These predicted p-values are used as bioactivity descriptors, in combination with chemical descriptors, for training the models of the in vivo endpoints.
Figure 2. Workflow of the aggregated Mondrian CP setup for the development of the models for the biological assays and the in vivo endpoints. The aggregated CP framework included 20 random splits into calibration and proper training data sets, on which individual RF models were trained; the resulting p-values per test compound were afterward averaged. The feature selection step was implemented with a lasso model and only included in the development of the in vivo toxicity CP models (in vivo toxicity CP models without feature selection were also trained for comparison).
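The aggregation step described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that nonconformity scores have already been computed for each split (for example, as one minus the RF probability of the class, which is an assumption and not stated in this record). The function names and toy numbers are hypothetical, and only two splits are shown instead of the 20 used in the workflow.

```python
from statistics import mean


def mondrian_p_value(calibration_scores, test_score):
    """p-value of one test compound for one class.

    calibration_scores holds the nonconformity scores of calibration
    compounds of this class only (the class-conditional, Mondrian part).
    """
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def aggregated_p_values(per_split_calibration, per_split_test_scores):
    """Average the class-wise p-values of one test compound over all splits.

    per_split_calibration: list of {class_label: [calibration scores]}
    per_split_test_scores: list of {class_label: test compound score}
    """
    labels = per_split_calibration[0].keys()
    return {
        label: mean(
            mondrian_p_value(cal[label], test[label])
            for cal, test in zip(per_split_calibration, per_split_test_scores)
        )
        for label in labels
    }


# Toy scores for two splits (hypothetical values, not from the study).
calibration = [
    {"active": [0.2, 0.4, 0.6], "inactive": [0.1, 0.3, 0.5, 0.7]},
    {"active": [0.3, 0.5, 0.9], "inactive": [0.2, 0.2, 0.4, 0.8]},
]
test_scores = [
    {"active": 0.5, "inactive": 0.6},
    {"active": 0.4, "inactive": 0.9},
]
p_values = aggregated_p_values(calibration, test_scores)
print(p_values)
```

In this sketch each class gets its own p-value, so a compound can end up with both, one, or neither class in its prediction set once a significance level is applied.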
Percentage of Compounds in the Reference Data Sets Covered by Compounds in the Three In Vivo Endpoint Data Sets (MNT, DILI, DICC) at Given Similarity Thresholds
| parameter | Tanimoto coefficient threshold | MNT | DILI | DICC |
|---|---|---|---|---|
| % coverage pesticides | 1.0 | 16 | 2 | 6 |
| | ≥0.8 | 17 | 2 | 7 |
| | ≥0.6 | 29 | 3 | 11 |
| | ≥0.4 | 62 | 10 | 36 |
| | ≥0.2 | 99 | 85 | 97 |
| % coverage cosmetics | 1.0 | 10 | 1 | 7 |
| | ≥0.8 | 14 | 1 | 9 |
| | ≥0.6 | 29 | 3 | 17 |
| | ≥0.4 | 68 | 17 | 58 |
| | ≥0.2 | 99 | 89 | 99 |
| % coverage drugs | 1.0 | 8 | 7 | 34 |
| | ≥0.8 | 9 | 8 | 37 |
| | ≥0.6 | 16 | 15 | 51 |
| | ≥0.4 | 40 | 34 | 73 |
| | ≥0.2 | 99 | 96 | 100 |
Tanimoto coefficients calculated from binary Morgan fingerprints (1024 bits and radius 2).
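As a rough illustration of how such coverage percentages could be computed, the sketch below represents each binary fingerprint as a set of on-bit indices. It assumes that a reference compound counts as covered when its most similar endpoint compound reaches the threshold; this reading of "coverage", the function names, and the toy fingerprints are assumptions, and real Morgan fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def percent_coverage(reference, endpoint_set, threshold):
    """Percentage of reference compounds whose nearest endpoint compound
    has a Tanimoto coefficient of at least `threshold`."""
    covered = sum(
        1
        for ref in reference
        if max(tanimoto(ref, cmpd) for cmpd in endpoint_set) >= threshold
    )
    return 100.0 * covered / len(reference)


# Toy fingerprints (hypothetical on-bit indices, not real compounds).
reference_fps = [{1, 2, 3}, {4, 5}, {7, 8, 9, 10}]
endpoint_fps = [{1, 2, 3}, {4, 6}, {11}]

for threshold in (1.0, 0.8, 0.6, 0.4, 0.2):
    print(threshold, round(percent_coverage(reference_fps, endpoint_fps, threshold), 1))
```

Because the test is "at least the threshold", coverage can only stay equal or grow as the threshold is lowered, which matches the monotonic pattern in the table.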
Figure 3. Principal component analysis based on a selection of interpretable molecular descriptors generated with RDKit on the merged in vivo toxicity data sets. Inactive compounds are colored in red and active compounds in green. The variance explained by the first two principal components is indicated on the axes.
Figure 4. Distribution of pairwise Tanimoto coefficients based on atom-pair fingerprints for three types of compound pairs: (a) active-to-active (blue), (b) inactive-to-inactive (orange), and (c) active-to-inactive (green).
Figure 5. Histogram of the performance distribution of the CP models for the biological assays. All models were valid, but their efficiencies and F1 scores showed a high degree of variability.
Figure 6. Percentage of the 373 bioactivity CP models showing mean efficiencies and mean F1 scores in the four given ranges.
Average Performance of the CP Models Generated from a Selected Set of Features
| endpoint | descriptor | validity | STD validity | efficiency | STD efficiency | F1 score | STD F1 score | MCC | STD MCC |
|---|---|---|---|---|---|---|---|---|---|
| MNT | CHEM | 0.77 | 0.02 | 0.76 | 0.05 | 0.61 | 0.02 | 0.28 | 0.05 |
| | BIO | | 0.03 | 0.81 | 0.05 | | 0.03 | | 0.06 |
| | CHEMBIO | 0.81 | 0.03 | | 0.03 | | 0.03 | 0.44 | 0.07 |
| DILI | CHEM | 0.78 | 0.05 | | 0.04 | 0.74 | 0.05 | 0.49 | 0.09 |
| | BIO | | 0.04 | 0.83 | 0.07 | 0.76 | 0.04 | 0.53 | 0.07 |
| | CHEMBIO | | 0.03 | 0.88 | 0.04 | | 0.03 | | 0.06 |
| DICC | CHEM | 0.79 | 0.02 | 0.84 | 0.02 | 0.72 | 0.03 | 0.46 | 0.05 |
| | BIO | 0.79 | 0.02 | | 0.02 | 0.81 | 0.01 | 0.63 | 0.02 |
| | CHEMBIO | 0.79 | 0.02 | 0.94 | 0.01 | | 0.01 | | 0.03 |
Mean and standard deviation (STD) calculated over a 5-fold CV. The highest mean per metric and endpoint is highlighted (bold).
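For readers less familiar with the conformal prediction metrics reported here, the following minimal sketch shows one common way to compute them for a binary problem. It assumes that validity is the fraction of prediction sets containing the true class, that efficiency is the fraction of single-class prediction sets, and that a significance level of 0.2 is used; these definitions, the significance level, and all names and toy values are assumptions, not details given in this record.

```python
def prediction_set(p_values, significance):
    """Classes whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def validity_and_efficiency(p_value_list, true_labels, significance=0.2):
    """Return (validity, efficiency) for a list of per-compound p-values.

    validity: fraction of prediction sets that contain the true class.
    efficiency: fraction of prediction sets with exactly one class.
    """
    sets = [prediction_set(p, significance) for p in p_value_list]
    validity = sum(y in s for s, y in zip(sets, true_labels)) / len(sets)
    efficiency = sum(len(s) == 1 for s in sets) / len(sets)
    return validity, efficiency


# Toy p-values for four compounds (hypothetical values).
toy_p_values = [
    {"active": 0.625, "inactive": 0.30},  # both classes predicted
    {"active": 0.05, "inactive": 0.60},   # single class, correct
    {"active": 0.50, "inactive": 0.10},   # single class, wrong
    {"active": 0.10, "inactive": 0.15},   # empty prediction set
]
toy_truth = ["active", "inactive", "inactive", "active"]

validity, efficiency = validity_and_efficiency(toy_p_values, toy_truth)
print(validity, efficiency)
```

Under these definitions a model can be valid yet inefficient: prediction sets that contain both classes always include the true label but carry no decision.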
Figure 7. Distribution of the validity, efficiency, and F1 score values obtained within the 5-fold CV framework for the (a) MNT, (b) DILI, and (c) DICC CP models built on the different descriptor sets after feature selection. The CHEM descriptor set includes the molecular fingerprint and physicochemical descriptors; the BIO descriptor set includes the predicted p-values for a set of biological endpoints (bioactivity descriptor); the CHEMBIO descriptor set includes the previous two descriptor sets. Significant differences in the distribution (p-value <0.05) are denoted by a star.
Summary of Model Performances of the ChemBioSim Models and Existing Methods
| endpoint | model | mean sensitivity | mean specificity | evaluation | modeling approach | comments |
|---|---|---|---|---|---|---|
| MNT | Yoo et al. | 0.54–0.74 | 0.77–0.93 | 5% leave-many-out | Leadscope Enterprise and CASE Ultra software | variations related to different modeling approaches |
| | our method | 0.78 | 0.76 | 5-fold CV | CP built on RF models | CHEMBIO model with feature selection |
| DILI | Ancuceanu et al. | 0.83 | 0.66 | nested CV | meta-model with a naïve Bayes model trained on output probabilities of 50 ML models | |
| | our method | 0.78 | 0.78 | 5-fold CV | CP built on RF models | CHEMBIO model with feature selection |
| DICC | Cai et al. | 0.69–0.75 | 0.72–0.81 | 5-fold CV | combined classifier using neural networks based on four single classifiers | results refer to five cardiological complications endpoints evaluated independently |
| | our method | 0.83 | 0.86 | 5-fold CV | CP built on RF models | CHEMBIO model with feature selection |
Figure 8. Mean feature importance reported by the RF model for the bioactivity descriptors in relation to the percentage of overlapping compounds (of the in vivo data set) and the efficiency and F1 score of the models for each biological assay. For each of the 373 biological assays, the higher mean feature importance of the two p-values used as descriptors (for the active and inactive classes of each assay) was taken. The feature importance values were normalized with a min-max normalization (from 0.01 to 1; see Materials and Methods section) for easier comparison.
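The two processing steps named in the Figure 8 caption (taking the higher importance of the two p-value descriptors per assay, then min-max normalizing to the range 0.01 to 1) can be sketched as follows. The function name, the handling of the all-equal case, and the toy importance values are assumptions made for illustration.

```python
def min_max_scale(values, low=0.01, high=1.0):
    """Linearly rescale values so that the minimum maps to `low`
    and the maximum maps to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Arbitrary choice for this sketch: all-equal inputs map to `high`.
        return [high for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


# Toy mean importances per assay as (active p-value, inactive p-value);
# the numbers are hypothetical.
importance_pairs = [(0.002, 0.010), (0.030, 0.020), (0.006, 0.004)]

# Keep the higher of the two importances for each assay, then normalize.
per_assay = [max(pair) for pair in importance_pairs]
scaled = min_max_scale(per_assay)
print(scaled)
```

A lower bound of 0.01 rather than 0 keeps the least important assay visible on a plot; whether that was the motivation in the study is not stated here.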