Daniel P Russo1, Judy Strickland2, Agnes L Karmaus2, Wenyi Wang1, Sunil Shende1,3, Thomas Hartung4,5, Lauren M Aleksunes6, Hao Zhu1,7.
Abstract
BACKGROUND: Low-cost, high-throughput in vitro bioassays have potential as alternatives to animal models for toxicity testing. However, incorporating in vitro bioassays into chemical toxicity evaluations such as read-across requires significant data curation and analysis based on knowledge of relevant toxicity mechanisms, which dampens enthusiasm for using the massive amount of unstructured public data.
Year: 2019 PMID: 30933541 PMCID: PMC6785238 DOI: 10.1289/EHP3614
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Figure 1. Overview of the data-driven read-across approach developed in this study. (A) Profiling compounds via PubChem portal, (B) clustering PubChem assays based on chemical fragment in vitro relationships, (C) identifying bioassays capable of predicting acute oral toxicity, and (D) predicting new toxicants by read-across and mechanism illustrations.
Figure 2. Activity distribution for compounds in this study. (A) Histogram for the 7,385 log-transformed values from compounds in the original dataset (hashed bars in background) and the final 3,543 compounds with biological data ultimately used for modeling (solid bars in foreground). The parameter, , is equal to the midpoint of the logistic curve displayed in (B) and was used as the cutoff for determining compound toxicity. (B) The parameter s controls the shape of the curve. Lines represent the logistic function at various values for s (0.1 through 0.9 at steps of 0.1). The final curve, with , is bold. The thin areas on the line indicate log-transformed values or and their corresponding values in f space. The thick line indicates log-transformed values and .
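The caption describes a logistic curve with a midpoint and a shape parameter s. The sketch below illustrates how such a curve behaves. It assumes the common form f(x) = 1 / (1 + exp(-(x - x0) / s)); the exact parameterization, the name `x0`, and the numeric values are illustrative assumptions, not values taken from the study.

```python
import math

def logistic(x, x0, s):
    """Logistic curve with midpoint x0 and shape parameter s (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(x - x0) / s))

# Hypothetical midpoint, chosen only for illustration.
X0 = 2.0

for s in (0.1, 0.5, 0.9):
    at_midpoint = logistic(X0, X0, s)        # always 0.5 at the midpoint
    above = logistic(X0 + 0.5, X0, s)        # closer to 1 when s is small
    print(f"s={s}: f(x0)={at_midpoint}, f(x0+0.5)={above:.3f}")
```

Under this assumed form, every curve crosses 0.5 at the midpoint, and smaller values of s give a sharper transition between the two classes.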
Figure 3. Similarity map of PubChem in vitro bioassays used in this study. Nodes represent PubChem assays; edges represent Jaccard dissimilarity computed based on chemical fragment bioprofiles. In total, 45 distinct clusters were identified by the Louvain modularity algorithm. Nodes belonging to the same cluster are presumed to have a shared biological relevance and were used to perform read-across.
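The edge weights in this map are Jaccard dissimilarities between assay bioprofiles. A minimal sketch of that calculation follows, assuming each assay's bioprofile can be represented as a set of chemical fragments; the assay names, fragment labels, and set representation are hypothetical and are not the study's actual data structures.

```python
def jaccard_dissimilarity(a, b):
    """Return 1 - |A intersect B| / |A union B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical bioprofiles: each assay maps to the fragments associated with it.
profiles = {
    "assay_A": {"frag1", "frag2", "frag3"},
    "assay_B": {"frag2", "frag3", "frag4"},
    "assay_C": {"frag9"},
}

# Pairwise dissimilarities, one entry per unordered pair of assays.
names = sorted(profiles)
edges = {
    (u, v): jaccard_dissimilarity(profiles[u], profiles[v])
    for i, u in enumerate(names)
    for v in names[i + 1:]
}

for pair, d in edges.items():
    print(pair, d)
```

A weighted graph built from such pairs could then be passed to a community-detection routine (for example, NetworkX's `louvain_communities`) to obtain clusters; that step is omitted here to keep the sketch dependency-free.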
Source depositors of the 640 PubChem assays used in this study.
| Source institution | Number of assays | Cluster membership |
|---|---|---|
| NCGC | 200 | 2, 3, 4, 8, 9, 10, 12, 13, 14, 16, 17, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 32, 35, 36, 37, 39, 40, 41, 42, 44, 45 |
| Tox21 | 114 | 3, 9, 10, 11, 12, 15, 17, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 38, 44 |
| DTP/NCI | 96 | 1, 2, 5, 6, 7, 8, 10, 18, 19, 20, 21, 27 |
| Scripps Research Institute Molecular Screening Center | 67 | 2, 3, 4, 5, 8, 10, 14, 16, 18, 19, 20, 21, 27, 35, 37, 39, 40 |
| Sanford-Burnham Center for Chemical Genomics | 41 | 2, 3, 4, 5, 8, 10, 14, 15, 18, 20, 21, 27, 35, 37, 43, 45 |
| Broad Institute | 31 | 2, 3, 8, 10, 12, 14, 16, 18, 20, 21, 27, 34, 35 |
| Cheminformatics and Chemogenomics Research Group | 16 | 3, 15, 17, 20, 24, 27, 29, 32, 36 |
| EPA DSSTox | 11 | 3, 6, 7, 37, 38 |
| Johns Hopkins Ion Channel Center | 11 | 3, 10, 13, 17, 20 |
| Southern Research Institute | 10 | 3, 14, 15, 17, 20, 21, 24 |
| Southern Research Specialized Biocontainment Screening Center | 10 | 3, 12, 27, 37 |
| Emory University Molecular Libraries Screening Center | 9 | 3, 14, 21, 28 |
| University of New Mexico | 8 | 14, 18, 20, 21, 41 |
| ICCB-Longwood/NSRB Screening Facility, Harvard Medical School | 4 | 3, 12, 14 |
| Vanderbilt High Throughput Screening Facility | 3 | 3, 17, 20 |
| Columbia University Molecular Screening Center | 2 | 14 |
| University of Pittsburgh Molecular Library Screening Center | 2 | 3, 17 |
| ChEMBL | 2 | 14, 27 |
| Milwaukee Institute for Drug Discovery | 1 | 4 |
| Institute for Research in Immunology and Cancer | 1 | 10 |
| Psychoactive Drug Screening Program | 1 | 20 |
Note: ChEMBL, European Molecular Biology Laboratory chemistry database; DTP/NCI, Developmental Therapeutics Program/National Cancer Institute; EPA DSSTox, Environmental Protection Agency Distributed Structure-Searchable Toxicity Database; GPCR, G protein–coupled receptors; ICCB, Institute of Chemistry and Cell Biology; NCGC, National Center for Advancing Translational Sciences Chemical Genomics Center; NSRB, National Screening Laboratory for the Regional Centers of Excellence for Biodefence and Emerging Infectious Diseases; Tox21, Toxicity Testing in the 21st Century.
Cluster membership lists every cluster, as identified by the Louvain modularity algorithm, that contains at least one bioassay from the given source.
Figure 4. Individual cluster model–specific positive predictive value (ppv) results from fivefold cross-validation. Numbers along the x-axis correspond to clusters identified within the network in Figure 3. The first 25 clusters have (shaded area), although five have fewer than five bioassays (marked by “x”) and were omitted from further analyses.
Figure 5. Bioprofile-based prediction performance on the external test set of 639 chemicals. The 639 external test set compounds were used to evaluate the predictivity of 19 clusters and the ensemble model at varying relative cluster confidence (rcc) levels. (A) Positive predictive value (ppv) of the ensemble model at increasing rcc thresholds (same x-axis as grid plot in panel B). (B) Grid plot showing confusion matrices of the ensemble model and the individual clusters at different rcc thresholds. Each confusion matrix contains the number of false positives (upper left corner), true positives (upper right corner), false negatives (lower left corner), and true negatives (lower right corner) for a model at a particular rcc threshold.
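The confusion matrices in panel B use a specific corner layout, and the ppv in panel A follows from two of the four cells. The sketch below shows that relationship using the standard definition ppv = TP / (TP + FP); the function name and the counts are hypothetical and are not results from the study.

```python
def ppv_from_grid(grid):
    """Compute ppv from a 2x2 grid laid out as [[FP, TP], [FN, TN]].

    This matches the corner layout described in the caption:
    upper left = FP, upper right = TP, lower left = FN, lower right = TN.
    Returns None when the model made no positive predictions.
    """
    (fp, tp), (fn, tn) = grid
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return None
    return tp / predicted_positive

# Hypothetical counts for one model at one rcc threshold.
example = [[5, 15],
           [30, 50]]
print(ppv_from_grid(example))
```

Returning `None` when there are no positive predictions is one design choice among several; it avoids a division by zero at strict thresholds where a model may predict no toxicants at all.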
Figure 6. Cluster 1 predictions. (A) Principal component analysis (PCA) of Cluster 1 model predictions (true positives as highlighted circles and false positives as diamonds). (B) Two examples of true positive compounds with their common representative substructure in bold. (C) Two examples of false positive compounds with their common representative substructure in bold.
PubChem assays involving protein or viral targets within the Cluster 8 model.
| PubChem AID | Name | Target |
|---|---|---|
| 488899 | MITF measured in cell-based system using plate reader | Microphthalmia-associated transcription factor |
| 504444 | Nrf2 qHTS screen for inhibitors | Nuclear factor erythroid 2–related factor 2 isoform 2 |
| 540276 | qHTS for inhibitors of binding or entry into cells for Marburg virus | Gene 4 small orf (Marburg virus) |
| 588413 | uHTS identification of Gli-Sufu antagonists in a luminescence reporter assay | Glioma-associated oncogene 1 |
| 624169 | Luminescence-based cell-based primary high-throughput screening assay to identify agonists of the mouse 5-hydroxytryptamine (serotonin) receptor 2A (HTR2A) | 5-Hydroxytryptamine receptor 2A |
| 624354 | uHTS identification of Caspase-8 TRAIL sensitizers in a luminescence assay | Tumor necrosis factor receptor superfamily member 10B isoform 1 precursor |
| 651820 | qHTS assay for inhibitors of hepatitis C virus (HCV) | Hepatitis C virus |
Note: MITF, microphthalmia-associated transcription factor; Nrf2, nuclear factor erythroid 2–related factor 2; qHTS, quantitative high-throughput screening; TRAIL, tumor necrosis factor-related apoptosis-inducing ligand; uHTS, ultra-high-throughput screening.
Figure 7. Cluster 8 predictions. (A) Principal component analysis (PCA) of Cluster 8 model predictions (true positives as highlighted circles and false positives as diamonds). (B) Three examples of true positives, with the potential toxicophores associated with this cluster shown in bold.