| Literature DB >> 31020328 |
Jeffrey J Sutherland1, James L Stevens2,3, Kamin Johnson4, Navin Elango4, Yue W Webster2, Bradley J Mills1, Daniel H Robertson1.
Abstract
Applying toxicogenomics to improving the safety profile of drug candidates and crop protection molecules is most useful when it identifies relevant biological and mechanistic information that highlights risks and informs risk mitigation strategies. Pathway-based approaches, such as gene set enrichment analysis, integrate toxicogenomic data with known biological process and pathways. Network methods help define unknown biological processes and offer data reduction advantages. Integrating the 2 approaches would improve interpretation of toxicogenomic information. Barriers to the routine application of these methods in genome-wide transcriptomic studies include a need for "hands-on" computer programming experience, the selection of 1 or more analysis methods (eg pathway analysis methods), the sensitivity of results to algorithm parameters, and challenges in linking differential gene expression to variation in safety outcomes. To facilitate adoption and reproducibility of gene expression analysis in safety studies, we have developed Collaborative Toxicogeomics, an open-access integrated web portal using the Django web framework. The software, developed with the Python programming language, is modular, extensible and implements "best-practice" methods in computational biology. New study results are compared with over 4000 rodent liver experiments from Drug Matrix and open TG-GATEs. A unique feature of the software is the ability to integrate clinical chemistry and histopathology-derived outcomes with results from gene expression studies, leading to relevant mechanistic conclusions. We describe its application by analyzing the effects of several toxicants on liver gene expression and exemplify application to predicting toxicity study outcomes upon chronic treatment from expression changes in acute-duration studies.Entities:
Keywords: gene expression analysis; mechanism inference; systems biology; toxicity prediction; toxicogenomics
Year: 2019 PMID: 31020328 PMCID: PMC6657575 DOI: 10.1093/toxsci/kfz101
Source DB: PubMed Journal: Toxicol Sci ISSN: 1096-0929 Impact factor: 4.849
Figure 1.Illustrating key functionality in the CTox web application. A, Background and user guide links when accessing the application. B, Dialog for searching and adding experiments of interest to an analysis “cart.” C, Viewing gene-level results for experiments in the analysis cart. D, Visualization of co-expression results on the TXG-MAP (see Results section).
Analysis Methods Available Within the CTox Web Application
| Method | Summary | Strengths/Weaknesses |
|---|---|---|
| Gene-level analysis | A list of all genes probed by the measurement technology, with fold change and associated | Ease of interpretation, but high risk of false positives due to large number of hypotheses being tested. |
| GSA | A list of gene sets (GO terms, pathways, and others) for which the constituent genes are disproportionally induced or repressed compared with all measured genes; returns a score and | Simplifies by summarizing effects aggregated across related genes. Ease of interpretation depends on the perturbed gene set; eg “cholesterol biosynthesis” is clear in the context of liver, but “generation of neurons” is not. In theory, lower risk of false positives, however the MSigDB collection now approaches 18 000 gene sets. |
| TXG-MAP module analysis | A list of scores, 1 for each of 415 modules in the TXG-MAP. Modules consist of co-expressed genes. | Simplifies by summarizing across co-expressed genes. Unlike other methods, provides context on magnitude of effect compared with other liver perturbations. Captures co-regulated biology not described by canonical pathways, but interpretation requires study of constituent genes. Results depend on quality/comprehensiveness of training datasets. Low risk of false positives. |
| Similar experiments analysis | A list of most similar experiments, compared with user’s experiment. Global transcriptional similarity, assessed by comparing approximately 400 GSA scores or TXG-MAP scores between experiments. | The fastest way to help understand mechanism, in those cases where a compound of interest is similar to compounds/drugs with well-understood mechanistic effects. Intermediate risk of false positives (ca. 4000 comparisons performed). Requires global similarity (ie expression profile similarity across all pathways/modules); often no high similarity reference compounds and therefore no insights from approach. |
| Clinical chemistry/histology findings | For DM and TGs, standardized clinical chemistry, and histology findings | NA—not expression analysis. When associated with gene expression, results yield meaningful association with standard toxicology interpretations. |
High risk of false positives if not applying a multiple hypothesis testing correction to the p values; when applying such a correction, it becomes a high risk of false negatives—ie a gene is genuinely differentially expressed with a small non-adjusted p value, but after adjusting, the p value (or q value) is no longer small enough to be recognized as significant. The application described herein returns both unadjusted and adjusted p values.
Figure 2.Network map of nonredundant inducible pathways in rat liver. The full set of 1839 inducible/repressible pathways was clustered using gene set analysis scores for 3528 TG liver experiments; (A) showing 1 cluster (cluster 1578), which includes 30 pathways and 4 GO-CC terms (all relating to the proteasome) with highly correlated scores (see Materials and Methods section). One pathway (REACTOME Genes involved in Destabilization of mRNA by AUF1) best captured the variation in scores for all terms and was selected as a representative for the cluster. Other selections, based on maximal perturbation across experiments of interest, or most significant association with a toxicity phenotype are possible. The grouping is based solely on correlation of pathway scores, not gene membership or biological themes among the gene sets. B, The set of 1839 pathways reduced to 382 clusters, with each cluster represented as a node on the map; the cluster in (A) is represented as 1 node on the map. Proximity in the map, as measured by traversal of branches, corresponds to correlation of scores between clusters. Clusters range in size from 1 through 92 gene sets (node size), and when counting all unique genes among their members, include from 5 to 2039 unique genes (shading).
Top Three Nonredundant Pathways for Association With Toxicity Phenotype
|
|
|
|
|---|---|---|
| Adverse at 29 days | 1d | GO: cellular amino acid metabolic process (−0.7), GO: drug metabolic process (−0.6), GO: unsaturated fatty acid metabolic process (0.8) |
| Bile duct hyperplasia | 1d | GO: positive regulation of leukocyte apoptotic process (1.2), GO: cofactor metabolic process (−0.6), KEGG: Cell adhesion molecules (0.6) |
| C | REACTOME: Genes involved in Apoptotic execution phase (1.8), GO: positive regulation of extrinsic apoptotic signaling pathway in absence of ligand (1.7), GO: regulation of triglyceride metabolic process (−1.7) | |
| Cholesterol decrease | 1d | GO: regulation of membrane potential (−0.9), GO: germ cell nucleus (1), GO: regulation of feeding behavior (−1) |
| C | REACTOME: Genes involved in Cell surface interactions at the vascular wall (−0.8), GO: extracellular matrix (−0.5), REACTOME: Genes involved in Association of TriC/CCT with target proteins during biosynthesis (1.2) | |
| Cholesterol increase | C | REACTOME: Genes involved in Ethanol oxidation (0.7), GO: ethanol catabolic process (0.6), GO: canalicular bile acid transport (0.5) |
| Hematopoiesis | 1d | GO: negative regulation of programed cell death (1.3), REACTOME: Genes involved in Sulfur amino acid metabolism (−1.4), GO: dicarboxylic acid metabolic process (−1.1) |
| C | GO: apoptotic cell clearance (2.3), GO: complement activation, classical pathway (2.2), GO: blood coagulation (2.2) | |
| Hypertrophy | 1d | REACTOME: Genes involved in Orc1 removal from chromatin (0.6), REACTOME: Genes involved in SCF(Skp2)-mediated degradation of p27/p21 (0.7), GO: ribosomal large subunit export from nucleus (1.5) |
| C | GO: protein homotetramerization (1.2), GO: sterol esterification (−1.1), REACTOME: Genes involved in Recycling of bile acids and salts (1.1) | |
| Increased glycogen | C | PID: TNF receptor signaling pathway PMID: 18832364 (−1.6), GO: cellular response to lipoprotein particle stimulus (−0.9), GO: intrinsic apoptotic signaling pathway in response to DNA damage (−1.5) |
| Increased mitosis | 1d | KEGG: Pyruvate metabolism (1.4), KEGG: Propanoate metabolism (1), GO: single-organism catabolic process (0.6) |
| Increased mitosis | C | GO: nuclear ubiquitin ligase complex (1.6), PID: Aurora A signaling PMID: 18832364 (1.1), GO: homologous chromosome segregation (1) |
| Necrosis | C | GO: macrophage chemotaxis (0.8), GO: myeloid leukocyte migration (0.5), GO: monocarboxylic acid catabolic process (−0.8) |
| Single cell necrosis | C | REACTOME: Genes involved in Activation of Genes by ATF4 (1), GO: endoplasmic reticulum unfolded protein response (1.1), PID: ATF-2 transcription factor network PMID: 18832364 (0.6) |
| Trigs decrease | C | BIOCARTA: Apoptotic DNA fragmentation and tissue homeostasis (−1.3), GO: pronucleus (−1), GO: protein depolymerization (−0.9) |
| Trigs increase | C | KEGG: Steroid biosynthesis (−0.3), GO: sterol esterification (0.7), GO: response to drug (−0.4) |
| Vacuolation | C | KEGG: Steroid biosynthesis (0.7), GO: isoprenoid biosynthetic process (0.7), GO: extracellular matrix assembly (−0.8) |
Representative toxicity phenotype as defined in methods; “adverse at 29 days” is an aggregate endpoint and included any treatments that produced 1 or more adverse morphologic changes at 29 days of dosing; only relationships at 1 day (1d) were evaluated.
Whether expression analysis is performed from samples collected after 1 day (1d) or concurrent with the observed toxicity phenotype (C).
Showing the top 3-ranked pathways for the toxicity phenotype and time, ranked using p-adj per (Sutherland ), selecting only the most highly ranked pathway within each cluster. The value in parentheses is the pathway’s coefficient in the logistic regression model, interpreted as the natural log of odds-ratio for observing toxicity given a 1 unit increase in the pathway score. The sign indicates whether induction (positive) or repression (negative) associates with increased odds of toxicity. A coefficient value of 0.69 corresponds to 2× odds of toxicity, 1.1 corresponds to 3×, etc. The full results are provided in Supplementary Dataset 2.
Figure 3.Pathways perturbed by omeprazole treatment in rat liver. Expression results were analyzed for omeprazole 24 h following the administration of single doses of 300 and 1000 mg/kg. A, Only pathways with gene set analysis score p-adj < .001 at both doses were retained. Within each of 382 pathway clusters, the pathway with the highest average absolute score across the 2 doses was selected. B, Only the subset of 8 pathways ranked in the top 50 out of 1839 for association with hypertrophy was considered.
Figure 4.Comparison of TG liver experiment scores via nonredundant pathways and TXG-MAP modules. A, When comparing scores for 3528 TG experiments, the most similar pathway for module 42 m was KEGG glutathione metabolism, and vice versa, with Pearson R = 0.82. B, The analysis in (A) was repeated for each module, identifying the most similar pathway, and assigning the calculated Pearson R values to ranges as shown; stacked bar graphs show the cumulative distribution of similarities across the 415 modules. Likewise, each pathway was compared with modules, the most similar identified and assigned to similarity ranges in the same manner.
Figure 5.Module and pathway association with toxicity phenotypes. For selected toxicity phenotypes (trellis panel columns), the significance of the relationship (q-adj) between module/pathway scores and occurrence of toxicity is shown for predictive applications (ie expression analysis at 1 day) and concurrent with injury (trellis panel rows). Small q-adj values indicate greater significance. The horizontal line at q-adj = .1 denotes a common cutoff for statistical significance. Points are jittered horizontally for clarity.
Figure 6.Similar experiments analysis using modules and pathways. A, Comparing pairwise experiment similarities for nonredundant pathways (gene set analysis) scores and co-expression modules scores; Pearson R = 0.59 on ca 6.2 M pairs of TG experiments. Darker contours show increasing density of points, and contour values are an arbitrary linear scale. Labeled experiment pairs indicate compound name, treatment duration (h-hours, d-days) and dose. B, Composition of high similarity experiment pairs; similarities were averaged for pathways and modules, and assigned to ranges of 0.7–0.8, 0.8–0.9 and > 0.9. The proportion of experiment pairs involving the same compound (at different doses and/or timepoints), or both compounds being NSAIDs or PPAR modulators accounted for the majority of very high similarity pairs.
Most Perturbed Modules and Pathways by the Uricosuric Agents Benzbromarone and Benziodarone, and Relationship to Observed Toxicity Phenotypes
| Gene Set |
|
|
|
|
|
|
|---|---|---|---|---|---|---|
| GO: fatty acid catabolic process | 1 | 1 | 2.64E-10 | 20 | 0.0000131 | 117 |
| DM: liver: 26 | 2 | 2 | 0.000041 | 36 | 0.000043 | 195 |
| DM: liver: 17 | 2.5 | 1.5 | 0.0000064 | 21 | 9.2E-08 | 130 |
| GO: negative regulation of oxidative stress-induced intrinsic apoptotic signaling pathway | 3 | 9 | 5.06E-08 | 26 | 2.94E-11 | 25 |
| DM: liver: 285 | 5.5 | 16.5 | 0.025 | 177 | 2.7E-12 | 73 |
| GO: coenzyme metabolic process | 5.5 | 8 | 0.0000429 | 43 | 8.73E-15 | 15 |
| REACTOME: Genes involved in destabilization of mRNA by AUF1 (hnRNP D0) | 5.5 | 7.5 | 0.00026007 | 51 | 1.15E-07 | 72 |
| REACTOME: Genes involved in metabolism of lipids and lipoproteins | 6.5 | 7 | 6.16E-13 | 7 | 0.00000173 | 92 |
Each pathway or module was ranked from most to least perturbed (absolute value of pathway/module score) within each of 6 uricosuric agent and 26 PPAR modulator experiments, and the rank averaged across the 6/26 experiments.
The adjusted q value indicating the pathway or module’s general association with the phenotype across all TGs experiments.
The corresponding rank for the phenotype. These are “concurrent” associations, because expression results were taken from samples where the phenotype is present. Abbreviation: Trigs, triglycerides. Full results are provided in Supplementary Dataset 5.