Georges Natsoulis, Cecelia I Pearson, Jeremy Gollub, Barrett P Eynon, Joe Ferng, Ramesh Nair, Radha Idury, May D Lee, Mark R Fielden, Richard J Brennan, Alan H Roter, Kurt Jarnagin.
Abstract
We have used a supervised classification approach to systematically mine a large microarray database derived from livers of compound-treated rats. Thirty-four distinct signatures (classifiers) for pharmacological and toxicological end points can be identified. Just 200 genes are sufficient to classify these end points. Signatures were enriched in xenobiotic and immune response genes and contain un-annotated genes, indicating that not all key genes in the liver xenobiotic responses have been characterized. Many signatures with equal classification capabilities but with no gene in common can be derived for the same phenotypic end point. The analysis of the union of all genes present in these signatures can reveal the underlying biology of that end point as illustrated here using liver fibrosis signatures. Our approach using the whole genome and a diverse set of compounds allows a comprehensive view of most pharmacological and toxicological questions and is applicable to other situations such as disease and development.
Year: 2008 PMID: 18364709 PMCID: PMC2290941 DOI: 10.1038/msb.2008.9
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Summary description of the database
| Characteristics of the liver xenobiotic and pharmacologic response data set | |
|---|---|
| Arrays (4941 treated+347 untreated controls) | 5288 |
| Treatments (biological triplicate) | 1695 |
| Compounds | 344 |
| Structure activity classes (SACs) [a] | 171 |
| Pharmacologic activity classes (ACs) [b] | 77 |
| Clinical chemistry [c] | 46 |
| Liver histopathology annotations | 57 |

[a] Defined by chemists as being distinct structural classes.
[b] Second level of a two-step hierarchy. Several structurally distinct but related SACs are grouped into one AC if they share a target.
[c] Includes blood chemistry and hematology assays.
Summary description of the results of the systematic mining
| Signature type | Candidates | Passing validity criteria |
|---|---|---|
| Body and organ weights | 25 | 5 |
| Clinical chemistry [a] | 477 | 61 |
| Histopathology | 317 | 65 |
| Therapeutic indication | 52 | 0 |
| Pharmacology | 1241 | 49 |
| Total | 2112 | 180 |

[a] Includes blood chemistry and hematology assays.
Figure 1. (A) Comparing the 180 valid signatures with five commonly used tests. The sensitivity and specificity of the 180 valid signatures are compared with those of Pap smears, PSA, mammography, chest X-ray and the Ames test (http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat1.table.7254; Kim and Margolin, 1999; Mistry and Cable, 2003; Loy ). Iso-log odds curves for LOR equal to 2, 3 and 4 are shown for reference. (B) Modeling the average performance of signatures derived from databases of increasing size. The complete data set was split 20 times into a training data set and a forward validation data set. The training set was further split into half-sized and quarter-sized training sets. All splits were carried out at the compound level (i.e. all samples treated with the same compound are either all included in or all excluded from a given set), thus modeling a growing toxicology database. The five signatures with the largest positive class were chosen for this study out of the set of 180 valid signatures. Each signature was re-derived and internally cross-validated as previously described from the 20 different quarter-size, half-size and full-size training sets. Each signature was also evaluated on the forward validation data set. The graph represents the average cross-validation and forward validation results for the five signatures.
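The compound-level splitting described in (B) can be illustrated with a minimal sketch. The function names, the `holdout_fraction` value and the choice to nest the half-sized and quarter-sized sets inside the full training set are assumptions made for illustration; the caption does not specify them.

```python
import random
from collections import defaultdict


def split_by_compound(samples, holdout_fraction=0.25, seed=0):
    """Split (sample_id, compound) pairs so that no compound spans both sets.

    holdout_fraction is a hypothetical parameter, not a value from the paper.
    Returns (training compounds, held-out compounds, samples grouped by compound).
    """
    by_compound = defaultdict(list)
    for sample_id, compound in samples:
        by_compound[compound].append(sample_id)
    compounds = sorted(by_compound)
    random.Random(seed).shuffle(compounds)
    n_holdout = max(1, round(len(compounds) * holdout_fraction))
    holdout = compounds[:n_holdout]
    training = compounds[n_holdout:]
    return training, holdout, by_compound


def nested_training_sets(training_compounds):
    """Full, half-sized and quarter-sized training sets (nesting is an assumption)."""
    half = training_compounds[: len(training_compounds) // 2]
    quarter = half[: len(half) // 2]
    return {"full": training_compounds, "half": half, "quarter": quarter}


def samples_for(compounds, by_compound):
    """Collect every sample treated with any of the given compounds."""
    return [s for c in compounds for s in by_compound[c]]


# Toy data: 8 compounds, 3 samples (e.g. biological triplicates) each.
toy = [(f"s{c}_{r}", f"cmpd{c}") for c in range(8) for r in range(3)]
training, holdout, grouped = split_by_compound(toy)
sets = nested_training_sets(training)
```

Because the split is drawn over compounds rather than over arrays, the forward validation set only contains compounds the classifier has never seen, which is what makes it a model of a database that grows by adding new compounds.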
Figure 2. Determination of a unique set of liver signatures. (A) A scalar product (SP) by treatment heat map, expressing the scores of all expression profiles against all signatures. (B) A blowup of regions a and b of (A) is shown. Negative values were set to zero and the resulting positive SP table was submitted to unweighted average-linkage clustering using an uncentered Pearson's correlation metric. Applying a correlation threshold of 0.6 results in 34 clusters of signatures. The best representative signature per cluster is the signature with the highest positive predictive value (PPV). Color scale is continuous between black (SP=0), yellow (SP=0.5) and red (SP⩾1). Samples marked with a star are all treatments with methylenedianiline, a DNA alkylator whose profile matches both the DNA alkylator and the fibrosis/bile duct hyperplasia groups of signatures.
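The clustering steps in this caption can be sketched in plain Python, under stated assumptions: the uncentered Pearson correlation is taken to be the cosine of the angle between two score vectors, and "applying a correlation threshold of 0.6" is read as stopping the merges once no pair of clusters has an average correlation of at least 0.6. The function names and toy scores are hypothetical, and this is not the authors' implementation.

```python
import math


def uncentered_pearson(u, v):
    """Uncentered Pearson correlation (cosine similarity) of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_signatures(scores, threshold=0.6):
    """Unweighted average-linkage clustering of signatures.

    scores maps a signature name to its scalar products across treatments.
    Negative scalar products are set to zero before correlating.
    """
    clipped = {k: [max(0.0, x) for x in v] for k, v in scores.items()}
    names = list(clipped)
    sim = {(a, b): uncentered_pearson(clipped[a], clipped[b])
           for a in names for b in names}
    clusters = [[n] for n in names]

    def average_similarity(c1, c2):
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (average_similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def best_representatives(clusters, ppv):
    """Pick the signature with the highest PPV in each cluster."""
    return [max(cluster, key=lambda name: ppv[name]) for cluster in clusters]


# Toy scores for three hypothetical signatures across four treatments.
toy_scores = {
    "sigA": [1.0, 0.9, 0.0, 0.0],
    "sigB": [0.8, 1.0, 0.0, -0.2],
    "sigC": [0.0, 0.0, 1.0, 0.9],
}
toy_ppv = {"sigA": 0.7, "sigB": 0.9, "sigC": 0.6}
toy_clusters = cluster_signatures(toy_scores)
toy_best = best_representatives(toy_clusters, toy_ppv)
```

In the toy data, `sigA` and `sigB` score the same treatments and merge, while `sigC` stays alone, so two representatives are returned.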
Characteristics of 34 unique signatures
| Unique signature ID | Use | Type | Subtype | Class 1 size (no. of treatments) | Class −1 size (no. of treatments) | Class 1 compound count | Class −1 compound count | Class 1 AC count | Class −1 AC count | Number of signatures in cluster | PPV | Sensitivity (%) | Specificity (%) | Log odds ratio (LOR) | Signature length (no. of genes) | Number of signature cycles | Number of probes in NGS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SV0567082R5RU | T | CP | Absolute monocyte increase | 34 | 680 | 24 | 276 | 16 | 64 | 2 | 0.54 | 43.2 | 98.1 | 3.7 | 102 | 1 | 113 |
| SV0567098R5RU | T | CP | Creatinine increase | 38 | 757 | 18 | 308 | 16 | 66 | 3 | 0.62 | 47.8 | 98.5 | 4.1 | 141 | 1 | 127 |
| SV0567149R5RU | T | CP | Albumin increase | 33 | 604 | 17 | 273 | 11 | 65 | 2 | 0.71 | 52.5 | 98.8 | 4.5 | 105 | 2 | 186 |
| SV0562011R5RU | T | CP | Mean corpuscular hemoglobin concentration decrease (diagnostic, 3–7D time points) | 32 | 753 | 16 | 311 | 10 | 69 | 14 | 0.82 | 72.7 | 99.3 | 5.9 | 74 | 3 | 252 |
| SV0571010R5RU | T | CP | Mean corpuscular hemoglobin concentration decrease (predictive, 0.25–1D time points) | 18 | 559 | 15 | 267 | 10 | 66 | 3 | 0.65 | 42.5 | 99.2 | 4.5 | 75 | 2 | 140 |
| SV0567088R5RU | T | CP | Glucose increase | 27 | 717 | 14 | 310 | 10 | 68 | 5 | 0.77 | 46.4 | 99.5 | 5.1 | 104 | 4 | 379 |
| SV0650093R5RU | T | H | Liver—centrilobular, inflammatory cell infiltrate, mixed cell | 37 | 676 | 16 | 264 | 9 | 66 | 2 | 0.6 | 47.0 | 98.3 | 3.9 | 111 | 1 | 131 |
| SV0567153R5RU | T | CP | Total protein increase | 28 | 612 | 14 | 295 | 9 | 66 | 2 | 0.67 | 48.8 | 98.8 | 4.4 | 91 | 2 | 181 |
| SV0635003R5RU | T | CP | Leukocyte count increase | 9 | 183 | 9 | 161 | 9 | 54 | 2 | 0.79 | 47.5 | 99.3 | 4.8 | 34 | 3 | 97 |
| SV0562020R5RU | T | CP | Hemoglobin decrease | 40 | 623 | 19 | 286 | 8 | 67 | 3 | 0.63 | 47.2 | 98.2 | 3.9 | 100 | 1 | 103 |
| SV0643003R5RU | T | BO | Relative liver weight decrease | 13 | 1158 | 11 | 310 | 7 | 68 | 1 | 0.77 | 48.3 | 99.8 | 6.2 | 68 | 10 | 840 |
| SV0562050R5RU | T | CP | Alkaline phosphatase decrease | 15 | 593 | 8 | 287 | 7 | 66 | 1 | 0.6 | 41.7 | 99.3 | 4.6 | 66 | 3 | 156 |
| SV0643002R5RU | T | BO | Relative spleen weight decrease | 21 | 651 | 14 | 294 | 6 | 67 | 9 | 0.97 | 73.3 | 99.9 | 8.1 | 29 | 21 | 918 |
| SV0562014R5RU | T | CP | Mean corpuscular hemoglobin decrease (diagnostic, 3–7D time points) | 14 | 739 | 7 | 310 | 6 | 70 | 4 | 0.86 | 53.3 | 99.8 | 6.5 | 60 | 16 | 1164 |
| SV0562026R5RU | T | CP | Leukocyte count decrease | 33 | 539 | 16 | 276 | 5 | 67 | 3 | 0.73 | 56.1 | 98.7 | 4.5 | 83 | 2 | 177 |
| SV0650033R5RU | T | H | Liver—periportal, hypertrophy | 22 | 699 | 11 | 270 | 5 | 68 | 3 | 0.57 | 46.1 | 98.9 | 4.3 | 71 | 1 | 78 |
| SV0567174R5RU | T | CP | Absolute basophil increase | 16 | 833 | 9 | 303 | 5 | 66 | 5 | 0.88 | 55.0 | 99.8 | 6.6 | 63 | 5 | 390 |
| SV0642001R5RU | T | BO | Relative liver weight increase | 11 | 604 | 9 | 282 | 5 | 63 | 2 | 0.76 | 47.0 | 99.7 | 5.6 | 38 | 11 | 492 |
| SV0651106R5RU | T | H | Liver—diffuse, cytoplasm, eosinophilia | 31 | 1273 | 8 | 277 | 5 | 68 | 27 | 0.84 | 53.1 | 99.7 | 6.0 | 97 | 14 | 1232 |
| SV0575020R5RU | T | CP | Lipase increase | 15 | 563 | 7 | 274 | 5 | 65 | 1 | 0.62 | 40.8 | 99.3 | 4.6 | 57 | 3 | 189 |
| SV0571053R5RU | T | CP | Absolute lymphocyte decrease | 16 | 429 | 11 | 201 | 4 | 55 | 2 | 0.75 | 46.4 | 99.4 | 4.9 | 60 | 2 | 125 |
| SV0650143R5RU | T | H | Liver—periportal, fibrosis | 12 | 754 | 5 | 284 | 4 | 68 | 25 | 0.85 | 71.0 | 99.8 | 7.0 | 42 | 25 | 1380 |
| SV0562116R5RU | T | CP | Glucose decrease | 16 | 754 | 5 | 317 | 3 | 67 | 2 | 0.61 | 42.1 | 99.4 | 4.7 | 69 | 6 | 434 |
| SV0650106R5RU | T | H | Liver—hepatocyte, periportal, lipid accumulation | 13 | 1063 | 4 | 248 | 3 | 66 | 5 | 0.85 | 55.0 | 99.9 | 6.7 | 52 | 16 | 1003 |
| SV0650121R5RU | T | H | Liver—hepatocyte, centrilobular, lipid accumulation, microvesicular | 18 | 1050 | 4 | 248 | 3 | 66 | 3 | 0.6 | 41.3 | 99.5 | 4.9 | 95 | 7 | 627 |
| SV0599196R5RU | P | SAC | GR-MR agonist | 13 | 655 | 7 | 318 | 1 | 67 | 8 | 0.9 | 71.2 | 99.8 | 7.0 | 34 | 6 | 203 |
| SV0614125R5RU | T | SAC | Toxicant, DNA alkylator | 15 | 856 | 6 | 325 | 1 | 69 | 6 | 0.89 | 51.9 | 99.9 | 6.8 | 59 | 4 | 216 |
| SV0614137R5RU | P | SAC | Estrogen receptor agonist, steroidal | 13 | 866 | 5 | 328 | 1 | 69 | 7 | 0.9 | 51.2 | 99.9 | 6.9 | 40 | 4 | 137 |
| SV0614148R5RU | P | SAC | PPAR α agonist, fibric acid | 14 | 861 | 5 | 328 | 1 | 69 | 15 | 0.97 | 72.8 | 100.0 | 8.6 | 20 | 19 | 659 |
| SV0599539R5RU | P | SAC | H+/K+ ATPase inhibitor | 17 | 1610 | 4 | 327 | 1 | 68 | 1 | 0.66 | 50.0 | 99.7 | 5.9 | 56 | 1 | 68 |
| SV0614270R5RU | P | SAC | PDE4 inhibitor | 14 | 1629 | 4 | 334 | 1 | 69 | 2 | 0.77 | 58.3 | 99.8 | 6.7 | 60 | 2 | 122 |
| SV0599291R5RU | T | SAC | Toxicant, heavy metal (3, 5 and 7D, other non-metal toxicants in negative class) | 12 | 877 | 3 | 333 | 1 | 70 | 1 | 0.8 | 50.0 | 99.9 | 6.5 | 50 | 6 | 353 |
| SV0614202R5RU | T | SAC | Toxicant, heavy metal (0.25–7D allowed, other toxicants not in negative class) | 12 | 1115 | 3 | 318 | 1 | 68 | 1 | 0.67 | 40.0 | 99.8 | 5.9 | 55 | 1 | 58 |
| SV0614084R5RU | P | SAC | HMG-CoA reductase inhibitors | 6 | 662 | 3 | 322 | 1 | 68 | 8 | 0.99 | 87.5 | 100.0 | 10.1 | 7 | 15 | 240 |
| Averages | | | | 19.9 | 794 | 10 | 290 | 5.6 | 66.4 | 5.3 | 0.75 | 52.9 | 99.4 | 5.7 | 66.7 | 6.5 | 381 |
Class size and class composition (columns 5–10); classification performance and signature length (columns 11–16) and results of necessary gene set definition algorithm (columns 17 and 18).
Notes: The unique signature ID relates each signature to one of the 180 derivation rules (Figure 2). The Use column indicates whether a signature is of the toxicity type (T) or the pharmacology type (P), as discussed in the Materials and methods. Four types of signatures are presented: body and organ weight (BO); histopathology (H); clinical pathology (CP), which includes blood chemistry and blood hematology end points; and structure activity class (SAC), which includes signatures of the SAC type and the AC type as described in the Materials and methods.
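As a rough consistency check on the performance columns, the sketch below assumes that the LOR is the natural logarithm of the diagnostic odds ratio and that PPV can be approximated from sensitivity, specificity and the two class sizes. Neither definition is stated in this table, so both are assumptions; they reproduce the tabulated values for the first row (SV0567082R5RU) to within rounding.

```python
import math


def log_odds_ratio(sensitivity, specificity):
    """Natural log of the diagnostic odds ratio (assumed definition of LOR).

    Both arguments are fractions strictly between 0 and 1.
    """
    return (math.log(sensitivity / (1.0 - sensitivity))
            + math.log(specificity / (1.0 - specificity)))


def positive_predictive_value(sensitivity, specificity, n_positive, n_negative):
    """Expected PPV given class sizes: TP / (TP + FP)."""
    true_positives = sensitivity * n_positive
    false_positives = (1.0 - specificity) * n_negative
    return true_positives / (true_positives + false_positives)


# First row of the table: sensitivity 43.2%, specificity 98.1%,
# class 1 size 34, class -1 size 680.
lor = log_odds_ratio(0.432, 0.981)
ppv = positive_predictive_value(0.432, 0.981, 34, 680)
```

Rows reported with a specificity of 100.0 cannot be checked this way, because the rounded value makes the odds infinite.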
Figure 3. Performance of signatures derived from gene sets of increasing size. Red curve, squares: gene sets of increasing size were chosen based on the sum of total impacts across all 34 signatures. Blue curve, diamonds: average of three random choices of gene sets of increasing size; performance was evaluated on all 34 signature training sets. Average and standard deviation of the results are reported. Black curve, triangles: gene sets were chosen based on the sum of total impacts across 31 randomly chosen signatures from the 34. The performance of the gene sets was evaluated on the three left-out signatures. The procedure was repeated 10 times and the average and standard deviation of the results are reported. Black dotted line indicates the average LOR obtained when all 8565 probes are used without any pre-selection. See Supplementary information S12 for an equivalent study including permuted label sets.
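The gene-set selection behind the red and black curves can be sketched as follows. The impact of a gene within a signature is defined elsewhere in the paper; here it is simply treated as a per-gene number that is summed across signatures, and the data layout and function names are hypothetical.

```python
import random
from collections import defaultdict


def rank_genes_by_total_impact(impacts):
    """Rank genes by impact summed across signatures, highest first.

    impacts maps a signature ID to a {gene: impact} dict (assumed layout).
    Ties are broken alphabetically so the ranking is deterministic.
    """
    totals = defaultdict(float)
    for gene_impacts in impacts.values():
        for gene, value in gene_impacts.items():
            totals[gene] += value
    return sorted(totals, key=lambda gene: (-totals[gene], gene))


def top_gene_set(impacts, size):
    """Gene set of the requested size, taken from the top of the ranking."""
    return rank_genes_by_total_impact(impacts)[:size]


def leave_out_split(signature_ids, n_left_out=3, seed=0):
    """Randomly hold out signatures; rank genes on the rest, test on the held-out."""
    ids = sorted(signature_ids)
    left_out = set(random.Random(seed).sample(ids, n_left_out))
    kept = [s for s in ids if s not in left_out]
    return kept, sorted(left_out)


# Toy impacts for two hypothetical signatures.
toy_impacts = {
    "sig1": {"geneA": 0.5, "geneB": 0.1},
    "sig2": {"geneB": 0.2, "geneC": 0.9},
}
ranking = rank_genes_by_total_impact(toy_impacts)
kept, left_out = leave_out_split([f"sig{i}" for i in range(34)])
```

Repeating `leave_out_split` with ten different seeds, selecting genes from the 31 kept signatures each time, mirrors the black-curve procedure at the level of bookkeeping only; re-deriving and scoring the classifiers is outside this sketch.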
Figure 4. Analysis of gene types enriched in the ranked list of 1660 genes appearing in 34 unique signatures. Genes were ranked by impact across all signatures. Overlapping windows of 400 genes were tested for enrichment in GO terms. Terms with enrichment exceeding −log(P-value)=2 are shown.
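A sliding-window enrichment test of this kind might look like the sketch below. Several details are assumptions, since the caption does not give them: the test is taken to be a one-sided hypergeometric test, the background is the ranked list itself, the logarithm is base 10, and the window step of 100 genes is arbitrary.

```python
import math


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population`,
    of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / math.comb(population, drawn)


def windowed_enrichment(ranked_genes, term_genes, window=400, step=100,
                        threshold=2.0):
    """Return (window start, hits, -log10 P) for windows above the threshold."""
    population = len(ranked_genes)
    annotated = sum(gene in term_genes for gene in ranked_genes)
    results = []
    for start in range(0, max(1, population - window + 1), step):
        chunk = ranked_genes[start:start + window]
        hits = sum(gene in term_genes for gene in chunk)
        p_value = hypergeom_upper_tail(hits, population, annotated, len(chunk))
        score = -math.log10(p_value) if p_value > 0 else float("inf")
        if score > threshold:
            results.append((start, hits, score))
    return results


# Toy example: 100 ranked genes, a term annotating the 10 top-ranked genes.
toy_ranked = [f"g{i}" for i in range(100)]
toy_term = set(toy_ranked[:10])
toy_hits = windowed_enrichment(toy_ranked, toy_term, window=20, step=10)
```

In the toy example only the first window is reported, because it contains all ten annotated genes.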
Figure 5. (A) Analysis of gene types enriched in the necessary gene set for the fibrosis signatures. Genes were ranked by cycle across all signatures. Overlapping windows of 50 genes were tested for enrichment in GO terms. Terms with enrichment exceeding −log(P-value)=2.5 are shown. (B) Model of gene expression events in liver fibrosis annotated with genes observed to be regulated by profibrotic xenobiotic treatments. The normal architecture of the liver (left) is remodeled following toxic exposure to compounds and drugs. Injury to hepatocytes from metabolites and reactive oxygen species activates the resident macrophages (Kupffer cells), which produce acute phase response proteins, chemokines and cytokines. These proteins in turn attract other immune cells such as T cells, and activate quiescent hepatic stellate cells (HSCs). Both HSCs and endothelial cells (ECs) respond by proliferating and undergoing subsequent apoptosis. HSCs also produce growth factors that stimulate proliferation as well as collagens and other proteins involved in the deposition of collagen in the extracellular space.