Anagha Joshi, Rebecca Hannah, Evangelia Diamanti, Berthold Göttgens.
Abstract
Transcription factors are key regulators of both normal and malignant hematopoiesis. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) has become the method of choice to interrogate the genome-wide effect of transcription factors. We have collected and integrated 142 publicly available ChIP-Seq datasets for both normal and leukemic murine blood cell types. In addition, we introduce the new bioinformatic tool Gene Set Control Analysis (GSCA). GSCA predicts likely upstream regulators for lists of genes based on statistical significance of binding event enrichment within the gene loci of a user-supplied gene set. We show that GSCA analysis of lineage-restricted gene sets reveals expected and previously unrecognized candidate upstream regulators. Moreover, application of GSCA to leukemic gene sets allowed us to predict the reactivation of blood stem cell control mechanisms as a likely contributor to LMO2-driven leukemia. It also allowed us to clarify the recent debate on the role of Myc in leukemia stem cell transcriptional programs. As a result, GSCA provides a valuable new addition to analyzing gene sets of interest, complementary to Gene Ontology and Gene Set Enrichment analyses. To facilitate access to the wider research community, we have implemented GSCA as a freely accessible web tool (http://bioinformatics.cscr.cam.ac.uk/GSCA/GSCA.html).
Year: 2012 PMID: 23220237 PMCID: PMC3630327 DOI: 10.1016/j.exphem.2012.11.008
Source DB: PubMed Journal: Exp Hematol ISSN: 0301-472X Impact factor: 3.084
Seventy-eight ChIP-Seq binding peak files covering 53 unique transcription factors in 15 major blood lineages
| Cell type | Transcription factors |
|---|---|
| Lymphocytes | |
| B cells | E2A, Ebf, Foxo1, Oct2, Pax5, Pu.1 |
| T cells | Gata3, Fli1, Pu.1, Stat3, Stat4, Stat5, Stat5a, Stat5b, Stat6, Tbet |
| Thymocytes | Cbfb, Rag2, Ring1b, Runx1 |
| Progenitors | |
| HPC | Gata2, Ldb1, Scl |
| HPC7 | Erg, Fli1, Gata2, Gfi1b, Lmo2, Meis1, Pu.1, Lyl1, Runx1, Scl |
| EML | Runx1, Tcf7 |
| Erythroid progenitors | Gata1, Gata2, Smad1 |
| MK progenitors | Cbfb, Ring1b, Runx1 |
| Myeloid progenitors | Myb |
| Pro B cells | Ebf1, Smad1 |
| Myeloerythroid | |
| MK (megakaryocytes) | Gata1 |
| Macrophages | Cebpα, Cebpβ, P65, Pparg, Pu.1, Stat1 |
| Erythroid | Eto2, Gata1, Ldb1, Mtgr1, Pu.1, Scl |
| Leukemias | |
| Leukemia | Notch1 |
| MLL leukemia | Af9 |
| T cell leukemia | RbpJ |
| T-ALL | Notch1 |
| MEL | Cmyb, Cmyc, Chd2, Gata1, JunD, MafK, Max, Mxi1, NelfE, Scl, Smc3, Tbp, Usf2 |
Figure 1. Schematic representation of the Gene Set Control Analysis (GSCA) protocol. For a given gene set of interest (red arrows), the number of peaks in gene loci is determined and a p value is calculated using a hypergeometric test. The TFs from overrepresented ChIP datasets (corrected p < 0.001, yellow bars in the figure) are then reported as candidate upstream transcriptional regulators. (For interpretation of the reference to color in this figure legend, the reader is referred to the web version of this article.)
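The enrichment test described in the Figure 1 legend can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each ChIP dataset is reduced to the set of genes with at least one peak in their locus, and that the unspecified multiple-testing correction is Bonferroni. The function names, dataset names, and gene identifiers are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are bound."""
    upper = min(n, N)
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return tail / total

def gsca_candidates(gene_set, bound_by_dataset, background, alpha=0.001):
    """Return (dataset, overlap, corrected p) for overrepresented datasets."""
    background = set(background)
    gene_set = set(gene_set) & background
    n_tests = len(bound_by_dataset)
    hits = []
    for name, bound in bound_by_dataset.items():
        bound = set(bound) & background
        overlap = len(gene_set & bound)
        p = hypergeom_sf(overlap, len(background), len(bound), len(gene_set))
        p_corr = min(1.0, p * n_tests)  # Bonferroni: an assumption, not from the source
        if p_corr < alpha:
            hits.append((name, overlap, p_corr))
    return sorted(hits, key=lambda h: h[2])

# Toy data (hypothetical): 1000 background genes and two ChIP datasets.
background = {f"g{i}" for i in range(1000)}
bound_by_dataset = {
    "Gata1_erythroid": {f"g{i}" for i in range(50)},
    "Pu1_Bcell": {f"g{i}" for i in range(500, 600)},
}
gene_set = [f"g{i}" for i in range(20)] + [f"g{i}" for i in range(900, 910)]

hits = gsca_candidates(gene_set, bound_by_dataset, background)
for name, overlap, p_corr in hits:
    print(name, overlap, p_corr)
```

In this toy example only the hypothetical `Gata1_erythroid` dataset passes the threshold, because 20 of the 30 query genes fall among its 50 bound genes, whereas the second dataset shares no genes with the query.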
Figure 2. (A) Schematic representation of combinatorial Gene Set Control Analysis (C-GSCA). A binary matrix of combinatorial binding patterns is generated using the overrepresented ChIP datasets from GSCA. (B) A hierarchical tree is then generated by clustering similar patterns. Color figure online.
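The two steps in the Figure 2 legend, building a binary binding matrix and clustering similar patterns into a tree, can be sketched as follows. The legend does not state the distance measure or linkage rule, so Hamming distance and average linkage are assumptions here, and all gene and dataset names are hypothetical.

```python
from itertools import combinations

def binary_matrix(genes, bound_by_dataset):
    """Return (dataset names, {gene: 0/1 tuple}), one column per dataset."""
    datasets = sorted(bound_by_dataset)
    rows = {g: tuple(int(g in bound_by_dataset[d]) for d in datasets) for g in genes}
    return datasets, rows

def hamming(a, b):
    """Number of datasets in which two binding patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def average_distance(members_a, members_b):
    """Average-linkage distance between two clusters of patterns."""
    total = sum(hamming(x, y) for x in members_a for y in members_b)
    return total / (len(members_a) * len(members_b))

def cluster_patterns(patterns):
    """Agglomerative clustering of distinct patterns; returns a nested-tuple tree."""
    # Each cluster is (tree, member patterns); leaves are the patterns themselves.
    clusters = [(p, [p]) for p in sorted(set(patterns))]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy data (hypothetical): four genes, four overrepresented datasets.
bound_by_dataset = {
    "Gata1_EP": {"a", "b"},
    "Gata2_EP": {"a", "b"},
    "Max_MEL": {"c", "d"},
    "Mxi1_MEL": {"c"},
}
datasets, rows = binary_matrix(["a", "b", "c", "d"], bound_by_dataset)
tree = cluster_patterns(rows.values())
print(datasets)
print(tree)
```

In the toy data, the two patterns that involve only the hypothetical MEL datasets are merged first, and the erythroid progenitor pattern joins the tree last, which mirrors the cell type-specific separation that the legend attributes to the combinatorial analysis.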
Figure 3. (A) Overrepresented regulators determined using GSCA (left) and C-GSCA (right) for gene module 583 from Novershtern et al. [14], with the “Late Ery + T/B cells + GRAN” induction pattern. Unlike GSCA, C-GSCA can separate independent overrepresented binding patterns in different cell types (in this case, Gata1, Gata2, and Smad1 in erythroid progenitors, and Max, Mxi1, and Tbp in MEL cells). (B) Overrepresented regulators determined using GSCA (left) and C-GSCA (right) for gene module 745 from Novershtern et al. [14], with the “NK + T cell” induction pattern. C-GSCA is able to separate combinatorial patterns in T cells and myeloid progenitors.
Figure 4. (A) Screenshot of the Gene Set Control Analysis (GSCA) web tool, with options to paste a user-defined gene list or upload one from a file, and to select the method (GSCA or C-GSCA). (B) GSCA and C-GSCA output for the stem signature dataset from Ng et al. [15], showing two distinct cell type–specific combinatorial patterns.
Figure 5. (A) Overrepresented regulators determined by C-GSCA for genes downregulated after MLL-AF9 withdrawal from Zuber et al. [40]. C-GSCA supports the notion that AF9 induces a Myb-coordinated response. (B) Overrepresented regulators determined by C-GSCA for genes positively correlated with LSC frequency from Somervaille et al. [42]. C-GSCA identified cMyc and several other transcription factors as overrepresented.
Overlap between the tissue-specific enhancers identified by Pennacchio et al. [13] and the blood compendium, showing that the enhancers in the compendium are highly blood specific
| Tissue type | Number of enhancers | Overlap | p value |
|---|---|---|---|
| Adipose tissue | 213 | 86 | 0.99995 |
| Adrenal gland | 176 | 47 | 1 |
| Amygdala | 218 | 24 | 1 |
| B220+ B cells | 212 | 158 | 1.91E-10 |
| Bladder | 225 | 48 | 1 |
| Blastocysts | 191 | 63 | 1 |
| Bone | 200 | 87 | 0.99795 |
| Bone marrow | 224 | 101 | 0.99466 |
| Brown fat | 224 | 47 | 1 |
| CD4+ T cells | 226 | 148 | 0.000149 |
| CD8+ T cells | 194 | 137 | 7.00E-07 |
| Cerebellum | 180 | 24 | 1 |
| Cerebral cortex | 190 | 29 | 1 |
| Digits | 263 | 56 | 1 |
| Dorsal root ganglia | 193 | 45 | 1 |
| Dorsal striatum | 193 | 30 | 1 |
| Embryo day 10 | 171 | 58 | 1 |
| Embryo day 6 | 167 | 68 | 0.99961 |
| Embryo day 7 | 163 | 51 | 1 |
| Embryo day 8 | 170 | 44 | 1 |
| Embryo day 9 | 174 | 63 | 1 |
| Epidermis | 292 | 58 | 1 |
| Eye | 255 | 32 | 1 |
| Fertilized egg | 176 | 33 | 1 |
| Frontal cortex | 197 | 21 | 1 |
| Heart | 227 | 62 | 1 |
| Hippocampus | 201 | 30 | 1 |
| Hypothalamus | 183 | 25 | 1 |
| Kidney | 230 | 43 | 1 |
| Large intestine | 208 | 62 | 1 |
| Liver | 267 | 27 | 1 |
| Lung | 241 | 75 | 1 |
| Lymph node | 245 | 160 | 0.000102 |
| Mammary gland | 198 | 33 | 1 |
| Med | 192 | 40 | 1 |
| Olfactory bulb | 194 | 32 | 1 |
| Oocyte | 173 | 37 | 1 |
| Ovary | 192 | 46 | 1 |
| Pancreas | 211 | 45 | 1 |
| Pituitary | 187 | 34 | 1 |
| Placenta | 202 | 59 | 1 |
| Preoptic | 176 | 29 | 1 |
| Prostate | 221 | 49 | 1 |
| Salivary gland | 213 | 46 | 1 |
| Skeletal muscle | 224 | 44 | 1 |
| Small intestine | 259 | 65 | 1 |
| Snout epidermis | 275 | 51 | 1 |
| Spinal cord lower | 197 | 39 | 1 |
| Spinal cord upper | 196 | 26 | 1 |
| Spleen | 228 | 108 | 0.97033 |
| Stomach | 206 | 46 | 1 |
| Substantia nigra | 183 | 29 | 1 |
| Testis | 197 | 31 | 1 |
| Thymus | 194 | 111 | 0.15904 |
| Thyroid | 239 | 47 | 1 |
| Tongue | 289 | 51 | 1 |
| Trachea | 250 | 72 | 1 |
| Trigeminal | 193 | 30 | 1 |
| Umbilical cord | 223 | 44 | 1 |
| Uterus | 181 | 58 | 1 |
| Vomeronasal organ | 252 | 66 | 1 |
Thirty-seven gene sets of 80 with respective induction patterns from Novershtern et al. [14] found enriched using the method of Lachmann et al. [34] and Zambelli et al. [35]
| Cluster # (Novershtern et al.) | Induction pattern | Candidate upstream regulator (transcription factor) | Cell type |
|---|---|---|---|
| 583 | Late Ery + T/B cell + GRAN | TCF7 | EML |
| 607 | | TCF7 | EML |
| 649 | B cell | E2A, EBF1, OCT2, PAX5, PU1 | B cells |
| 655 | Mye | LDB1, SCL | HPC |
| 661 | Late Ery + T/B cell + GRAN | TCF7 | EML |
| 667 | T cell + NK | RUNX1 | EML |
| 673 | T/B cell | E2A, EBF1, OCT2 | B cells |
| 685 | Early Mye + T/B cell + GRAN | RUNX1, TCF7 | EML |
| 703 | T/B cell | RAG2 | Thymocytes |
| 715 | Early Mye + T/B cell + GRAN | RAG2 | Thymocytes |
| 721 | Late MYE + DCs | CEBPA, CEBPB, P65 | Macrophages |
| 727 | Late Ery | ETO2 | Erythroid |
| 733 | HSE + Early Mye | RUNX1, TCF7 | EML |
| 739 | Late Ery + T/B cell + GRAN | TCF7 | EML |
| 763 | Late MYE | EBF1 | B cells |
| 793 | Late Ery + T/B cell + GRAN | TCF7 | EML |
| 799 | NK + T cells (2) | E2A, FOXO1, OCT2, PAX5, PU1 | B cells |
| 811 | Early Mye + T/B cell + GRAN | TCF7 | EML |
| 817 | T/B cell | E2A, EBF1, OCT2, PAX5 | B cells |
| 823 | Early Mye + T/B cell + GRAN | MXI1, NELFE | MEL |
| 835 | Early Mye + T/B cell + GRAN | GFI1B | Erythroid |
| 841 | Early Mye + T/B cell + GRAN | TCF7 | EML |
| 859 | T cell + NK | RING1B | Thymocytes |
| 871 | HSC + Early MYE | MXI1, NELFE | MEL |
| 883 | Late Ery + T/B cell + GRAN | PU1 | B cells |
| 889 | Late Ery | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 901 | Early Mye + T/B cell + GRAN | TCF7 | EML |
| 907 | Late Ery + T/B cell + GRAN | TCF7 | EML |
| 925 | Early Mye + T/B cell + GRAN | TCF7 | EML |
| 943 | T/B cell | TCF7 | EML |
| 961 | B cell | E2A, EBF1, OCT2 | B cells |
| 973 | HSE + Early Mye | NELFE | MEL |
| 979 | Late MYE | CEBPA, CEBPB, P65, PPARG, STAT1 | Macrophages |
| 985 | Early Mye + T/B cell + GRAN | CHD2 | MEL |
| 991 | T/B cell | E2A, OCT2, PAX5, PU1 | B cells |
| 1003 | Late Ery + T/B cell + GRAN | NELFE | MEL |
| 1021 | Early Mye + T/B cell + GRAN | TCF7 | EML |
Sixty-five gene sets of 80 with respective induction patterns from Novershtern et al. [14] enriched for transcription factor binding regions across multiple blood tissues using GSCA: 63 of 65 show cell type and induction pattern matching
| Cluster no. (Novershtern et al.) | Induction pattern | Combinatorial control signature (transcription factor) | Cell type |
|---|---|---|---|
| 399 | None | STAT4, STAT5 | T cells |
| 559 | NK + T cell (2) | STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | T cells |
| 571 | Late MYE | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages |
| 583 | Late ERY + T/B cell + Gran | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 607 | Early MYE + T/B cell + Gran | PU1 | B cells |
| 613 | T/B cell | PU1 | B cells |
| 619 | Late MYE | CEBPA, CEBPB, PU1, STAT1 | Macrophages |
| 637 | Late Ery | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 643 | HSE + Early Mye | GATA2 | HPC7 |
| 649 | B cells | E2A, PAX5, PU1 | B cells |
| 655 | Mye | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 661 | Late Ery + T/B cell + GRAN | PU1 | B cells |
| 667 | T cell + NK | GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | T cells |
| 673 | T/B cell | E2A, OCT2, PU1 | B cells |
| 679 | HSE + Early Mye | GATA2 | Erythroid progenitors |
| 685 | Late MYE + T/B cell + GRAN | RUNX1, TCF7 | EML |
| 703 | T/B cell | TCF7 | EML |
| 709 | General mild induction | ETO2 | Erythroid |
| 715 | Early MYE + T/B cell + GRAN | PU1 | B cells |
| 721 | Late MYE + DCs | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages |
| 727 | Late Ery | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 733 | HSC + Early MYE | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages |
| 739 | Late ERY + T/B cell + Gran | PU1 | B cells |
| 745 | General mild induction | MYB | Myeloid progenitors |
| 757 | T cell + NK | TCF7 | EML |
| 763 | Late MYE | ERG, FLI1 | HPC7 |
| 769 | T/B cell | PU1 | B cells |
| 775 | Mye | GATA1 | MK cells |
| 781 | General mild induction | NOTCH1 | T-ALL |
| 787 | MYE | SMAD1 | Erythroid progenitors |
| 793 | Late ERY + T/B cell + Gran | PAX5, PU1 | B cells |
| 799 | NK + T cell (2) | E2A | B cells |
| 805 | HSE + Early Mye | ETO2 | Erythroid |
| 811 | Late MYE + T/B cell + GRAN | TCF7 | EML |
| 817 | T/B cell | E2A, EBF1, PAX5, PU1 | B cells |
| 823 | Early MYE + T/B cell + GRAN | TCF7 | EML |
| 829 | T cell + NK | E2A | B cells |
| 835 | Early MYE + T/B cell + GRAN | PAX5 | B cells |
| 841 | Early MYE + T/B cell + GRAN | PU1 | B cells |
| 847 | Late Ery + T/B cell + GRAN | PAX5 | B cells |
| 853 | Late MYE | PU1 | B cells |
| 859 | T cell + NK | GATA3, STAT3, STAT5, TBET | T cells |
| 865 | HSE + early Mye | GATA2, SMAD1 | Erythroid progenitors |
| 871 | HSE + Early MYE | TCF7 | EML |
| 883 | Late MYE + T/B cell + Gran | TCF7 | EML |
| 889 | Late ERY | GATA1, GATA2, SMAD1 | Erythroid progenitors |
| 895 | Late ERY | GATA2 | Erythroid progenitors |
| 901 | Late MYE + T/B cell + Gran | TCF7 | EML |
| 907 | Late ERY + T/B cell + Gran | PAX5, PU1 | B cells |
| 919 | HSE + Early Mye | FOXO1 | B cells |
| 925 | Early MYE + T/B cell + GRAN | PAX5, PU1 | B cells |
| 931 | None | GATA1 | MEL |
| 943 | T/B cell | TCF7 | EML |
| 949 | T/B cell | GATA2 | Erythroid progenitors |
| 955 | T cell + NK | GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | T cells |
| 961 | B cell | E2A, EBF1, OCT2, PAX5, PU1 | B cells |
| 967 | Late ERY + T/B cell + Gran | PAX5 | B cells |
| 973 | HSE + Early Mye | CMYC, GATA1, MAX, MXI1 | MEL |
| 979 | Late MYE | PU1 | B cells |
| 985 | Early MYE + T/B cell + Gran | TCF7 | EML |
| 991 | T/B cell | PU1 | B cells |
| 997 | NK + T cell (2) | STAT4 | T cells |
| 1003 | Late ERY + T/B cell + Gran | TCF7 | EML |
| 1009 | HSE + Early Mye | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages |
| 1021 | Early MYE + T/B cell + GRAN | PAX5 | B cells |