Gene set control analysis predicts hematopoietic control mechanisms from genome-wide transcription factor binding data.

Anagha Joshi, Rebecca Hannah, Evangelia Diamanti, Berthold Göttgens.

Abstract

Transcription factors are key regulators of both normal and malignant hematopoiesis. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) has become the method of choice to interrogate the genome-wide effect of transcription factors. We have collected and integrated 142 publicly available ChIP-Seq datasets for both normal and leukemic murine blood cell types. In addition, we introduce the new bioinformatic tool Gene Set Control Analysis (GSCA). GSCA predicts likely upstream regulators for lists of genes based on statistical significance of binding event enrichment within the gene loci of a user-supplied gene set. We show that GSCA analysis of lineage-restricted gene sets reveals expected and previously unrecognized candidate upstream regulators. Moreover, application of GSCA to leukemic gene sets allowed us to predict the reactivation of blood stem cell control mechanisms as a likely contributor to LMO2 driven leukemia. It also allowed us to clarify the recent debate on the role of Myc in leukemia stem cell transcriptional programs. As a result, GSCA provides a valuable new addition to analyzing gene sets of interest, complementary to Gene Ontology and Gene Set Enrichment analyses. To facilitate access to the wider research community, we have implemented GSCA as a freely accessible web tool (http://bioinformatics.cscr.cam.ac.uk/GSCA/GSCA.html).
Copyright © 2013 ISEH - Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

Year:  2012        PMID: 23220237      PMCID: PMC3630327          DOI: 10.1016/j.exphem.2012.11.008

Source DB:  PubMed          Journal:  Exp Hematol        ISSN: 0301-472X            Impact factor:   3.084


Cell type–specific gene expression is an inherent property of all multicellular organisms and indeed represents a major determinant that underlies the generation of differentiated cell types with distinct functionality. Elucidating the molecular mechanisms controlling cell type–specific expression has the power to reveal fundamental insights into the regulatory circuitry controlling both human and model organism development. Moreover, identification of control mechanisms in normal cells provides potential avenues for manipulating cellular fates, as exemplified by the recent explosion in cellular reprogramming studies [1]. It also enables the rational design of new therapies aiming to revert abnormal pathological cellular states back to their normal condition [1]. The blood or hematopoietic system has long been recognized as a powerful model system for studying cell type–specific gene expression [2]. Within the blood system, more than 10 distinct mature hematopoietic lineages (e.g., red blood cells, T cells, B cells) are generated from pluripotent hematopoietic stem cells (HSCs) via a sequence of intermediate progenitors, often represented as a lineage differentiation tree. Both the mature lineages as well as the various immature blood stem and progenitor populations can be purified based on the expression of combinations of specific cell surface markers, thus enabling powerful studies of cellular differentiation. Transcription factors have long been recognized as major regulators of hematopoietic cell type specification [3-6]. To understand the mechanisms underlying cell type specification by transcription factors, it will be essential to identify their transcriptional targets. 
An important advancement in this research area was provided by the introduction of chromatin immunoprecipitation (ChIP) coupled to massively parallel sequencing (ChIP-Seq), which allows genome-scale identification of all DNA sequences (regions) bound by a given transcription factor (TF) in a given cell type [7]. The technique has been rapidly adopted, with over 100 individual studies now deposited in public databases for the murine hematopoietic system alone. This wealth of new data represents an unprecedented opportunity to unravel the transcriptional control mechanisms that mediate expression of specific sets of genes within the various hematopoietic cell lineages [8]. Gene Ontology [9] overrepresentation analysis provides information on the functional categories enriched within a given gene set of interest [10], and Gene Set Enrichment Analysis (GSEA) determines whether a gene set of interest shows statistically significant expression differences between two or more cell types [11]. However, neither of these approaches explicitly links a gene set to transcriptional control mechanisms. In this study, we report a new computational framework for linking gene sets with transcriptional control, called Gene Set Control Analysis (GSCA). Unlike previous algorithms developed to provide functional enrichment [10], GSCA links gene sets to likely upstream regulators responsible for coordinated expression. By exploiting multiple transcription factor binding patterns from genome-wide ChIP-Seq studies, GSCA can provide previously unattainable insights into possible transcriptional control mechanisms operating in both normal and malignant cells. To gain insights into combinatorial control mechanisms (i.e., multiple transcription factors occupying the same binding site in a gene locus), we further developed a novel tool called combinatorial GSCA (C-GSCA).
Through integrated analysis of 142 blood-specific ChIP-Seq binding datasets, C-GSCA identifies likely combinatorial transcriptional control mechanisms by revealing TF co-occupancy patterns specifically associated with gene regulatory elements from a given gene set. A web-based implementation of GSCA and C-GSCA allows user-friendly access for the wider research community, and thus provides a substantial new addition to the bioinformatic toolbox for hematopoietic gene set analysis.

Methods

ChIP-seq compendium

Binding events for 35 transcription factors in seven major hematopoietic lineages were obtained from Hannah et al. [8]. Sixty new ChIP datasets from 18 publications and ENCODE murine datasets were analyzed, starting from the raw data in each case, and peaks were identified in each sample using the protocol described previously [8]. A supplementary website (http://bioinformatics.cscr.cam.ac.uk/BLOOD_compendium_PUBLISHED.html) lists the number of peaks, the reference, and the peak-calling method for each ChIP dataset. All binding events were mapped to genes using the protocol described previously [12]. Binding events in the promoter and gene body were associated with the corresponding gene, whereas intergenic peaks were associated with the nearest gene on either side within 50 kb, such that each peak was assigned to at most two genes. Tissue-specific enhancer elements in mouse were downloaded from [13], and a p value was calculated for the overlap between the enhancer regions of each of the 61 tissues and the blood-specific regulatory regions [8] using a hypergeometric test (Supplementary Table 1, online only, available at www.exphem).
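The peak-to-gene mapping rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the `assign_peak` helper, the use of a single peak midpoint, and in particular the 1 kb promoter size are assumptions made for illustration only (the text does not state a promoter definition). Only the 50 kb limit and the "nearest gene on either side" rule come from the text.

```python
from dataclasses import dataclass

PROMOTER_BP = 1_000   # ASSUMED promoter size; not stated in the text
MAX_DIST_BP = 50_000  # 50 kb limit for intergenic peaks (from the text)


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"

    def locus(self):
        """Gene body extended by the assumed promoter on the 5' side."""
        if self.strand == "+":
            return self.start - PROMOTER_BP, self.end
        return self.start, self.end + PROMOTER_BP


def assign_peak(chrom, pos, genes):
    """Return the names of the genes associated with a peak midpoint."""
    on_chrom = [g for g in genes if g.chrom == chrom]
    # Promoter or gene-body peaks go to the overlapping gene(s).
    inside = [g.name for g in on_chrom if g.locus()[0] <= pos <= g.locus()[1]]
    if inside:
        return inside
    # Intergenic peaks: nearest gene on each side, if within 50 kb,
    # so an intergenic peak is assigned to at most two genes.
    hits = []
    left = [g for g in on_chrom if g.locus()[1] < pos]
    right = [g for g in on_chrom if g.locus()[0] > pos]
    if left:
        g = max(left, key=lambda g: g.locus()[1])
        if pos - g.locus()[1] <= MAX_DIST_BP:
            hits.append(g.name)
    if right:
        g = min(right, key=lambda g: g.locus()[0])
        if g.locus()[0] - pos <= MAX_DIST_BP:
            hits.append(g.name)
    return hits


# Toy annotation with hypothetical gene names and coordinates.
genes = [
    Gene("GeneA", "chr1", 10_000, 20_000, "+"),
    Gene("GeneB", "chr1", 100_000, 120_000, "-"),
    Gene("GeneC", "chr1", 300_000, 310_000, "+"),
]
within = assign_peak("chr1", 15_000, genes)    # gene-body peak
between = assign_peak("chr1", 60_000, genes)   # intergenic, two neighbours in range
far = assign_peak("chr1", 200_000, genes)      # intergenic, no gene within 50 kb
```

In this toy annotation the gene-body peak maps to one gene, the peak between GeneA and GeneB maps to both, and the distant peak is left unassigned.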

GSCA method

For a set of user-defined genes, we calculate how many of the 270,261 genomic regions bound by at least one TF (N) are mapped to those genes (n). For each ChIP-Seq dataset, the number of peaks (m) near user-defined genes (k) is calculated. The p value is calculated using a hypergeometric test (equivalent to a one-sided Fisher's exact test).
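As an illustration of the test, the sketch below computes an upper-tail hypergeometric p value using only the Python standard library. It assumes one particular reading of the symbols, which the text does not spell out: N is the total number of bound regions, n the number of regions mapped to the query genes, m the number of peaks in one ChIP dataset, and k the number of those peaks that fall near the query genes. The function names and the example counts are hypothetical.

```python
from math import exp, lgamma


def log_choose(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_upper_tail(k, N, n, m):
    """P(X >= k) when m regions are drawn from N, of which n are near query genes."""
    lo = max(k, 0, m - (N - n))  # smallest overlap that is possible and >= k
    hi = min(n, m)               # largest possible overlap
    log_total = log_choose(N, m)
    tail = sum(
        exp(log_choose(n, i) + log_choose(N - n, m - i) - log_total)
        for i in range(lo, hi + 1)
    )
    return min(1.0, tail)


# Small case that can be checked by hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 1/3
p_small = hypergeom_upper_tail(k=2, N=10, n=4, m=3)

# Illustrative (made-up) counts on the scale of the compendium.
p_example = hypergeom_upper_tail(k=60, N=270_261, n=1_200, m=5_000)
```

Working in log space avoids overflow in the binomial coefficients when N is in the hundreds of thousands.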

C-GSCA method

A matrix of binding events is generated with the 270,261 genomic regions as rows and the overrepresented ChIP-Seq datasets (K) from the GSCA step as columns. The K columns are then clustered by hierarchical clustering with Pearson's correlation coefficient as the distance measure.
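The clustering step can be sketched as follows, under stated assumptions. The text specifies Pearson's correlation but not how it is turned into a distance or which linkage is used; this sketch assumes the distance 1 - r and average linkage, and it cuts the tree at a chosen number of clusters purely for illustration. The dataset names and binding vectors are made up.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def cluster_columns(columns, n_clusters):
    """Agglomerative clustering of datasets (ASSUMED: distance 1 - r, average linkage)."""
    names = list(columns)
    dist = {
        frozenset(pair): 1 - pearson(columns[pair[0]], columns[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(a, b):
        # Average pairwise distance between members of two clusters.
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Toy binary binding vectors over eight genomic regions (hypothetical data).
columns = {
    "Gata1_erythroid": [1, 1, 1, 0, 0, 0, 1, 0],
    "Scl_erythroid":   [1, 1, 0, 0, 0, 0, 1, 0],
    "Stat3_Tcell":     [0, 0, 0, 1, 1, 1, 0, 1],
    "Stat5_Tcell":     [0, 0, 1, 1, 1, 0, 0, 1],
}
groups = cluster_columns(columns, n_clusters=2)
```

On this toy input the two erythroid datasets and the two T cell datasets fall into separate groups, which is the kind of cell type separation the method is intended to reveal.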

Reference data set

Gene sets for 80 clusters of tightly coexpressed genes (their induction patterns) in 38 hematopoietic cell types were obtained from Novershtern et al. [14]. Human genes were mapped to orthologous mouse genes using MGI mammalian orthology (http://www.informatics.jax.org/orthology.shtml). We calculated the p value for each gene set with respect to each signature cluster using a hypergeometric test. We used the number of Novershtern clusters significantly overrepresented (Bonferroni corrected p < 0.001) for one or more transcription factor targets as a measure to evaluate performance while comparing different methods.
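The evaluation measure can be illustrated with a short sketch: Bonferroni-correct the p values, then count the gene sets that retain at least one significant dataset. The number of tests used for the correction is not stated in the text; this sketch assumes it is the number of ChIP datasets tested per gene set. All names and p values below are hypothetical.

```python
ALPHA = 0.001  # corrected significance cut-off used in the text


def bonferroni(p_values):
    """Multiply each raw p value by the number of tests, capped at 1."""
    n_tests = len(p_values)  # ASSUMED: one test per ChIP dataset
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


def count_enriched_sets(raw_p_by_set):
    """Count gene sets with at least one dataset passing the corrected cut-off."""
    return sum(
        any(p < ALPHA for p in bonferroni(p_values).values())
        for p_values in raw_p_by_set.values()
    )


# Made-up raw p values for three gene sets against two ChIP datasets.
raw = {
    "set_A": {"tf1": 1e-6, "tf2": 0.2},
    "set_B": {"tf1": 6e-4, "tf2": 0.03},  # passes raw, fails after correction
    "set_C": {"tf1": 0.5, "tf2": 2e-4},
}
n_enriched = count_enriched_sets(raw)
```

Comparing this count across methods on the same 80 gene sets gives the performance measure described above.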

Gene expression datasets

Nine gene expression signatures (d-erythroid, differentiated, d-lymphoid, d-myeloid, r-myelolymphoid, s-erythroid, s-mpp, s-myelolymphoid, and stem) were obtained from [15]. Differentially expressed genes in various leukemia datasets were downloaded from their respective publications. Gene lists were then interrogated against the ChIP-seq compendium using both GSCA and C-GSCA.

GSCA web tool

The GSCA output is produced using R, and the web user interface was implemented in Perl/CGI/HTML. R commands are executed through the Perl CGI script to produce the image. The web tool can be accessed at the following URL: http://bioinformatics.cscr.cam.ac.uk/GSCA/GSCA.html.

Results

Definition of a candidate regulatory genome in mouse hematopoiesis

We recently reported a compendium of more than 50 TF ChIP-Seq experiments in mouse blood cells collected from publicly available datasets [8]. We have doubled the compendium by adding 60 new ChIP datasets from 18 recently published studies [16-33] and ENCODE murine unpublished datasets to obtain genome-wide binding patterns for 53 unique transcription factors in 15 major blood lineages and three types of leukemia (Table 1). TF-bound peaks were determined for all new datasets using the same parameters as before [8], which resulted in a total of 270,261 genomic regions bound by at least one transcription factor. When added together, these 270,261 regions corresponded to 936 Mb, thus constituting 5.78% of the mouse genome. ChIP-Seq samples of the same transcription factor in related cell types were merged together to provide a consolidated set of 78 samples (Table 1).
Table 1

Seventy-eight ChIP-Seq binding peak files covering 53 unique transcription factors in 15 major blood lineages

Cell type: Transcription factors

Lymphocytes
 B cells: E2A, Ebf, Foxo1, Oct2, Pax5, Pu.1
 T cells: Gata3, Fli1, Pu.1, Stat3, Stat4, Stat5, Stat5a, Stat5b, Stat6, Tbet
 Thymocytes: Cbfb, Rag2, Ring1b, Runx1
Progenitors
 HPC: Gata2, Ldb1, Scl
 HPC7: Erg, Fli1, Gata2, Gfi1b, Lmo2, Meis1, Pu.1, Lyl1, Runx1, Scl
 EML: Runx1, Tcf7
 Erythroid progenitors: Gata1, Gata2, Smad1
 MK progenitors: Cbfb, Ring1b, Runx1
 Myeloid progenitors: Myb
 Pro B cells: Ebf1, Smad1
Myeloerythroid
 MK (megakaryocytes): Gata1
 Macrophages: Cebpα, Cebpβ, P65, Pparg, Pu.1, Stat1
 Erythroid: Eto2, Gata1, Ldb1, Mtgr1, Pu.1, Scl
Leukemias
 Leukemia: Notch1
 MLL leukemia: Af9
 T cell leukemia: RbpJ
 T-ALL: Notch1
 MEL: Cmyb, Cmyc, Chd2, Gata1, JunD, MafK, Max, Mxi1, NelfE, Scl, Smc3, Tbp, Usf2

Pennacchio et al. [13] developed a phylogenetic conservation and motif-based approach to predict tissue-specific enhancers, which allowed them to annotate ∼5,500 high-confidence mouse tissue-specific enhancers for 61 murine tissue types by integrating tissue-specific expression data, conservation information, and cis-regulatory motifs. Only 4 of these 61 tissues corresponded to hematopoietic cells, and only the enhancers predicted for those four tissues showed significant overlap with our ChIP-enriched regions (B220+ B cells, p = 1.9e-10; CD4+ T cells, p = 1.4e-4; CD8+ T cells, p = 7.0e-7; lymph node, p = 1.0e-4; see Supplementary Table 1). This analysis therefore supports the validity of a compendium built on TF binding events in hematopoietic cells.

A new GSCA tool matches weighted TF-peak lists to gene sets

We next explored whether our blood-specific TF ChIP-Seq peak catalogue could be used to predict transcriptional control mechanisms that may regulate the coordinated expression of a given set of genes. Computational tools for the identification of statistically significant overlaps between a given gene set and peak regions from single ChIP-Seq experiments have been described previously [34,35]. However, these tools do not exploit the ever-increasing number of datasets for multiple TFs in the same or related cell types. Novershtern et al. [14] reported gene expression profiles in 38 distinct purified populations of human hematopoietic cells ranging from hematopoietic stem cells, through multiple progenitor and intermediate maturation states, to 12 terminally differentiated cell types. Using the Module Networks algorithm [36], they identified 80 modules or gene sets of tightly coexpressed genes with distinct expression patterns and enrichment for specific biological functions, which they termed induction patterns. When we used the 80 Novershtern modules as gene sets, 37 of 80 gene sets (Supplementary Table 2, online only, available at www.exphem) showed a statistically significant correlation with one or more TF peak files from our compendium when using the previously described ChIP Enrichment Analysis (ChEA) [34] and Cscan [35] tools. Of note, there was a good overlap between the cell type used for ChIP-Seq and the expression/induction patterns as annotated by Novershtern et al. (Supplementary Table 2). For example, gene set 727 with induction pattern “Late Erythroid” was associated with Eto2 in erythroid cells, Scl and Ldb1 in HSCs, and Scl in MELs, and gene set 979, overrepresented for “immune response” genes with induction pattern “Late MYE”, was associated with Cebpα, Cebpβ, P65, Pparg, and Stat1 in macrophages.
Because the ChEA [34] and Cscan [35] tools could associate candidate upstream regulators with fewer than half of the 80 Novershtern gene sets, we set out to develop an alternative approach by incorporating the concept of weighted peak-to-gene mapping recently reported as part of the Genomic Regions Enrichment of Annotations Tool (GREAT) [37]. GREAT links a list of ChIP-Seq peak regions to gene lists with particular functional significance and, unlike previous approaches, incorporates binding sites not only in the promoter region of a gene. Taking inspiration from this approach, we developed a new tool by mapping each peak to its nearest gene within 50 kb and then considering the number of binding events in each gene locus to calculate the significance of association between a gene locus and a given upstream regulator. (Essentially, this is the reverse of GREAT, which associates peaks with genes, whereas our new procedure associates genes with peaks.) Specifically, our new tool determines the number of binding events in the loci of genes of interest for each ChIP dataset (Fig. 1A, red arrows), and then calculates a p value using a simple hypergeometric test. Datasets with statistically significant overlaps (corrected p value cut-off <0.001) are then selected by interrogating all ChIP datasets independently against the gene list (Fig. 1B). When applied to the 80 gene modules from Novershtern et al. [14], our new tool reported significant associations with ChIP-Seq peaks for 65 gene modules (Supplementary Table 2, online only, available at www.exphem), which corresponds to 81% of all gene sets compared with only 46% using the previously reported ChEA and Cscan tools. Incorporation of weighted gene lists therefore results in a significant increase in the percentage of gene modules that can be linked to candidate upstream regulators. We named this new approach Gene Set Control Analysis, or GSCA.
Only 61% of all Novershtern gene sets (49 of 80 gene modules) were enriched when only binding events in promoters were considered, thus highlighting the likely importance of binding to nonpromoter regions, which account for 57% of all binding events in our datasets.
Figure 1

Schematic representation of the Gene Set Control Analysis (GSCA) protocol. For a given gene set of interest (red arrows), the number of peaks in gene loci is determined and a p value is calculated using a hypergeometric test. The TFs from overrepresented ChIP datasets (corrected p < 0.001, yellow bars in the figure) are then reported as candidate upstream transcriptional regulators. (For interpretation of the reference to color in this figure legend, the reader is referred to the web version of this article.)

GSCA correlates relevant combinations of transcription factors with hematopoietic gene sets

To investigate the potential biological relevance of the candidate upstream regulatory transcription factors matched with the 65 Novershtern gene sets by GSCA, we again used the induction patterns defined by Novershtern et al. as a measure of lineage-specific expression. The majority of gene sets (97%) showed good correspondence between the induction patterns and the cell types in which the TFs had been profiled by ChIP-Seq (Supplementary Table 3, online only, available at www.exphem). For example, gene sets 667 and 829 (enriched for T cell receptor activity) were associated by GSCA with Stats and Gata3 in T cells, whereas gene sets 649 and 961 (enriched for B cell receptor activity) were associated with Pu.1, E2A, and Pax5 in B cells. Gene set 721 (involved in inflammatory and antibacterial response) was linked by GSCA with Cebpα, Cebpβ, P65, Pu.1, and Stat1 in macrophages. Gene sets 727 and 889 with Late Ery induction pattern (enriched for protein amino acid glycosylation and blood group antigen functional annotations) significantly overlapped only with targets of Eto2, Gata2, Ldb1, Mtgr1, and Scl in erythroid cells. Taken together, there is good concordance between the induction patterns of Novershtern gene sets and the matching ChIP-Seq TF datasets identified by GSCA.

Combinatorial regulatory pattern discovery from multi factor ChIP-Seq data

Compared with previous tools, our new GSCA tool performs better at associating gene lists with ChIP-Seq peaks because it calculates weighted associations between factors and genes based on the number of binding events within a gene locus. However, all individual ChIP-Seq datasets are treated independently, thus making it difficult to infer whether two overrepresented transcription factors work combinatorially (e.g., whether they show statistically significant co-occupancy of the same regulatory regions), rather than binding to overlapping sets of gene loci but using distinct cis-regulatory regions. To address this issue of combinatorial binding, we developed a new tool called combinatorial GSCA (C-GSCA), and then applied this new tool to our hematopoietic ChIP-Seq compendium. For a given gene list, we first run GSCA to select the TFs showing overrepresented binding. Assuming that m TFs are selected out of 78 ChIP-Seq datasets, we generate a binary matrix (n × m) of m columns representing the m ChIP datasets and n rows representing the genomic regions occupied by two or more of the m TFs, with 1s and 0s indicating the presence or absence of binding, respectively. We filter out genomic regions bound by only one factor (∼16% of genomic regions; Supplementary Table 2, online only, available at www.exphem) because they are not informative in terms of combinatorial control mechanisms. We then perform hierarchical clustering of the m overrepresented ChIP datasets using Pearson's correlation coefficient as a distance measure. Unlike GSCA, all overrepresented ChIP datasets are considered together, making the prediction of combinatorial control feasible (Fig. 2).
Figure 2

(A) Schematic representation of combinatorial Gene Set Control Analysis (C-GSCA). A binary matrix of combinatorial binding patterns is generated using the overrepresented ChIP datasets from GSCA. (B) A hierarchical tree is then generated by clustering similar patterns. Color figure online.
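The construction of the binary co-occupancy matrix can be sketched as follows. This is a minimal illustration rather than the published implementation: the `build_binary_matrix` helper, the representation of each dataset as a set of region identifiers, and the toy data are all assumptions. Whether rows are further restricted to regions in the query gene loci is not specified here, so the sketch applies no such restriction.

```python
def build_binary_matrix(peaks_by_dataset, selected):
    """Rows: genomic regions; columns: selected datasets; entries 1 (bound) or 0."""
    regions = sorted(set().union(*(peaks_by_dataset[d] for d in selected)))
    matrix = {
        region: [1 if region in peaks_by_dataset[d] else 0 for d in selected]
        for region in regions
    }
    # Keep only regions bound by two or more of the selected factors;
    # singly bound regions carry no information on co-occupancy.
    return {region: row for region, row in matrix.items() if sum(row) >= 2}


# Hypothetical region identifiers bound in each ChIP dataset.
peaks = {
    "Gata1": {"r1", "r2", "r3"},
    "Scl": {"r1", "r2", "r4"},
    "Stat3": {"r2", "r5"},
    "Pax5": {"r1", "r9"},  # not selected by GSCA in this toy example
}
selected = ["Gata1", "Scl", "Stat3"]
matrix = build_binary_matrix(peaks, selected)
```

Each column of the resulting matrix is the binding vector of one overrepresented dataset, which is the input to the hierarchical clustering step.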

Using ChIP-Seq analysis of 10 transcription factors in the hematopoietic progenitor cell line HPC7, we have shown previously that combinatorial interactions between a heptad of TFs (SCL, LYL1, LMO2, GATA2, RUNX1, ERG, and FLI-1) were overrepresented in the loci of genes specifically expressed in HSPCs and therefore associated with gene sets specifically expressed in HSCs [12]. When the heptad-bound genes were interrogated using GSCA, 49 of 78 ChIP-Seq datasets were enriched, thus identifying multiple new transcription factors as candidate upstream regulators in addition to the seven heptad factors (Supplementary Figure 1, online only, available at www.exphem). Using C-GSCA, these 49 datasets could be split into four cell type–specific groups: T cells, macrophages, HSCs, and erythroid cells (Supplementary Figure 1, online only, available at www.exphem). This observation suggests that gene loci bound by the heptad in blood stem and progenitor cells not only include genes specifically expressed in HSCs, but could also include a subset of genes affiliated with various hematopoietic differentiation programs, an observation that would be consistent with the concept of lineage priming developed in the 1990s [38]. These results suggested that the C-GSCA procedure outlined here may be useful more generally to associate hematopoietic gene sets with upstream regulators and thus to predict combinatorial control mechanisms driving the expression of a given gene set. We next applied the new C-GSCA tool to all 80 hematopoietic gene sets from the Novershtern et al. study [14], which allowed us to associate the 65 Novershtern gene sets that were overrepresented for ChIP datasets by GSCA with combinatorial TF signatures.
For example, Novershtern gene set 583 with induction pattern “Late Ery + T/B cell + GRAN” is associated with entirely different sets of transcription factors in different cell types: it was linked with Gata1, Gata2, Scl, and Smad1 in erythroid progenitors; Rag2 in thymocytes; and Max, Mxi1, and Tbp in mouse erythroleukemia (MEL) cells (Fig. 3A). Similarly, gene set 745 with induction pattern “NK + T cell” is linked with Myb in myeloid progenitors and Stat3, Stat4, and Stat5 in T cells (Fig. 3B). Indeed, more than 60% (40 of 65) of the overrepresented Novershtern gene sets with matched upstream regulators were linked with more than one combinatorial pattern (Supplementary Table 3, online only, available at www.exphem). Therefore, unlike the GSCA approach (Fig. 1), C-GSCA has the potential to identify distinct subsets of candidate upstream regulators for a given gene set (Fig. 2).
Figure 3

(A) Overrepresented regulators determined using GSCA (left) and C-GSCA (right) for gene module 583 from Novershtern et al. [14], with “Late Ery + T/B cells + GRAN” induction pattern. Unlike GSCA, C-GSCA can separate overrepresented independent binding patterns in different cell types (Gata1, Gata2, and Smad1 in erythroid progenitors and Max, Mxi1, and Tbp in MELs in this case). (B) Overrepresented regulators determined using GSCA (left) and C-GSCA (right) for gene module 745 from Novershtern et al. [14] with “NK + T cell” induction pattern. C-GSCA is able to separate combinatorial patterns in T cells and myeloid progenitors.

As GSCA and C-GSCA provide potentially powerful ways of predicting candidate upstream regulators for a given list, we developed a web tool to facilitate gene set control analysis for the wider community (http://bioinformatics.cscr.cam.ac.uk/GSCA/GSCA.html). In this section we provide a brief explanation of the functionality of the GSCA web tool using a recent transcriptome analysis of murine HSCs and early multipotent, bipotent, and unipotent progenitors [15], which reported nine gene expression signatures ranging from those characteristic for the most immature HSCs to those affiliated with differentiation into the individual hematopoietic lineages. We interrogated these nine experimentally obtained gene expression signatures using the GSCA web tool. Eight of these nine mouse stem–progenitor gene signatures showed significant overlap with multiple ChIP-Seq data sets, thus providing an independent test case to examine the biological relevance of predicted combinatorial regulatory signatures in addition to testing the functionality of the web tool (Supplementary Figure 3, online only, available at www.exphem). Figure 4A shows a screenshot of the web tool in which users can paste a query gene list or upload it from a file (human or mouse).
Figure 4

(A) Screenshot of the Gene Set Control Analysis (GSCA) web tool, with options to either paste a user-defined gene list or upload it from a file, and to select the method (GSCA or C-GSCA). (B) GSCA and C-GSCA output for the stem signature dataset from Ng et al. [15], showing two distinct cell type–specific combinatorial patterns.

Upon choosing GSCA, a gene list of interest is interrogated against 78 ChIP-Seq datasets across 15 blood cell types. GSCA calculates the significance of overlap between each ChIP-Seq dataset and the gene set of interest and displays all ChIP-Seq datasets, with those showing enrichment highlighted in yellow. For example, the self-renewing signature (stem signature from Ng et al. [15]) is provided as a test dataset for users and shows statistically significant overlap with multiple transcription factors in HPC7 and progenitors. When the same stem signature gene list is analyzed using C-GSCA, the overrepresented ChIP datasets are clustered into two distinct cell type–specific clusters, HPC7 and MK progenitors (Fig. 4B). Six of the seven transcription factors in the HPC7 cluster overlap with the heptad signature, a binding pattern that we have previously shown is overrepresented in the loci of genes specifically expressed in HSPCs and therefore associated with gene sets specifically expressed in HSCs [12]. Similarly, the gene signature associated with the third wave of the myeloid lineage program (d-myeloid signature) from Ng et al. [15] shows statistically significant overlap with two combinatorial binding patterns: Cebpα, Cebpβ, Stat1, P65, and Pu.1 in macrophages, and Myb in myeloid progenitors. In addition to showing the functionality of the web tool, these results suggest that combinatorial control signatures generated by C-GSCA have the potential to provide insights into combinatorial transcriptional control mechanisms, and that the GSCA web tool provides access to this type of analysis to the wider community.

GSCA analysis of gene sets associated with hematologic malignancies

We have shown that GSCA can be used to link lineage-specific gene sets to combinations of candidate upstream regulatory TFs, and these associations are consistent with expectations based on current knowledge of regulatory control within hematopoiesis. This consistency attests to the potential robustness of the GSCA approach and suggests that it may also be useful to reveal biological insights into transcriptional programs operating in malignant hematopoietic cells, where diagnostic or prognostic gene sets have been derived for many types of leukemia, yet the combinations of TFs driving expression of these gene sets remain largely unknown. We therefore explored the utility of GSCA for linking leukemic gene sets with candidate upstream regulators. We first analyzed a gene set recently reported by McCormack et al. [39], in which the investigators showed that overexpression of Lmo2 in T-lymphoid progenitors induced a preleukemic state characterized by extensive self-renewal capacity. When the authors performed comparative gene expression profiling of normal and LMO2 expressing thymocytes, they noted upregulation of several HSC specific genes and suggested that ectopic expression of Lmo2 might activate an HSC specific transcription program. To test this hypothesis further, we analyzed the list of genes upregulated in Lmo2 transgenic DN thymocytes [39] by GSCA. This analysis suggested that the LMO2 overexpression gene set was under the transcriptional control of stem cell transcription factors such as Scl, Gata2, Runx1, Fli1 and Erg and also showed a strong overlap with LMO2 binding itself in non-leukemic progenitor cells. We next analyzed gene expression profiling data generated as part of a recent study investigating transcriptional programs downstream of mixed lineage leukemia (MLL) transformation in mouse models of acute myeloid leukemia (AML) [40]. 
Expression analyses following MLL-AF9 withdrawal had prompted the authors to propose a model whereby MLL-AF9 enforces a Myb-coordinated program of aberrant self-renewal that involves genes linked to leukemia stem cell potential and poor prognosis in human AML patients. Of note, when we analyzed the genes downregulated following MLL-AF9 withdrawal by GSCA, we observed statistically significant overlaps with the two Myb ChIP-Seq datasets in our compendium (Fig. 5A). In addition, GSCA also recovered associations with MAX and the MAX interacting protein MXI1, both of which have also been linked to a range of human cancers [41]. GSCA analysis therefore not only corroborated the findings by Zuber et al. [40]; it also provided additional hypotheses on likely mechanisms that might control transcriptional programs downstream of MLL-AF9 in AML.
Figure 5

(A) Overrepresented regulators determined by C-GSCA for genes downregulated after MLL-AF9 withdrawal from Zuber et al. [40]. C-GSCA supports the notion that AF9 induces a Myb-coordinated response. (B) Overrepresented regulators determined by C-GSCA for genes positively correlated with LSC frequency from Somervaille et al. [42]. C-GSCA identified cMyc and several other transcription factors to be overrepresented.

The final leukemic gene set analyzed by GSCA was taken from a 2009 study of the transcriptional programs in leukemic stem cells [42]. Comprehensive gene expression profiling analysis had led the authors to speculate that leukemia stem cells in an MLL-driven mouse model of AML are characterized by a transcriptional program shared with embryonic rather than adult stem cells. This conclusion was subsequently challenged when it was suggested that the overlap with embryonic stem cell transcriptional programs was a reflection of a shared dependence on c-MYC activity rather than being related to the stemness phenotype of ES cells [43]. Analysis of the leukemia stem cell associated gene set from the Somervaille et al. [42] study by GSCA revealed a strong association with c-MYC ChIP-Seq datasets (Fig. 5B). However, there were also statistically significant associations with many additional ChIP-Seq datasets. GSCA analysis was therefore supportive of a role for c-MYC in the similarity between leukemic and embryonic stem cell expression signatures, but suggested that TFs more specifically expressed within blood cells also make important contributions to the leukemia stem cell transcriptional program. Of note, genes negatively associated with the leukemia stem cell phenotype in the study by Somervaille et al. [42] did not show the overlap with c-MYC, but showed a distinct pattern of correlated ChIP-Seq datasets for hematopoietic TFs, which interestingly contained several datasets for mature macrophages and was thus consistent with a relatively immature differentiation stage for the leukemia stem cells (Supplementary Figure 2, online only, available at www.exphem).
The application of GSCA to leukemic expression datasets supports the notion that integrated analysis of genome-wide transcription factor binding maps has significant potential as a new addition to the toolbox used by experimentalists to derive new hypotheses for experimental validation, which in the case of our current implementation of GSCA would be geared specifically toward the identification of transcriptional mechanisms that control the behavior of normal and leukemic blood cells.

Discussion

Gene expression arrays have been used widely to characterize genes responsible for a particular cellular phenotype. The differentially expressed genes thus obtained can then be used for functional enrichment analysis. However, the important question of “What upstream regulatory mechanisms are responsible for the differential expression?” is not specifically addressed when using current approaches for gene set analysis, such as Gene Ontology or Gene Set Enrichment Analysis tools. As a result of the rapid progress in next-generation sequencing technology, ChIP-Seq analysis has become a favorite tool to investigate in vivo binding events because it offers higher resolution, less noise, and greater coverage compared with other techniques [44]. Nevertheless, the generation of genome-wide binding maps for multiple transcription factors across different cell types remains a formidable challenge for individual labs [45]. ChIP-Seq datasets from different labs can, however, be integrated at the computational level, which we recently demonstrated using 53 mouse ChIP-Seq experiments from different laboratories across the hematopoietic differentiation tree [8]. Since then, we have added 60 new ChIP datasets, thus more than doubling the size of the original compendium. In addition to highlighting a potentially major portion of the total regulatory genome involved in hematopoietic gene expression, a data compendium of this scale should have the potential to provide new insights into regulatory mechanisms governing gene sets of interest. To explore this further, we developed GSCA to identify enriched combinatorial binding patterns of transcription factors regulating a given gene set. This method uses experimental binding evidence and preserves the cell type–specific context, unlike prediction methods based on overrepresentation of cis-regulatory sequence motifs in promoters [46].
Using 80 clusters of tightly coexpressed genes in 38 hematopoietic cell types [14], we demonstrated that the predicted transcriptional control mechanisms are biologically coherent and that GSCA performs better than current methods: GSCA returned enriched regulators for 65 of the 80 gene sets (Supplementary Table 3), whereas the methods of Lachmann et al. [34] and Zambelli et al. [35] did so for 37 (Supplementary Table 2). Of note, this analysis also demonstrated that GSCA can be used in a cross-species fashion, with human gene sets analyzed using a murine ChIP-Seq compendium in this particular instance. The rationale for this cross-species capability is provided by recent ChIP-Seq studies of the same transcription factor in multiple species. These studies showed that, although a significant proportion of binding locations (peaks) are not conserved, there tends to be what was termed binding site turnover, in which loss of binding in one species is accompanied by gains elsewhere in the same gene locus in the other species [47]. In human–mouse comparisons, for example, the conserved and many of the nonconserved binding sites therefore map to the same gene loci. As with many other gene set analysis tools, cross-species capability in GSCA is facilitated by the use of gene symbols that are standardized across mammals. We further illustrated the utility of the GSCA tool to unravel potential regulatory mechanisms underlying a range of leukemia gene sets, suggesting that GSCA could be applied in future to build hypotheses about the transcriptional control mechanisms responsible for the expression of gene sets with diagnostic, prognostic, or therapeutic relevance. Finally, we built a web tool to facilitate similar analyses by the wider scientific community. Complementary to Gene Ontology functional overrepresentation analysis, GSCA calculates overrepresentation of binding events for a gene list of interest, thus predicting possible transcriptional control mechanisms.
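To make the idea of overrepresentation of binding events concrete, the sketch below computes a one-sided hypergeometric p value for a single transcription factor dataset and a single gene list. The choice of a hypergeometric test is an assumption made for illustration; the text above does not specify which statistic GSCA uses, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    total_genes: int, bound_genes: int, set_size: int, bound_in_set: int
) -> float:
    """Upper-tail probability P(X >= bound_in_set).

    X is the number of bound genes in a random gene set of size `set_size`
    drawn without replacement from `total_genes` genes, of which
    `bound_genes` are bound by the factor (assumed null model).
    """
    upper = min(bound_genes, set_size)
    denominator = comb(total_genes, set_size)
    numerator = sum(
        comb(bound_genes, k) * comb(total_genes - bound_genes, set_size - k)
        for k in range(bound_in_set, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 4 bound by the factor,
    # a gene list of 3 genes of which 2 are bound.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice such a test would be repeated for every dataset in the compendium, so some correction for multiple testing would normally be applied; that step is omitted here.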
Given the significant investment in several collaborative projects such as the ENCODE (Encyclopedia of DNA Elements) and modENCODE (model organism ENCODE) initiatives [48,49], we are likely to witness a near-exponential increase in ChIP-Seq datasets over the coming years. Although our current implementation of the GSCA web tool is geared toward predicting candidate upstream regulators within hematopoietic cells, the approach can be applied easily to other tissues when sufficient ChIP-Seq data become available.
Supplementary Table 1

Overlap between the tissue-specific enhancers identified by Pennacchio et al. [13] and the blood compendium, showing that the enhancers in the compendium are highly blood specific.

Tissue type | Number of enhancers | Overlap | p value
Adipose tissue | 213 | 86 | 0.99995
Adrenal gland | 176 | 47 | 1
Amygdala | 218 | 24 | 1
B220+ B cells | 212 | 158 | 1.91E-10
Bladder | 225 | 48 | 1
Blastocysts | 191 | 63 | 1
Bone | 200 | 87 | 0.99795
Bone marrow | 224 | 101 | 0.99466
Brown fat | 224 | 47 | 1
CD4+ T cells | 226 | 148 | 0.000149
CD8+ T cells | 194 | 137 | 7.00E-07
Cerebellum | 180 | 24 | 1
Cerebral cortex | 190 | 29 | 1
Digits | 263 | 56 | 1
Dorsal root ganglia | 193 | 45 | 1
Dorsal striatum | 193 | 30 | 1
Embryo day 10 | 171 | 58 | 1
Embryo day 6 | 167 | 68 | 0.99961
Embryo day 7 | 163 | 51 | 1
Embryo day 8 | 170 | 44 | 1
Embryo day 9 | 174 | 63 | 1
Epidermis | 292 | 58 | 1
Eye | 255 | 32 | 1
Fertilized egg | 176 | 33 | 1
Frontal cortex | 197 | 21 | 1
Heart | 227 | 62 | 1
Hippocampus | 201 | 30 | 1
Hypothalamus | 183 | 25 | 1
Kidney | 230 | 43 | 1
Large intestine | 208 | 62 | 1
Liver | 267 | 27 | 1
Lung | 241 | 75 | 1
Lymph node | 245 | 160 | 0.000102
Mammary gland | 198 | 33 | 1
Med | 192 | 40 | 1
Olfactory bulb | 194 | 32 | 1
Oocyte | 173 | 37 | 1
Ovary | 192 | 46 | 1
Pancreas | 211 | 45 | 1
Pituitary | 187 | 34 | 1
Placenta | 202 | 59 | 1
Preoptic | 176 | 29 | 1
Prostate | 221 | 49 | 1
Salivary gland | 213 | 46 | 1
Skeletal muscle | 224 | 44 | 1
Small intestine | 259 | 65 | 1
Snout epidermis | 275 | 51 | 1
Spinal cord lower | 197 | 39 | 1
Spinal cord upper | 196 | 26 | 1
Spleen | 228 | 108 | 0.97033
Stomach | 206 | 46 | 1
Substantia nigra | 183 | 29 | 1
Testis | 197 | 31 | 1
Thymus | 194 | 111 | 0.15904
Thyroid | 239 | 47 | 1
Tongue | 289 | 51 | 1
Trachea | 250 | 72 | 1
Trigeminal | 193 | 30 | 1
Umbilical cord | 223 | 44 | 1
Uterus | 181 | 58 | 1
Vomeralnasal organ | 252 | 66 | 1

Supplementary Table 2

Thirty-seven gene sets of 80 with respective induction patterns from Novershtern et al. [14] found enriched using the method of Lachmann et al. [34] and Zambelli et al. [35]

Novershtern et al. cluster # | Induction pattern | Candidate upstream regulators: transcription factor | Candidate upstream regulators: cell type
583 | Late Ery + T/B cell + GRAN | TCF7; GFI1B; SCL; MAX, MXI1, NELFE, TBP; ETS1; FLI1; RAG2 | EML; Erythroid; HPC; MEL; MK progenitors; T cells; Thymocytes
607 |  | TCF7; GFI1B, SCL; P65, PPARG; MXI1, NELFE; MYB; FLI1; RAG2, RING1B | EML; HPC7; Macrophages; MEL; Myeloid progenitors; T cells; Thymocytes
649 | B cell | E2A, EBF1, OCT2, PAX5, PU1; RUNX1; GFI1B, LDB1, MTGR1, PU1, SCL; SCL; FLI1, GATA2, MEIS1, PU1, SCL; CEBPA, CEBPB, P65, STAT1; CMYB, CHD2, JUND, MAFK, MAX, MXI1, NELFE, SCL; GATA1; CBFB, RING1B, RUNX1; AF9; MYB; EBF1, SMAD1; RBPJ; FLI1, GATA3, PU1, STAT3, STAT5A, STAT5B, STAT5, TBET; CBFB, RAG2, RUNX1 | B cells; EML; Erythroid; HPC; HPC7; Macrophages; MEL; MK cells; MK progenitors; MLL leukemia; Myeloid progenitors; Pro B cells; T cell leukemia; T cells; Thymocytes
655 | Mye | LDB1, SCL; NELFE, SCL | HPC; MEL
661 | Late Ery + T/B – cell + GRAN | TCF7; NELFE; FLI1; RAG2 | EML; MEL; T cells; Thymocytes
667 | T cell + NK | RUNX1; GATA2, RUNX1; RUNX1; GATA3, STAT5A, STAT5B; CBFB, RUNX1 | EML; HPC; HPC7; T cells; Thymocytes
673 | T/B cell | E2A, EBF1, OCT2; GATA2, RUNX1, SCL; CEBPA, CEBPB, P65, PPARG; EBF1, SMAD1; FLI1, GATA3, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6; CBFB, RING1B, RUNX1 | B cells; HPC7; Macrophages; Pro B cells; T cells; Thymocytes
685 | Early Mye + T/B cell + GRAN | RUNX1, TCF7; GFI1B, PU1; SCL; CEBPA, CEBPB, P65; CMYC, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RUNX1; MYB; FLI1, STAT4, STAT5B, STAT6, TBET; CBFB, RAG2, RING1B, RUNX1 | EML; Erythroid; HPC7; Macrophages; MEL; MK progenitors; Myeloid progenitors; T cells; Thymocytes
703 | T/B cell | RAG2 | Thymocytes
715 | Early Mye + T/B cell + GRAN | RAG2 | Thymocytes
721 | Late MYE + DCs | CEBPA, CEBPB, P65 | Macrophages
727 | Late Ery | ETO2; LDB1, SCL; SCL | Erythroid; HPC; MEL
733 | HSE + Early Mye | RUNX1, TCF7; GATA1, GATA2; GFI1B, MTGR1, SCL; GATA2, GFI1B, LMO2, MEIS1, PU1, SCL; CEBPA, CEBPB, P65, STAT1; GATA1, MAFK, MXI1, NELFE, TBP; GATA1; GATA1, GATA2, RING1B; MYB; GATA3, STAT3, STAT5A, STAT5B, STAT5, STAT6, TBET; RAG2, RING1B, RUNX1 | EML; Erythroid progenitors; Erythroid; HPC7; Macrophages; MEL; MK cells; MK progenitors; Myeloid progenitors; T cells; Thymocytes
739 | Late Ery + T/B cell + GRAN | TCF7; MXI1, NELFE; FLI1; RAG2 | EML; MEL; T cells; Thymocytes
763 | Late MYE | EBF1; RUNX1, TCF7; PU1; FLI1, GFI1B, RUNX1, SCL; CEBPA, CEBPB, P65, PPARG, STAT1; NELFE; CBFB; GATA3, STAT4, TBET; RAG2 | B cells; EML; Erythroid; HPC7; Macrophages; MEL; MK progenitors; T cells; Thymocytes
793 | Late Ery + T/B – cell + GRAN | TCF7; SCL; NELFE; ETS1, RUNX1; FLI1; CBFB, RAG2 | EML; HPC; MEL; MK progenitors; T cells; Thymocytes
799 | NK + T cells (2) | E2A, FOX01, OCT2, PAX5, PU1; RUNX1; ETO2, PU1; GATA2, LDB1, SCL; ERG, FLI1, GATA2, GFI1B, LMO2, LYL1, MEIS1, PU1, RUNX1, SCL; CEBPA, CEBPB, P65, PPARG, PU1, STAT1; CMYB, CHD2, JUND; GATA1; CBFB, GATA1, GATA2, RING1B; MYB; SMAD1; FLI1, GATA3, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RUNX1 | B cells; EML; Erythroid; HPC; HPC7; Macrophages; MEL; MK cells; MK progenitors; Myeloid progenitors; Pro B cells; T cells; Thymocytes
811 | Early Mye + T/B cell + GRAN | TCF7; SCL; CMYC, MXI1, NELFE, TBP, USF2; ETS1, RUNX1; FLI1; RAG2, RUNX1 | EML; HPC7; MEL; MK progenitors; T cells; Thymocytes
817 | T/B cell | E2A, EBF1, OCT2, PAX5; RUNX1, TCF7; ETO2, GFI1B, PU1, SCL; GATA2, SCL; ERG, FLI1, GFI1B, LMO2, MEIS1, PU1, RUNX1, SCL; CEBPA, CEBPB, P65, PPARG, STAT1; CMYB, CHD2, MXI1, NELFE, SCL; CBFB, GATA2, RING1B, RUNX1; EBF1, SMAD1; FLI1, GATA3, STAT3, STAT4, STAT5A, STAT5B; CBFB, RAG2, RING1B, RUNX1 | B cells; EML; Erythroid; HPC; HPC7; Macrophages; MEL; MK progenitors; Pro B cells; T cells; Thymocytes
823 | Early Mye + T/B cell + GRAN | MXI1, NELFE; FLI1; RAG2 | MEL; T cells; Thymocytes
835 | Early Mye + T/B – cell + GRAN | GFI1B; ERG; CMYC, CHD2, MAX, NELFE, TBP; ETS1, RUNX1; FLI1, STAT3, STAT6; RAG2 | Erythroid; HPC7; MEL; MK progenitors; T cells; Thymocytes
841 | Early Mye + T/B cell + GRAN | TCF7; SCL; NOTCH1; CMYC, MAX, MXI1, NELFE, TBP; ETS1; FLI1; RAG2 | EML; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
859 | T cell + NK | RING1B | Thymocytes
871 | HSC + Early MYE | MXI1, NELFE; FLI1; RAG2 | MEL; T cells; Thymocytes
883 | Late Ery + T/B cell + GRAN | PU1; TCF7; GATA1; GATA1, GFI1B; ERG, SCL; CEBPB; CMYC, CHD2, MXI1, NELFE, TBP; CBFB, RING1B, RUNX1; FLI1, STAT3, STAT4, STAT5, STAT6, TBET; RAG2, RING1B, RUNX1 | B cells; EML; Erythroid progenitors; Erythroid; HPC7; Macrophages; MEL; MK progenitors; T cells; Thymocytes
889 | Late Ery | GATA1, GATA2, SMAD1; ETO2, GATA1, LDB1, MTGR1, SCL; GATA2, LDB1, SCL; LMO2, RUNX1; CEBPA, CEBPB; CMYB, GATA1, MAFK, MAX, SCL, USF2; GATA1; GATA1, GATA2, RING1B, RUNX1 | Erythroid progenitors; Erythroid; HPC; HPC7; Macrophages; MEL; MK cells; MK progenitors
901 | Early Mye + T/B cell + GRAN | TCF7; GFI1B; NOTCH1; CMYC, MXI1, NELFE, TBP; ETS1; FLI1, STAT5B; RAG2, RUNX1 | EML; Erythroid; Leukemia; MEL; MK progenitors; T cells; Thymocytes
907 | Late Ery + T/B cell + GRAN | TCF7; GATA1, GATA2; ETO2, GATA1, GFI1B, MTGR1; LDB1, SCL; NOTCH1; CMYC, GATA1, MAX, MXI1, NELFE, SCL, TBP; CBFB, ETS1, GATA1, RING1B; FLI1, STAT3, STAT5B, STAT6, TBET; RAG2 | EML; Erythroid progenitors; Erythroid; HPC; Leukemia; MEL; MK progenitors; T cells; Thymocytes
925 | Early Mye + T/B cell + GRAN | TCF7; CMYC, MXI1, NELFE, TBP; FLI1, STAT6; RAG2 | EML; MEL; T cells; Thymocytes
943 | T/B cell | TCF7; NELFE; ETS1; FLI1; RAG2 | EML; MEL; MK progenitors; T cells; Thymocytes
961 | B cell | E2A, EBF1, OCT2 | B cells
973 | HSE + Early Mye | NELFE | MEL
979 | Late MYE | CEBPA, CEBPB, P65, PPARG, STAT1; MYB | Macrophages; Myeloid progenitors
985 | Early Mye + T/B – cell + GRAN | CHD2 | MEL
991 | T/B cell | E2A, OCT2, PAX5, PU1; RUNX1, TCF7; GATA1, GATA2; GATA1, GFI1B, PU1; ERG, FLI1, GFI1B, MEIS1, PU1; CEBPA, CEBPB, P65, PPARG, STAT1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RING1B, RUNX1; MYB; EBF1; FLI1, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | B cells; EML; Erythroid progenitors; Erythroid; HPC7; Macrophages; MEL; MK progenitors; Myeloid progenitors; Pro B cells; T cells; Thymocytes
1003 | Late Ery + T/B – cell + GRAN | NELFE; RAG2 | MEL; Thymocytes
1021 | Early Mye + T/B cell + GRAN | TCF7; GFI1B; NELFE; ETS1; FLI1; RAG2, RING1B, RUNX1 | EML; Erythroid; MEL; MK progenitors; T cells; Thymocytes

Supplementary Table 3

Sixty-five gene sets of 80 with respective induction patterns from Novershtern et al. [14] enriched for transcription factor binding regions across multiple blood tissues using GSCA: 63 of 65 show cell type and induction pattern matching

Novershtern et al. cluster No. | Induction pattern | Combinatorial control signature: transcription factor | Combinatorial control signature: cell type
399 | None | STAT4, STAT5 | T cells
559 | NK + T cell (2) | STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | T cells
571 | Late MYE | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages
583 | Late ERY + T/B cell + Gran | GATA1, GATA2, SMAD1; SCL; SCL; MAX, MXI1, TBP; RAG2 | Erythroid progenitors; Erythroid; HPC; MEL; Thymocytes
607 | Early MYE + T/B cell + Gran | PU1; ERG, FLI1, GFI1B, MEIS1, PU1, SCL; CEBPA, CEBPB, P65, PPARG, PU1, STAT1; MYB; GATA3, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | B cells; HPC7; Macrophages; Myeloid progenitors; T cells
613 | T/B – cell | PU1 | B cells
619 | Late MYE | CEBPA, CEBPB, PU1, STAT1 | Macrophages
637 | Late Ery | GATA1, GATA2, SMAD1; GATA1, LDB1, MTGR1, SCL; ERG, LDB1; ERG; GATA1, SCL; GATA1, RING1B, RUNX1 | Erythroid progenitors; Erythroid; HPC; HPC7; MEL; MK progenitors
643 | HSE + Early Mye | GATA2 | HPC7
649 | B cells | E2A, PAX5, PU1; CEBPA, CEBPB, P65, PU1, STAT1 | B cells; Macrophages
655 | Mye | GATA1, GATA2, SMAD1; GATA1, LDB1, MTGR1, SCL; LDB1, SCL; CEBPB, P65, PU1, STAT1; CMYB, CMYC, GATA1, MAFK, MXI1, SCL, TBP; CBFB, GATA1, RING1B | Erythroid progenitors; Erythroid; HPC; Macrophages; MEL; MK progenitors
661 | Late Ery + T/B cell + GRAN | PU1; GATA1, GATA2; CMYB, CHD2, GATA1, MXI1, NELFE, TBP; ETS1; FLI1; RAG2 | B cells; Erythroid progenitors; MEL; MK progenitors; T cells; Thymocytes
667 | T cell + NK | GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RING1B, RUNX1 | T cells; Thymocytes
673 | T/B cell | E2A, OCT2, PU1; CEBPB, P65, PU1, STAT1; GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | B cells; Macrophages; T cells
679 | HSE + Early Mye | GATA2 | Erythroid progenitors
685 | Late MYE + T/B cell + GRAN | RUNX1, TCF7; GFI1B, PU1; ERG, PU1, SCL; CEBPA, CEBPB, P65, PU1, STAT1; CMYC, GATA1, MXI1, NELFE, TBP; CBFB, ETS1; FLI1, PU1; RAG2, RUNX1 | EML; Erythroid; HPC7; Macrophages; MEL; MK progenitors; T cells; Thymocytes
703 | T/B cell | TCF7; GFI1B; ERG, PU1; NOTCH1; CMYC, MAX, MXI1, NELFE, TBP; CBFB, ETS1; FLI1, STAT3, STAT4, STAT5B, STAT6, TBET; CBFB, RAG2 | EML; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
709 | General mild induction | ETO2; AF9 | Erythroid; MLL leukemia
715 | Early MYE + T/B cell + GRAN | PU1; TCF7; GFI1B; CEBPA, CEBPB, PU1, STAT1; CMYC, GATA1, MXI1, NELFE, TBP; ETS1; FLI1; RAG2 | B cells; EML; Erythroid; Macrophages; MEL; MK progenitors; T cells; Thymocytes
721 | Late MYE + DCs | CEBPA, CEBPB, P65, PU1, STAT1; MYB | Macrophages; Myeloid progenitors
727 | Late Ery | GATA1, GATA2, SMAD1; ETO2, GATA1, GFI1B, LDB1, MTGR1, SCL; LDB1, SCL; CMYC, GATA1, MAFK, MAX, MXI1, SCL, TBP; CBFB, GATA1, GATA2, RING1B, RUNX1 | Erythroid progenitors; Erythroid; HPC; MEL; MK progenitors
733 | HSC + Early MYE | CEBPA, CEBPB, P65, PU1, STAT1; MYB | Macrophages; Myeloid progenitors
739 | Late ERY + T/B cell + Gran | PU1; TCF7; GATA1, GATA2; GFI1B, PU1; ERG, PU1; NOTCH1; CMYC, CHD2, GATA1, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RUNX1; FLI1; RAG2, RUNX1 | B cells; EML; Erythroid progenitors; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
745 | General mild induction | MYB; STAT3, STAT4, STAT5 | Myeloid progenitors; T cells
757 | T cell + NK | TCF7; CMYC, MAX, MXI1, NELFE, TBP; FLI1, STAT3, STAT5, TBET; RAG2 | EML; MEL; T cells; Thymocytes
763 | Late MYE | ERG, FLI1; CEBPA, CEBPB, P65, PPARG, PU1, STAT1; PU1, STAT3, STAT4, STAT5, TBET | HPC7; Macrophages; T cells
769 | T/B cell | PU1; CEBPA, CEBPB; GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | B cells; Macrophages; T cells
775 | Mye | GATA1 | MK cells
781 | General mild induction | NOTCH1 | TALL
787 | MYE | SMAD1; CEBPA, CEBPB, STAT1; GATA1, MAFK | Erythroid progenitors; Macrophages; MEL
793 | Late ERY + T/B cell + Gran | PAX5, PU1; TCF7; GATA1, GATA2, SMAD1; GATA1, PU1, SCL; PU1, SCL; PU1; CEBPA, CEBPB, PU1, STAT1; CMYB, CMYC, GATA1, MAX, MXI1, NELFE, SCL, TBP; GATA1; CBFB, ETS1, RING1B, RUNX1; FLI1, GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RUNX1 | B cells; EML; Erythroid progenitors; Erythroid; HPC; HPC7; Macrophages; MEL; MK cells; MK progenitors; T cells; Thymocytes
799 | NK + T cell (2) | E2A; CEBPA, CEBPB, P65, PU1, STAT1; GATA3, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | B cells; Macrophages; T cells
805 | HSE + Early Mye | ETO2 | Erythroid
811 | Late MYE + T/B cell + GRAN | TCF7; GFI1B; ERG, SCL; CEBPA, CEBPB, PU1; CMYC, CHD2, MAX, MXI1, NELFE, TBP, USF2; CBFB, ETS1, RUNX1; FLI1, STAT4, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RING1B, RUNX1 | EML; Erythroid; HPC7; Macrophages; MEL; MK progenitors; T cells; Thymocytes
817 | T/B cell | E2A, EBF1, PAX5, PU1; ERG, MEIS1, PU1; CEBPA, CEBPB, P65, PPARG, PU1, STAT1; RING1B; EBF1, SMAD1; FLI1, PU1, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RUNX1 | B cells; HPC7; Macrophages; MK progenitors; Pro B cells; T cells; Thymocytes
823 | Early MYE + T/B cell + GRAN | TCF7; GATA2; GFI1B; SCL; NOTCH1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; CBFB, ETS1; RBPJ; FLI1, STAT3, STAT6; RAG2 | EML; Erythroid progenitors; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cell leukemia; T cells; Thymocytes
829 | T cell + NK | E2A; GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET; CBFB, RUNX1 | B cells; T cells; Thymocytes
835 | Early MYE + T/B cell + GRAN | PAX5; TCF7; GFI1B; ERG; NOTCH1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RUNX1; RBPJ; FLI1; RAG2 | B cells; EML; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cell leukemia; T cells; Thymocytes
841 | Early MYE + T/B cell + GRAN | PU1; TCF7; GFI1B, PU1; ERG, GFI1B, PU1, SCL; NOTCH1; PU1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; ETS1, RUNX1; AF9; RBPJ; FLI1, PU1, STAT3, STAT4, STAT5B, STAT6, TBET; RAG2 | B cells; EML; Erythroid; HPC7; Leukemia; Macrophages; MEL; MK progenitors; MLL leukemia; T cell leukemia; T cells; Thymocytes
847 | Late Ery + T/B cell + GRAN | PAX5; TCF7; GATA1, GFI1B; GFI1B; NOTCH1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RUNX1; RBPJ; FLI1, STAT3, STAT4, STAT5B, STAT6, TBET; RAG2 | B cells; EML; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cell leukemia; T cells; Thymocytes
853 | Late MYE | PU1; ERG, PU1; CEBPA, CEBPB, P65, PU1, STAT1; PU1, STAT3, STAT4, STAT5A, STAT5, STAT6 | B cells; HPC7; Macrophages; T cells
859 | T cell + NK | GATA3, STAT3, STAT5, TBET | T cells
865 | HSE + early Mye | GATA2, SMAD1; GATA1 | Erythroid progenitors; MEL
871 | HSE + Early MYE | TCF7; GFI1B; ERG, MEIS1, PU1; NOTCH1; CEBPA, CEBPB, PU1, STAT1; CMYC, MAX, MXI1, NELFE, TBP; CBFB; FLI1, STAT3, STAT6, TBET; RAG2 | EML; Erythroid; HPC7; Leukemia; Macrophages; MEL; MK progenitors; T cells; Thymocytes
883 | Late MYE + T/B cell + Gran | TCF7; GATA1, GFI1B; SCL, SCL; SCL; CMYC, CHD2, MXI1, NELFE, TBP; RING1B, RUNX1; FLI1, STAT4, STAT6, TBET; RAG2, RUNX1 | EML; Erythroid; HPC; HPC7; MEL; MK progenitors; T cells; Thymocytes
889 | Late ERY | GATA1, GATA2, SMAD1; ETO2, GATA1, LDB1, MTGR1, SCL; LDB1, SCL; GATA1, SCL; GATA1, GATA2, RING1B; STAT4 | Erythroid progenitors; Erythroid; HPC; MEL; MK progenitors; T cells
895 | Late ERY | GATA2; GATA1, LDB1; CEBPA, CEBPB, PU1, STAT1; MXI1 | Erythroid progenitors; Erythroid; Macrophages; MEL
901 | Late MYE + T/B cell + Gran | TCF7; GATA1, GATA2, SMAD1; GFI1B, PU1; GFI1B; NOTCH1; CMYC, GATA1, MAX, MXI1, NELFE, TBP; CBFB, ETS1; FLI1, STAT3, STAT5B, STAT5; CBFB, RAG2, RING1B, RUNX1 | EML; Erythroid progenitors; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
907 | Late ERY + T/B cell + Gran | PAX5, PU1; TCF7; GATA1, GATA2, SMAD1; ETO2, GATA1, GFI1B, LDB1, MTGR1, SCL; ERG, LDB1, SCL; ERG; NOTCH1; CMYB, CMYC, CHD2, GATA1, MAX, MXI1, NELFE, SCL, TBP; CBFB, ETS1, GATA1, RING1B, RUNX1; FLI1; RAG2 | B cells; EML; Erythroid progenitors; Erythroid; HPC; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
919 | HSE + Early Mye | FOX01; ERG; CMYC, MXI1, NELFE, TBP | B cells; HPC7; MEL
925 | Early MYE + T/B cell + GRAN | PAX5, PU1; TCF7; GATA1, GATA2, SMAD1; GATA1, GFI1B, PU1; ERG, SCL, SCL; ERG, SCL; CMYC, CHD2, GATA1, MAX, MXI1, NELFE, SCL, TBP; CBFB, ETS1, RING1B, RUNX1; FLI1, STAT3, STAT4, STAT5B, STAT5, STAT6, TBET; RAG2, RING1B | B cells; EML; Erythroid progenitors; Erythroid; HPC; HPC7; MEL; MK progenitors; T cells; Thymocytes
931 | None | GATA1; STAT4, STAT5B, STAT5 | MEL; T cells
943 | T/B cell | TCF7; GATA2; PU1; CMYC, MXI1, NELFE, TBP; ETS1; FLI1, GATA3, STAT3, STAT4, STAT5B, STAT5, STAT6, TBET; CBFB, RAG2, RUNX1 | EML; Erythroid progenitors; Erythroid; MEL; MK progenitors; T cells; Thymocytes
949 | T/B cell | GATA2; RAG2 | Erythroid progenitors; Thymocytes
955 | T cell + NK | GATA3, STAT3, STAT4, STAT5A, STAT5B, STAT5, STAT6, TBET | T cells
961 | B cell | E2A, EBF1, OCT2, PAX5, PU1; CEBPA, CEBPB, P65, PU1, STAT1 | B cells; Macrophages
967 | Late ERY + T/B cell + Gran | PAX5; TCF7; GATA1, GFI1B, LDB1; ERG, FLI1, MEIS1, PU1, SCL; NOTCH1; CMYC, CHD2, MAX, MXI1, NELFE, SMC3, TBP; CBFB, ETS1, RING1B, RUNX1; FLI1, STAT4, STAT6, TBET; RAG2 | B cells; EML; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes
973 | HSE + Early Mye | CMYC, GATA1, MAX, MXI1 | MEL
979 | Late MYE | PU1; GFI1B; CEBPA, CEBPB, P65, PPARG, PU1, STAT1; EBF1; STAT3, STAT4, STAT5, STAT6 | B cells; HPC7; Macrophages; Pro B cells; T cells
985 | Early MYE + T/B cell + Gran | TCF7; GFI1B; ERG, SCL; CMYC, CHD2, MAX, MXI1, NELFE, TBP; ETS1, RUNX1; FLI1, STAT3, STAT4, STAT6, TBET; CBFB, RAG2, RUNX1 | EML; Erythroid; HPC7; MEL; MK progenitors; T cells; Thymocytes
991 | T/B cell | PU1; GATA2; PU1; ERG; CEBPA, CEBPB, PU1, STAT1; MXI1, TBP; FLI1, STAT3, STAT4, STAT5B, STAT5, STAT6, TBET; RAG2 | B cells; Erythroid progenitors; Erythroid; HPC7; Macrophages; MEL; T cells; Thymocytes
997 | NK + T cell (2) | STAT4 | T cells
1003 | Late ERY + T/B cell + Gran | TCF7; GATA1, GFI1B, PU1, SCL; ERG, PU1, SCL; ERG, PU1; NOTCH1; CMYC, CHD2, MAX, MXI1, NELFE, TBP; CBFB, ETS1, RING1B, RUNX1; RBPJ; FLI1, GATA3, STAT3, STAT6; CBFB, RAG2, RING1B, RUNX1 | EML; Erythroid; HPC; HPC7; Leukemia; MEL; MK progenitors; T cell leukemia; T cells; Thymocytes
1009 | HSE + Early Mye | CEBPA, CEBPB, P65, PU1, STAT1 | Macrophages
1021 | Early MYE + T/B cell + GRAN | PAX5; TCF7; GATA1; GFI1B, PU1; SCL; NOTCH1; CMYC, CHD2, MXI1, NELFE, TBP; ETS1, RUNX1; FLI1; RAG2, RING1B, RUNX1 | B cells; EML; Erythroid progenitors; Erythroid; HPC7; Leukemia; MEL; MK progenitors; T cells; Thymocytes

  48 in total

1.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data.

Authors:  Eran Segal; Michael Shapira; Aviv Regev; Dana Pe'er; David Botstein; Daphne Koller; Nir Friedman
Journal:  Nat Genet       Date:  2003-06       Impact factor: 38.330

2.  Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.

Authors:  Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal:  Nat Protoc       Date:  2009       Impact factor: 13.491

Review 3.  Evolution of transcriptional control in mammals.

Authors:  Michael D Wilson; Duncan T Odom
Journal:  Curr Opin Genet Dev       Date:  2009-11-11       Impact factor: 5.578

4.  Genome-wide analysis reveals conserved and divergent features of Notch1/RBPJ binding in human and murine T-lymphoblastic leukemia cells.

Authors:  Hongfang Wang; James Zou; Bo Zhao; Eric Johannsen; Todd Ashworth; Hoifung Wong; Warren S Pear; Jonathan Schug; Stephen C Blacklow; Kelly L Arnett; Bradley E Bernstein; Elliott Kieff; Jon C Aster
Journal:  Proc Natl Acad Sci U S A       Date:  2011-07-07       Impact factor: 11.205

5.  Predicting tissue-specific enhancers in the human genome.

Authors:  Len A Pennacchio; Gabriela G Loots; Marcelo A Nobrega; Ivan Ovcharenko
Journal:  Genome Res       Date:  2007-01-08       Impact factor: 9.043

6.  Haploinsufficiency of Dnmt1 impairs leukemia stem cell function through derepression of bivalent chromatin domains.

Authors:  Jennifer J Trowbridge; Amit U Sinha; Nan Zhu; Mingjie Li; Scott A Armstrong; Stuart H Orkin
Journal:  Genes Dev       Date:  2012-02-15       Impact factor: 11.361

7.  Densely interconnected transcriptional circuits control cell states in human hematopoiesis.

Authors:  Noa Novershtern; Aravind Subramanian; Lee N Lawton; Raymond H Mak; W Nicholas Haining; Marie E McConkey; Naomi Habib; Nir Yosef; Cindy Y Chang; Tal Shay; Garrett M Frampton; Adam C B Drake; Ilya Leskov; Bjorn Nilsson; Fred Preffer; David Dombkowski; John W Evans; Ted Liefeld; John S Smutko; Jianzhu Chen; Nir Friedman; Richard A Young; Todd R Golub; Aviv Regev; Benjamin L Ebert
Journal:  Cell       Date:  2011-01-21       Impact factor: 41.582

Review 8.  Deciphering transcriptional control mechanisms in hematopoiesis:the impact of high-throughput sequencing technologies.

Authors:  Nicola K Wilson; Marloes R Tijssen; Berthold Göttgens
Journal:  Exp Hematol       Date:  2011-07-23       Impact factor: 3.084

9.  Genome-wide lineage-specific transcriptional networks underscore Ikaros-dependent lymphoid priming in hematopoietic stem cells.

Authors:  Samuel Yao-Ming Ng; Toshimi Yoshida; Jiangwen Zhang; Katia Georgopoulos
Journal:  Immunity       Date:  2009-04-02       Impact factor: 31.745

10.  Hematopoiesis: an evolving paradigm for stem cell biology.

Authors:  Stuart H Orkin; Leonard I Zon
Journal:  Cell       Date:  2008-02-22       Impact factor: 41.582

