
Gene-pair expression signatures reveal lineage control.

Merja Heinäniemi, Matti Nykter, Roger Kramer, Anke Wienecke-Baldacchino, Lasse Sinkkonen, Joseph Xu Zhou, Richard Kreisberg, Stuart A Kauffman, Sui Huang, Ilya Shmulevich.

Abstract

The distinct cell types of multicellular organisms arise owing to constraints imposed by gene regulatory networks on the collective change of gene expression across the genome, creating self-stabilizing expression states, or attractors. We curated human expression data comprising 166 cell types and 2,602 transcription-regulating genes and developed a data-driven method for identifying putative determinants of cell fate built around the concept of expression reversal of gene pairs, such as those participating in toggle-switch circuits. This approach allows us to organize the cell types into their ontogenic lineage relationships. Our method identifies genes in regulatory circuits that control neuronal fate, pluripotency and blood cell differentiation, and it may be useful for prioritizing candidate factors for direct conversion of cell fate.

Year:  2013        PMID: 23603899      PMCID: PMC4131748          DOI: 10.1038/nmeth.2445

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


INTRODUCTION

Mammalian organisms contain at least 250 cell types[1], each specified by a characteristic gene expression profile. Despite increasing availability of expression data, comprehensive characterization of cell type–specific expression profiles remains challenging due to inconsistencies in annotations and technical issues such as data normalization. Moreover, common differential expression analyses alone are insufficient to recover ontogenetic cell lineage relationships or to reflect regulatory relationships among transcription factors (TFs) that lead some to function as fate determinants. We describe a data-driven method that addresses these problems in the context of the very mechanisms by which the gene regulatory networks govern lineage development. Our analysis is motivated by a two-gene circuit motif known to control binary developmental decisions[2]. This motif, first hypothesized to control developmental switches in Drosophila[3,4], contains a pair of mutually repressive TFs and effectively constitutes a toggle switch. These circuits allow a bipotent progenitor cell to simultaneously co-express two opposing TFs at low levels, the poised state {TF1 ≈ TF2}[2], but force it to choose between either of two stable configurations in which one TF dominates the other, {TF1 ≫ TF2} or {TF2 ≫ TF1}. Such pairs of antagonistic TFs can govern the development of “sister” lineages. In addition to cross-inhibiting each other, these TFs also act as lineage-specifying master regulators of target genes that are reciprocally expressed in the two sister lineages, thus establishing lineage-specific gene expression profiles[2]. The pair {SPI1, GATA1} is a well-studied example in the hematopoietic system[5]. SPI1 (PU.1) specifies the myeloid lineage, characterized by SPI1 ≫ GATA1, whereas GATA1 specifies the erythroid lineage, in which GATA1 ≫ SPI1[6]. 
The lineage split manifests as the establishment of a mutual exclusion, resulting in reversed expression between the two TFs, which can be exploited to identify master regulators. We score genes for potential participation in such expression reversals. We expect gene pairs that function as lineage determinants to exhibit consistent relative expression across samples from the same cell type (and lineage) and consistent reversal of relative expression between cell types from sister lineages, a property that has been exploited in expression-based classifiers[7-9]. By applying this method to curated gene expression data from 166 cell types and 2,602 transcription regulating genes, we show that experimentally verified master regulators of cell type fate are indeed revealed through quantification of their participation in expression reversals. Focusing on hematopoiesis, our method reveals known and novel candidate fate-specifying genes that exhibit the signature of participation in antagonistic circuits, results which were confirmed by genome-wide ChIP-seq data. Finally, we derived a cell type similarity measure from expression reversals with which we could recover known ontogenetic lineage-relationships reminiscent of the branching valleys of the epigenetic landscape envisioned by Waddington[10].

RESULTS

Gene expression reversal analysis

We curated a dataset comprising 2,919 microarrays and representing 166 normal human cell types (described in Supplementary Results, Supplementary Tables 1–3 and Supplementary Fig. 1) and selected genes with functional annotation related to transcription regulation (Supplementary Results, Supplementary Tables 4 and 5 and Supplementary Fig. 2). A subset formed from strictly defined TFs will be referred to as the TF set (844 genes). For simplicity, the term TF will be used to refer to all transcription regulating genes. For every pair of genes and every pair of cell types, we define the reversal score Δ as the difference, between the two cell types, of the within-cell-type mean rank difference between the two genes (Eqs. 1–3 in Methods, Fig. 1). Use of rank data rather than absolute expression obviates the need for sample normalization, typically needed due to sample distribution differences (Supplementary Fig. 3), because all direct comparisons between genes happen within samples, and conventional normalization methods are rank-preserving (Supplementary Results). Large absolute values of Δ therefore identify gene pairs that reverse expression between cell types. Δ is clamped to 0 for pairs of genes that do not change relative expression (the difference in their mean ranks does not change sign) between cell types. Fixing the gene pair in Δ and letting the cell types vary produces gene pair reversal plots, which visualize the potential for a gene pair to participate in a lineage split between any pair of cell types (Fig. 1b). Finally, we define the participation score Ψ for a fixed gene (Eqs. 4 and 5 in Methods) to be an aggregate measure of the number and strength of reversals in which the gene participates (Fig. 1c).
Fig. 1

Gene pair expression reversal analysis exemplified by schematic data

A schematic example illustrating the principle of the expression reversal method is shown. (a) The ranks of two hypothetical genes g and g′ are plotted from microarray samples assigned to three hypothetical cell types. (b) Gene pair reversal plot. The reversal behavior of the {gene g, gene g′} gene pair, quantified for all pairwise comparisons of N = 3 cell types, is shown as an N x N symmetric matrix. The value, indicating the extent of reversal behavior, is represented by the color in the heat map. Red tones indicate that the pair configuration changes from gene g ≫ gene g′ in the first cell type of a comparison pair (“row-to-column comparison”) to gene g ≪ gene g′ in the second cell type. Reversals in the opposite direction are indicated in blue shades. (c) Reversal participation. The Ψ value for gene g quantifies its reversal participation from all gene pairs, displayed across all pairwise comparisons of (here N = 32) cell types. A specific gene pair configuration that recurs in multiple gene pairs involving g will be reflected by a high score (dark red or blue). Alternatively, the gene reversal participation can be assessed at the cell type level by extracting from the gene portraits the cell type (row) of interest and subsequently sorting by maximal Ψ value.

Revealing critical factors for induced pluripotency

We hypothesized that participation of a gene in reversals involving a given cell type is indicative of the specificity of the gene for that cell type as well as its potential to participate in lineage determination. We sorted genes by their participation scores in comparisons of embryonic stem cells (ESC) with other cell types (Fig. 2a). Interestingly, the genes NANOG, POU5F1 (OCT3 or 4), SOX2 and LIN28 that appear on this top list are precisely those that jointly are capable of inducing the pluripotent state from differentiated cells[11] (see also Supplementary Fig. 4). A critical role in regulation of stem cell transcription has been reported for 17 of the top 20 genes (Supplementary Table 6). These results are very robust to noise and sample size differences (Supplementary Figs 5–7 and Supplementary Results).
Fig. 2

Cell type-level analysis of reversal participation in the ESC highlights genes used to induce pluripotency

Reversal participation analysis for ESCs compared to all other cell types reveals genes that are important in determining ESC (refer to Supplementary Table 3 for the order of cell types in columns). (a) The first 100 rows (of 2,602 TFs evaluated) of the ESC cell portrait are displayed and the names of top 20 most specific ESC-high transcription regulating genes are indicated, including those used to induce pluripotency in human cells[11]: LIN28, NANOG, POU5F1 and SOX2. (b) Active ESC transcription and promoter state was evaluated from ENCODE[12] RNA-seq (R) and ChIP-seq (C) of histone methylation datasets. The level of the H3K4me3 marker for active promoters around 5 kb up- or downstream from the gene transcription start site (TSS) is shown from six normal ENCODE cell types H1 ES: human ESC line H1, HMEC: breast epithelial cell, HSMM: skeletal muscle myoblast, HUVEC: umbilical vein endothelial cell, NHEK: epithelial keratinocyte, NHLF: lung fibroblast. RNA-seq data is available from H1 ES, HUVEC and NHEK cells. The high ESC expression and its specificity can be compared against the gene reversal portraits shown adjacent to the ChIP tracks.

We validated the cell type-restricted reversal patterns of the top 20 gene portraits using sequencing data[12] for chromatin markers (ChIP-seq) and for RNA (RNA-seq) from normal human cell types (including H1 ESCs in yellow) (Fig. 2b). Genes with a highly ESC-restricted gene portrait appear ESC-specific in both ChIP-seq and RNA-seq results. Furthermore, TF ChIP-seq data also suggest that the pluripotency inducing TFs NANOG, OCT4 and SOX2 co-occupy regulatory regions of genes that, with respect to our reversal participation score Ψ, are among the top 20 genes associated with ESCs[13] (Supplementary Fig. 8). Therefore, our analysis highlights genes that are not only maximally restricted to the respective cell type but may also operate in a lineage-determining switch.

Reversals expose genes with lineage-determining potential

Our data shows that reversal participation captures cell type–restricted expression. We chose the ESC for the analysis since the discovery of induced pluripotency factors paved the way toward exploiting cell type plasticity to actuate direct lineage-conversions. The ability of our analysis to highlight the core ESC network suggested that such reversals may identify TFs with lineage-specifying power which could be used to induce differentiation towards a particular cell type. We investigated this possibility in a published reprogramming experiment[14]. ASCL1 is a critical TF that alone and in combination with other factors was discovered to induce fibroblast to neuron conversion[14]. We sorted the reversal participation (Ψ) portraits of 19 candidate genes initially evaluated in the published reprogramming experiment by their potency[14] in enhancing ASCL1-induced neuronal differentiation (as reflected by strong color bands localized to few cell type pairs) (Fig. 3). The diffuse patterns in the plots of the two bottom rows are in agreement with experimental results[14] in which these genes showed no effect. Therefore, gene reversal participation also identifies potential fate-determining roles of a TF in a given lineage.
Fig. 3

Reversal participation analysis of a candidate gene set for the induction of neuronal differentiation reflects success in a functional assay

A set of 19 candidate transcription regulating genes was characterized experimentally for their neuronal differentiation induction potential[14]. The reversal participation gene portraits of these genes are shown. The ordering of the portraits reflects the experimental success to induce neuronal fate in combination with ASCL1 that was found[14] most potent on its own to induce the conversion of fibroblasts to neuronal cells. The grey bar indicates the location (rows) of neuronal cells in the figures.

Expression reversals in the hematopoietic lineage splits

To demonstrate how gene pair reversal analysis (Fig. 1b) can shed light on toggle switch circuits, we selected three characterized mutual repression circuits involved in blood cell lineage control: {GATA1, SPI1}, {GATA1, GATA2} and {GFI1, EGR2}. These pairs govern the lineage splits between erythroid vs. myeloid, erythroid vs. megakaryocyte and granulocyte vs. macrophage, respectively[5,15,16]. The first lineage split occurs via the mutual repression of the {GATA1, SPI1} TF pair[5]. Here the {SPI1 ≈ GATA1} configuration is observed in the progenitor cells, consistent with the characteristic promiscuous expression pattern of multipotent cells[17], whereas a pronounced reversal of their relative expression levels occurs between the pro-erythroid and pro-myeloid cells: GATA1 ≫ SPI1 in all pro-erythroid arrays and GATA1 ≪ SPI1 in all pro-myeloid arrays (Supplementary Fig. 9a). Thus, the behavior of this gene pair across all cell types in the comparison set highlights the erythroid-myeloid lineage split as a distinct pattern (Supplementary Fig. 9b). Similarly, the {GATA1, GATA2} TF pair is reversed between pro-erythroid cells and platelets that segregate in a downstream lineage split[15] (Supplementary Fig. 9c). Finally, the {GFI1, EGR2} pair is strongly reversed between the granulocyte-lineage progenitors and the differentiated macrophages. Interestingly, this pair exhibits a signal in the lymphoid lineage, suggesting a broader role in the blood system, i.e. the reuse of circuits for different decisions[2] (Supplementary Fig. 9d). Lineage branching is often controlled not just by one toggle switch circuit but rather by the integrated action of many interconnected[18] mutually repressing gene pairs. We demonstrate that, using reversal scores and a priori knowledge of the lineage branching, we can identify TF pairs that exhibit an expression reversal associated specifically with the erythroid-myeloid lineage split or the B- vs T-lymphoid lineage split (Methods). 
We evaluated the reversal behavior of all gene pairs in the TF set in the context of an extended set of hematopoietic cell types. To increase specificity, we required that the TF pairs separating erythroid and myeloid cells are disjoint with the pairs separating lymphoid cells. For comparison, we performed a similar analysis using two rank-based methods to detect candidate genes based on differential expression (Supplementary Results). We matched the expression reversal pattern expected in these lineage splits (Fig. 4a) against the gene pair data to extract specific pairs {TF1, TF2} that are maximally lineage-restricted for either the common erythroid-myeloid or lymphoid progenitors and exhibit minimal reversal outside these cell types. To distinguish from reversals obtained by chance in comparisons between irrelevant cell types, we ordered the results of our reversal analysis by the probability of obtaining reversals in the entire 166x166 cell type comparison matrix using the hypergeometric distribution. Five pairs {TFi, TFj} that fulfill the erythroid-myeloid reversal pattern (exhibiting at least one reversal with |Δ| > 1) were found (Fig. 4b), including {GATA1, SPI1}. The complete (166x166 cell types) gene pair reversal plots used for the statistical significance calculation are shown below the pattern matched (exact p-values are indicated below the plots). The lymphoid pattern was matched to three TF pairs (Fig. 4c), each containing GATA3. Interestingly, many of the TFs found, including the validated GATA1-PU1 toggle switch, are known to be part of the core network that controls erythropoiesis, myelopoiesis or lymphopoiesis[19-27] and have been shown in some cases to engage in mutual interaction[5,28-30]. For comparison, we also used standard rank-based differential expression to identify relevant genes (see Supplementary Results). 
In doing so, we also obtain several of the same genes but fail to capture the lineage differentiating property, as this is not attributable to single genes but pairs of genes (Supplementary Results, Supplementary Tables 7–9).
Fig. 4

Identification of reversal pairs in lineage splits of the blood system

The HSC is the common precursor of all blood cells. Lymphoid cells branch off separately to give rise to the B and T cell lineages, whereas the myelo-erythroid lineage gives rise to the later binary split between the erythroid and myeloid cells. (a) Lineage-determining TF pairs of the binary splits are expected to follow the reversal pattern shown in the idealized gene pair reversal plots for the subset of relevant lineages used as a query criterion. An ideal pair will also show no reversals for other cell type pairs in the full 166x166 cell type comparison matrix. Pairs of TFs that satisfy such properties and show a statistically significant restricted reversal in the 166x166 cell type data are shown with their p-values (hypergeometric test) for (b) the erythroid-myeloid and (c) the B-T lymphoid splits. The heat maps represent gene pair reversal plots as in Fig. 1b; color corresponds to the mean normalized rank difference.

A number of independent experiments support the involvement in lineage determination of several of the genes identified by expression reversal scoring. Gata3 binding was observed in mouse ChIP-seq data[31] near the TSS of Ebf1 but not Spib or Aff3. In support of an antagonistic pair interaction, Gata3 is among the Ebf1-repressed genes in a gain of function study[32]. In addition, human ChIP-seq data from the GM12878 lymphoblastoid cells[12] indicates EBF1 binding nearby GATA3 TSS. ChIP-seq data also confirmed the possibility of cross-inhibitory interactions at the DNA-level for all three putative toggle switch circuits from the erythroid-myeloid analysis (Supplementary Figs 10 and 11). Moreover, the observed binding of the regulatory factors to their own promoter indicated possible auto-regulation, proposed to be important for genes that participate in lineage-regulatory toggle circuits for stabilizing the poised progenitor state[2,6]. Here, we studied whether the binding of the TFs GATA1, TAL1, PU1, EBF1 and GATA3, that show evidence of cross-inhibitory interactions among the specific TF pair, maps on a genome-wide scale into the mutually exclusive phenotypes. Based on multiple independent ChIP-seq datasets (Supplementary Table 10) we performed genomic region enrichment analyses (Methods) to test whether their binding preferentially occurs in the vicinity of genes associated with the specific hematopoietic lineages. Indeed, we found that GATA1 and TAL1 binding is clearly associated with the erythrocyte phenotype and differentiation, SPI1 with the myeloid-macrophage, EBF1 with B cells and GATA3 with T cells (Supplementary Tables 11–15), matching the TF knockout phenotype (Supplementary Table 16). Furthermore, each member of the antagonistic pairs was associated with phenotype terms of the respective sister lineage. Such binding to the genes of the reciprocal fate is indicative of wide-spread repressive regulation, beyond the antagonistic pair.

The gene pair reversals reflect lineage relationships

Lineage relationships are often illustrated as a tree because of the developmental genealogy of cell types, although the detailed structure of the actual “tree of development” (“cell fate map”)[10] of all cell types in higher metazoa remains unknown. We hypothesized that the number of gene pairs with reversed expression between a pair of cell types is indicative of the relatedness of the cell types. Formalizing this, we define a similarity measure Φ(X,Y) between two cell types, X and Y, as the count of gene pairs for which |Δ| > 1. We selected well-studied sets of hematopoietic cells and the developmentally related endothelial cells to test whether the similarity measure Φ was able to capture the hierarchical lineage relationships, which are well studied in this system. Moreover, several precursor cells of these lineages were present in the transcriptome dataset, permitting the study of branch points. Although traditional hierarchical clustering methods generate dendrograms, they cannot reflect the biological lineage tree since all precursors (which exhibit promiscuous gene expression profiles) would necessarily be placed on terminal branches (leaves). To build this biological intuition into our analysis, we first performed a hierarchical clustering of differentiated cell types using Φ similarity, followed by a separate placement of precursor cell types onto the tree branch points, taking Φ into consideration (see Methods). The resulting dendrogram (Fig. 5a) reflects the well-known hierarchical lineage relationships among these cell types. To facilitate interpretation, the similarity Φ of each cell profile to that of the embryonic stem cell (ESC) is used to superimpose an elevation onto the dendrogram (Fig. 5a). Interestingly, this exposed a key feature of the cell fate map in that the HSC and other precursor cell types are more proximal to the ESC than terminally differentiated cells. 
The third dimension therefore captured properties of a true differentiation landscape reminiscent of Waddington’s metaphoric epigenetic landscape[10]. We obtained a very similar landscape for blood cell types using an independent dataset (see Supplementary Fig. 12 and Supplementary Table 17).
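The count Φ defined above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the per-cell-type mean rank differences δ of each gene pair are already available as dictionaries keyed by gene pair; the function name `phi` and this data layout are illustrative choices, not part of the published pipeline. How the count is subsequently turned into clustering distances is not specified in this section and is not attempted here.

```python
def phi(delta_x, delta_y, threshold=1.0):
    """Count gene pairs whose reversal score between cell types X and Y
    exceeds the threshold in absolute value.

    delta_x, delta_y: dicts mapping a gene pair (g, g') to the mean
    normalized rank difference of that pair in cell type X or Y
    (0.0 when the pair is not consistently ordered within the cell type).
    """
    count = 0
    for pair, dx in delta_x.items():
        dy = delta_y.get(pair, 0.0)
        # A reversal requires opposite, non-zero signs in the two cell types.
        if dx * dy < 0 and abs(dx - dy) > threshold:
            count += 1
    return count


# Toy data (hypothetical values, three gene pairs)
delta_x = {("A", "B"): 0.7, ("A", "C"): 0.3, ("B", "C"): -0.6}
delta_y = {("A", "B"): -0.6, ("A", "C"): -0.2, ("B", "C"): -0.1}
strong_reversals = phi(delta_x, delta_y)
```

In this toy input only the pair (A, B) both changes sign and has a score beyond the threshold, so it is the only pair counted.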
Fig. 5

Lineage relationships among hematopoietic and endothelial cell types revealed measuring similarity based on gene pair expression reversals

An evaluation of utility of the similarity Φ to reflect lineage separation is shown. (a) Hierarchical clustering of differentiated cell types with the new feature of placement of precursor cell types to three branch points using the Hungarian algorithm and mapping of the tree to a landscape is visualized. The circular dendrogram in the x-y plane arranges cells to branching lineages identified by different colors. To represent all cell types and their similarity Φ, multidimensional scaling is shown with (b) TFs or (c) metabolic genes[43]. The landscape elevation (z-dimension) represents the Φ similarity to the ESC giving rise to a potential-like landscape in which development follows the downhill gradient as in Waddington’s epigenetic landscape[10]. Blue color and high altitude on the landscape corresponds to large similarity to the pluripotent state.

To challenge this concept, we first extended the clustering to include all 166 cell types (Fig. 5b) and then compared to a result we obtained using metabolic genes[33] instead of TFs (Fig. 5c and Supplementary Fig. 13). Since the precursors of many cell types are not present in the dataset used, multidimensional scaling was used to visualize cell type dissimilarities on a plane. We used the similarity Φ from the ESC similarly to superimpose an elevation of the landscape. In the TF landscape, we found precursor cell types at elevated locations and a distinct peak for the pluripotent cells. In contrast, metabolic genes that are not expected to drive lineage-determination failed to discriminate the precursor cells that now resided in a large basin that connects cell types from multiple lineages and differentiation stages.

DISCUSSION

Here we show a unique way to analyze cell type gene expression profiles that is connected to the very principles by which gene circuits govern cell type diversification. Using the information in the reversal of gene expression levels between pairs of TFs in pairs of cell types, we generated “participation portraits” of cell types that identified TFs known to play a role in fate determination. Furthermore, our curated sets of TFs that operate at the core of cell fate switch circuits now pave the way towards investigating how TFs, chromatin modification and RNA processing act together in cell lineage control[34] and within regulatory networks. For instance, two genes, DNMT3B and TET1 that were highly ranked in ESCs by our analysis regulate DNA methylation: DNMT3B had been described as an epigenetic regulator of pluripotency genes[35-37]. Upon its discovery, TET1 lacked annotation of its cellular function[38]. Our analysis suggests a developmental function and links uncharacterized genes to specific cell types (a key role for TET1 in pluripotent cells was indeed subsequently found[39]). Knowing the mechanistic interactions of transcriptional regulatory networks in different cell types[40] will enable cell type specific modeling of genetic networks and understanding how mutually repressive pairs of TFs that act as bistable lineage determining toggle switches affect other TFs and ultimately the global state of the network. By exploiting the concept of bidirectional regulation epitomized by the toggle switch circuits that we show is manifested in expression reversal behavior, we ground our method on proposed mechanisms in developmental biology[2-4] to successfully identify highly lineage-specific profiles and TFs involved in core fate-determining circuits. 
Since the identified genes are not only reporters correlated with cell lineages, but possibly involved in regulatory circuits that carry out cell fate decisions, the interactive tool we provide to explore this dataset could also inform the choice of potential candidate genes used in cell fate reprogramming. We identify with high significance eight relevant gene pairs for the developmental circuitry of the common progenitors in the blood system that allowed us to explore further how inherent properties of antagonistic pairs may manifest in other types of large scale datasets. Their active participation in developmental regulatory networks was confirmed by the high degree of inter-connectivity via co-occupied genomic sites and overlap in target genes found in ChIP-seq datasets. Finally, we utilize the reversal analysis to design a new cell type similarity measure that integrates regulatory information, affording a first opportunity to capture the “epigenetic landscape” of the cell differentiation tree directly from expression profile data. In conclusion, we present a global analysis of published cell type transcriptomes using the reversal of expression levels as a key quantity that captures the underlying regulatory dynamics in static gene expression profiles.

METHODS

Dataset collection and preprocessing of expression values

We analyzed 2,919 microarrays comprising 166 different cell types (in some cases tissues) that represent each cell type in its normal state. The dataset was collected from the GEO microarray repository from the hgu133Plus2 array type with each cell type represented by at least two arrays. Further details on the selection of the samples can be found in the Supplementary Results. Gene expression for the transcription regulating gene set was summarized using the GC-RMA algorithm[41] (no quantile normalization) and custom probe mappings. In total, the 2,602 genes are included in the analysis of which 844 represent TFs with high confidence (TF set). Details on gene set curation and probe mapping can be found in the Supplementary Results.

Representing gene expression data as gene pair data

To derive a normalization-independent quantity, we first convert the gene expression values to ranks r within each sample. The quantity that represents the gene pair configuration on a cell type level, the normalized mean rank difference of two genes, δ, is calculated as the mean rank difference of the two genes over the samples that represent this cell type, with the requirement that the relative ranking between the pair members must be consistent (always r > r′ or r < r′). Towards this end, let T be an ordered set of cell type labels, G be an ordered set of genes and $n_t$ be the number of samples for t ∈ T ($n_t \geq 2$ always). Let $R^{(s)} = [r^{(s)}_{g,t,i}]$ be the matrix of normalized expression ranks for gene g ∈ G and sample i of cell type t. By averaging over all $n_t$ samples for a given cell type t, we construct the matrix $R = [r_{g,t}]$ of mean normalized expression ranks. Normalized here means that simple rank values (integers in 1,…,|G|) are scaled by $|G|^{-1}$ so that $r^{(s)}_{g,t,i} \in [|G|^{-1}, 1]$. Clearly $r_{g,t} \in [|G|^{-1}, 1]$ as well. In the sequel, we will use “ranks” with the understanding that we are speaking of normalized ranks. To detect a gene pair expression reversal, we are interested in how the two genes’ ranks differ between cell types. To this end, we define the mean normalized rank difference of two genes in a given cell type:

$$
\delta(g, g', t) =
\begin{cases}
r_{g,t} - r_{g',t} & \text{if } r^{(s)}_{g,t,i} > r^{(s)}_{g',t,i} \text{ for all } i, \text{ or } r^{(s)}_{g,t,i} < r^{(s)}_{g',t,i} \text{ for all } i, \\
0 & \text{otherwise.}
\end{cases}
$$

Notice that δ(g, g′, t) is non-zero if and only if the genes’ ranks manifest the same strict inequality across all samples associated with cell type t. Clearly, δ(g, g′, t) ∈ (−1, 1). In the text we denote this by δ for short.
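The rank conversion and the consistency requirement on δ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each sample is a plain list of expression values indexed by gene, and ties are broken by gene order (the text does not say how ties are handled).

```python
def normalized_ranks(expression):
    """Rank one sample's expression values and scale by 1/|G|,
    so the lowest gene gets 1/|G| and the highest gets 1."""
    n = len(expression)
    order = sorted(range(n), key=lambda k: expression[k])
    ranks = [0.0] * n
    for position, gene_index in enumerate(order, start=1):
        ranks[gene_index] = position / n
    return ranks


def delta(samples, g, g_prime):
    """Mean normalized rank difference of genes g and g_prime in one cell type.

    samples: list of normalized-rank lists, one per sample of the cell type.
    Returns 0.0 unless one gene outranks the other in every sample.
    """
    diffs = [s[g] - s[g_prime] for s in samples]
    if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
        return sum(diffs) / len(diffs)
    return 0.0


# Two toy samples of one cell type, four genes (hypothetical values)
cell_type = [normalized_ranks([5.0, 1.0, 3.0, 2.0]),
             normalized_ranks([4.0, 2.0, 1.0, 3.0])]
consistent = delta(cell_type, 0, 1)    # gene 0 outranks gene 1 in both samples: 0.625
inconsistent = delta(cell_type, 1, 2)  # ordering flips between samples: 0.0
```

Because the mean of per-sample differences equals the difference of mean ranks, the sketch agrees with the definition of δ in terms of the mean rank matrix.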

Comparison of gene pair data across cell types: gene pair reversal analysis

Because we are interested in reversals of the genes’ relationship between cell types, we similarly define the difference of differences as:

$$
\Delta(g, g', t, t') =
\begin{cases}
\delta(g, g', t) - \delta(g, g', t') & \text{if } \delta(g, g', t)\,\delta(g, g', t') < 0, \\
0 & \text{otherwise.}
\end{cases}
$$

Clearly, Δ(g, g′, t, t′) ∈ (−2, 2), and it is non-zero only if calculated from non-zero values of δ. Those pairs with Δ ≠ 0 are referred to as reversal pairs. In order to extract only results where both members of a gene pair change their mean rank between the cell types, |Δ| ≥ 1 must hold. In the text we use the notation Δ for Δ(g, g′, t, t′). A simple result justifies this threshold: |Δ| ≥ 1 is possible only when both genes’ mean ranks change between cell types. Assume without loss of generality that the mean rank of g does not change between cell types, so $r_{g,t} = r_{g,t'}$. Then

$$
\Delta(g, g', t, t') = (r_{g,t} - r_{g',t}) - (r_{g,t'} - r_{g',t'}) = r_{g',t'} - r_{g',t},
$$

and $-1 < r_{g',t'} - 1 \leq r_{g',t'} - r_{g',t} < r_{g',t'} \leq 1$, with each inequality following from the positivity of $[r_{g,t}]$ and its upper bound of 1. Hence |Δ| < 1 whenever one of the two genes keeps its mean rank. To identify candidate toggle pairs we consider the ternary states Δ < 0, Δ > 0 or Δ = 0 and compare the expected configuration for the lineage split to the observed one within a particular cell type set (with representative cell types of a lineage split). To account for the possibility of obtaining a match by chance, the list is sorted based on the hypergeometric probability of obtaining the given number of reversals across cell type comparisons that include all 166 cell types.
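As a small illustration of the reversal score and its ternary reading, the following Python sketch computes Δ from two δ values. The function names are hypothetical, and the sign-change test (product of the two δ values below zero) is one way of encoding the statement that Δ is non-zero only when the relative expression changes sign.

```python
def reversal_score(delta_t, delta_t_prime):
    """Reversal score of one gene pair between cell types t and t'.

    delta_t, delta_t_prime: mean normalized rank differences of the pair
    in the two cell types. The score is zero unless the two values have
    opposite, non-zero signs.
    """
    if delta_t * delta_t_prime < 0:
        return delta_t - delta_t_prime
    return 0.0


def ternary_state(score):
    """Collapse a reversal score to -1, 0 or +1 for pattern matching."""
    return (score > 0) - (score < 0)


forward = reversal_score(0.6, -0.5)    # sign change: about 1.1
backward = reversal_score(-0.5, 0.6)   # same pair, cell types swapped: about -1.1
no_change = reversal_score(0.6, 0.2)   # same ordering in both cell types: 0.0
```

Swapping the two cell types flips the sign of the score, which is why the resulting cell type by cell type matrices are skew-symmetric.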

Reversal participation

We define the reversal participation score Ψ to quantify the strength of participation of gene g in (potentially bistable) expression reversals in all pairs of cell types, t and t′. That is, g is fixed for the entire plot displayed, and t and t′ correspond to cell types. This measure of strength is the product of (the log of) the number of reversals above a given threshold in which the gene participates and the actual magnitude of the strongest (positive or negative) reversal in which it participates. First, we identify the gene ĝ with respect to which g exhibits the strongest reversal Δ for a given pair of cell types, t and t′, as:

$$
\hat{g} = \underset{g' \in G}{\arg\max}\; \lvert \Delta(g, g', t, t') \rvert ,
$$

and then define the reversal participation score as:

$$
\Psi(g, t, t') = \Delta(g, \hat{g}, t, t') \cdot \log \sum_{g' \in G} I\bigl(\lvert \Delta(g, g', t, t') \rvert > H\bigr),
$$

where H is the |Δ| value above which we deem a reversal to have occurred, and I is the indicator function. We use H = 1 in our analysis. As t and t′ range over all 166 cell types, this yields square, skew-symmetric plots. Note that ubiquitously highly expressed genes do not show up in reversal pairs, which separates them from lineage-specific highly expressed genes.
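The verbal definition of Ψ (log of the number of above-threshold reversals, multiplied by the strongest signed reversal) can be sketched as follows. Several details are assumptions made only for this illustration: the natural logarithm is used, the score is set to 0 when no reversal exceeds the threshold (where the logarithm would be undefined), and the function name and input layout are hypothetical.

```python
import math


def participation_score(reversal_scores, threshold=1.0):
    """Reversal participation of a fixed gene g for one pair of cell types.

    reversal_scores: reversal scores of g against every partner gene g'
    for the same two cell types.
    """
    count = sum(1 for d in reversal_scores if abs(d) > threshold)
    if count == 0:
        return 0.0  # assumption: no qualifying reversal means no participation
    strongest = max(reversal_scores, key=abs)  # keeps the sign of the reversal
    return strongest * math.log(count)


# Hypothetical scores of one gene against five partner genes
psi = participation_score([1.2, -1.5, 0.4, 0.0, 1.1])
single = participation_score([1.2, 0.3])
```

Under this literal reading a gene with exactly one above-threshold reversal scores 0, because log 1 = 0; whether the published scores treat that case differently cannot be determined from the text here.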

Finding the top reversal pairs for a specific lineage split

A supervised search for candidate toggle gene pairs was formulated by setting criteria based on biological knowledge of lineage relationships and expected reversal pattern of such a gene pair in the precursor (P), lineage 1 (L1) and lineage 2 (L2) cells. An external (E) group corresponds to cell types outside the lineage split. The search was performed to extract the top pairs of the erythroid-myeloid and B-T lymphoid splits.

Erythroid-myeloid

The hematopoietic stem cell was selected as the precursor cell type (P). L1 comprised three erythroid cell types (proerythroid, erythroblast, erythrocyte) and L2 five myeloid cell types (promyeloid, CD11b+ bone marrow cell, monocyte, CD16+ monocyte and neutrophil). Three cell types from the lymphoid lineage (naive CD4+ T cell, naive CD8+ T cell and naive B cell) were selected as an external (E) group.

B-T lymphoid

The hematopoietic stem cell again served as the precursor cell type (P). L1 comprised four B-lymphoid cell types (naive B cell, activated B cell, germinal center centrocyte and centroblast) and L2 four T-lymphoid cell types (naive CD4+ T cell, activated CD4+ T cell, naive CD8+ T cell and activated CD8+ T cell). The proerythroid and promyeloid cell types were selected as an external (E) group.

In both searches, we expect no reversals (Δ = 0) in the P-L1, P-L2, P-E, L1-E and L2-E comparisons and always a reversal in all L1-L2 comparisons (Δ < 0 for each L1 vs L2 and Δ > 0 for each L2 vs L1, or Δ > 0 for each L1 vs L2 and Δ < 0 for each L2 vs L1). The exact match is the first filter to find candidate pairs. (The external group can be omitted, but is useful if pairs that do not exhibit expression reversals in neighboring lineages should be excluded.) Additionally, at least one reversal with |Δ| > 1 is required to accept a candidate gene pair to the final list shown. Supplementary Table 7 shows additional results when one or more of these criteria are relaxed. Invariably, the top pairs presented are among the most promising candidates.

Finally, the hypergeometric probability of obtaining a defined set of reversals was calculated for each pair and used to sort the gene pairs. To calculate this distribution, the number of successes in the sample corresponds to the observed reversals within the specified cell type set, the number of successes in the population to the observed reversals across all cell type comparisons, and the sample size to the number of cell types assigned to P, L1, L2 and E.
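The pattern filter and the hypergeometric ranking described above can be sketched as below. The function names, the toy cell type labels and the toy Δ values are hypothetical, and the population size `N` is left as a parameter because the text does not state it explicitly.

```python
from itertools import product
from math import comb


def sign(x):
    """Ternary state of a reversal value: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def matches_split(delta, P, L1, L2, E):
    """Check the expected reversal pattern for one gene pair.

    delta(a, b) returns the reversal value between cell types a and b.
    """
    quiet = [pair for groups in ((P, L1), (P, L2), (P, E), (L1, E), (L2, E))
             for pair in product(*groups)]
    # No reversal is allowed outside the L1-L2 comparisons.
    if any(delta(a, b) != 0 for a, b in quiet):
        return False
    values = [delta(a, b) for a, b in product(L1, L2)]
    # Every L1-L2 comparison must reverse, all in the same direction.
    consistent = {sign(v) for v in values} in ({1}, {-1})
    # At least one reversal must be strong (|Delta| > 1).
    return consistent and any(abs(v) > 1 for v in values)


def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes) when n items are drawn without replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Toy example: one precursor, two cell types per lineage, one external.
P, L1, L2, E = ["P"], ["ery_a", "ery_b"], ["mye_a", "mye_b"], ["ext"]
table = {(a, b): 1.4 for a, b in product(L1, L2)}


def toy_delta(a, b):
    # Skew-symmetric lookup: Delta(b, a) = -Delta(a, b), 0 if unlisted.
    return table.get((a, b), 0.0) - table.get((b, a), 0.0)


is_candidate = matches_split(toy_delta, P, L1, L2, E)
```

Pairs that pass `matches_split` would then be sorted by `hypergeom_pmf`, with `k`, `K` and `n` filled in as described in the text. Whether the point probability or a tail probability was used for sorting is not specified, so the point probability here is one possible reading.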

Clustering of cell types

We define a similarity measure based on gene pair expression reversals, Φ, as the number of reversal pairs with |Δ| ≥ 1 (as defined above) for a given cell type comparison. By examining all possible pairs of TFs in our dataset we can count the number of reversal pairs {g, g′} between two cell types (X, Y). Then, the greater the number of reversal pairs, the greater the similarity Φ(X,Y) between the two cell types. The cell lineage was reconstructed using hierarchical clustering with average linkage for the endothelial and hematopoietic cell types. Clustering was applied to terminally differentiated cell types.

The hematopoietic and endothelial cells are closely related in early development: a hemangioblast cell type is a progenitor for both hematopoietic and endothelial precursors[42]. In the clustering, we have neither this common precursor cell type nor a precursor for endothelial differentiation. Therefore, all endothelial cells are assigned as differentiated cell types. The hematopoietic cell is the common precursor of the blood cell types and is placed at the center. There are three early precursor cell types for the erythroid-myeloid lineage: erythroblast, bone marrow promyelocyte and CD11+ cells. In addition, we chose to assign monocyte as a precursor cell type as the data set contains multiple monocyte-derived cell types (macrophages and dendritic cells). There is no early lymphoid precursor in the data set, so we chose to assign the naive cell types as precursors. For the B-cell lineage a further maturation step occurs in the germinal centers[43]. For this reason, the germinal center centrocyte and centroblast were assigned as precursors. The other cell types were considered to represent a differentiated state.
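The reversal count Φ defined above amounts to a loop over all unordered gene pairs. In this minimal sketch the `delta` callable, the gene labels and the Δ values are hypothetical stand-ins for the real reversal computation.

```python
from itertools import combinations


def phi(delta, genes, X, Y, H=1.0):
    """Number of gene pairs {g, g'} with |Delta(g, g', X, Y)| >= H."""
    return sum(1 for g, gp in combinations(genes, 2)
               if abs(delta(g, gp, X, Y)) >= H)


# Hypothetical Delta values for one comparison of cell types X and Y.
toy = {("A", "B"): 1.5, ("A", "C"): 0.3, ("B", "C"): -1.0}


def toy_delta(g, gp, X, Y):
    return toy[(g, gp)]


count = phi(toy_delta, ["A", "B", "C"], "X", "Y")  # two pairs reach |Delta| >= 1
```

Evaluating `phi` for every pair of cell types gives the matrix that serves as input to the average-linkage clustering.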
The placement of the progenitor cell types {B_1, …, B_M}, where M is the number of progenitor cell types, was done using the Hungarian algorithm (HA)[44] to solve an assignment problem. For a progenitor cell type B_m, let X = {Φ(a_1, B_m), …, Φ(a_k, B_m)} and Y = {Φ(b_1, B_m), …, Φ(b_l, B_m)} contain the similarities Φ from B_m to the cell types on the left {a_i} and right {b_j} branches of node n, n ∈ {1, …, N}, respectively, where N is the number of branches in the clustering tree and k and l are the numbers of cell types in the left and right branches. The similarity D(n, B_m) of cell type B_m from node n is defined as D(n, B_m) = |mean{X} − mean{Y}|, where |.| denotes absolute value and mean{.} denotes the mean value of a set of similarities. The resulting similarity matrix D, containing D(n, B_m) for all node and cell type pairs, is then scaled by the similarity of each progenitor cell type to the ESC: D_esc = D·d_esc, where d_esc = [Φ(A, B_1), …, Φ(A, B_M)] and A is the ESC. d_esc is normalized to the [0,1] interval. This makes the ESC a reference point. HA is then applied to D_esc to obtain the optimal assignment for each progenitor cell type. It should be noted that there are more nodes in the clustering tree than there are progenitor cell types with measurement data. Thus, a progenitor cell type is assigned only to the best-fitting nodes according to the HA optimization.

For a representation containing all 166 cell types, multidimensional scaling was used to obtain a two-dimensional representation of the full reversal similarity matrix. A landscape is interpolated over the 2D representation of cell types using the similarity Φ to the ESC as elevation.
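The node score and the assignment step can be illustrated on a toy problem. Two caveats apply: the sketch replaces the Hungarian algorithm with an exhaustive search over permutations, which gives the same optimum but only scales to very small inputs, and it assumes the assignment minimizes the total score, a direction the text does not state. All numbers are made up.

```python
from itertools import permutations
from statistics import mean


def node_score(phi_left, phi_right):
    """D(n, B) = |mean{X} - mean{Y}| for one node n and one progenitor B."""
    return abs(mean(phi_left) - mean(phi_right))


def assign(cost):
    """Exhaustive one-to-one assignment of progenitors to nodes.

    cost[m][n] is the (already scaled) value for progenitor m and node n.
    There may be more nodes than progenitors. Returns, for each
    progenitor, the index of its node, minimizing the total cost.
    """
    n_prog, n_nodes = len(cost), len(cost[0])
    best = min(permutations(range(n_nodes), n_prog),
               key=lambda nodes: sum(cost[m][n] for m, n in enumerate(nodes)))
    return list(best)


# Similarities from one progenitor to the left and right branch of a node.
score = node_score([10, 12], [3, 5, 4])

# Two progenitors, three candidate nodes (hypothetical scaled values).
placement = assign([[4, 1, 3],
                    [2, 3, 5]])
```

For realistic tree sizes a proper implementation of the Hungarian algorithm would be needed; the brute-force search is only meant to make the objective explicit.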

ChIP-seq data

The ChIP-seq datasets used are listed in Supplementary Table 10 and their use is further described in Supplementary Results. The peak lists as published by the authors were assembled for each TF. The peak sizes were equalized to ±250 bp around the peak centre. For the ESC data, overlapping intervals representing the binding of the same protein were merged into one. Peaks from a pair of TFs were considered to intersect if they overlapped by at least 1 bp. The genomic region enrichment analysis was performed using the GREAT[45] tool (binomial test, FDR 1%).
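The peak handling described above (equalizing, merging and intersecting) can be sketched with plain tuples. Half-open `(start, end)` coordinates on a single chromosome and the example peak centres are assumptions made for illustration.

```python
def equalize(center, half_width=250):
    """Fixed-size interval of +/- half_width bp around a peak centre."""
    return (max(0, center - half_width), center + half_width)


def merge(intervals):
    """Merge overlapping intervals of the same protein into one."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(peaks_a, peaks_b, min_bp=1):
    """Pairs of peaks from two TFs that overlap by at least min_bp."""
    return [(a, b) for a in peaks_a for b in peaks_b
            if min(a[1], b[1]) - max(a[0], b[0]) >= min_bp]


# Hypothetical peak centres for two TFs on the same chromosome.
tf1 = merge([equalize(c) for c in (1000, 1200, 5000)])
tf2 = [equalize(1699)]
shared = intersect(tf1, tf2)
```

The nested loop in `intersect` is quadratic and only suitable for small lists. Genome-scale peak sets would normally be handled with sorted sweeps or an interval tree.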

Online resource

The online data resource and interactive tool (http://trel.systemsbiology.net/) encompassing pair-wise comparisons of the genes and cell types presented in this article is available to explore transcriptome diversity in metazoa, accompanied by a user guide and video tutorial. The TF landscape is also available online in an interactive, browsable format. The source code to perform the analysis is available upon request.
  42 in total

Review 1.  Understanding gene circuits at cell-fate branch points for rational cell reprogramming.

Authors:  Joseph X Zhou; Sui Huang
Journal:  Trends Genet       Date:  2010-12-14       Impact factor: 11.639

Review 2.  Germinal-center organization and cellular dynamics.

Authors:  Christopher D C Allen; Takaharu Okada; Jason G Cyster
Journal:  Immunity       Date:  2007-08       Impact factor: 31.745

3.  Genome-wide analyses of transcription factor GATA3-mediated gene regulation in distinct T cell types.

Authors:  Gang Wei; Brian J Abraham; Ryoji Yagi; Raja Jothi; Kairong Cui; Suveena Sharma; Leelavati Narlikar; Daniel L Northrup; Qingsong Tang; William E Paul; Jinfang Zhu; Keji Zhao
Journal:  Immunity       Date:  2011-08-26       Impact factor: 31.745

4.  Negative cross-talk between hematopoietic regulators: GATA proteins repress PU.1.

Authors:  P Zhang; G Behre; J Pan; A Iwama; N Wara-Aswapati; H S Radomska; P E Auron; D G Tenen; Z Sun
Journal:  Proc Natl Acad Sci U S A       Date:  1999-07-20       Impact factor: 11.205

5.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal:  Nature       Date:  2007-06-14       Impact factor: 49.962

6.  Simple decision rules for classifying human cancers from gene expression profiles.

Authors:  Aik Choon Tan; Daniel Q Naiman; Lei Xu; Raimond L Winslow; Donald Geman
Journal:  Bioinformatics       Date:  2005-08-16       Impact factor: 6.937

7.  Cloning and functional characterization of early B-cell factor, a regulator of lymphocyte-specific gene expression.

Authors:  J Hagman; C Belanger; A Travis; C W Turck; R Grosschedl
Journal:  Genes Dev       Date:  1993-05       Impact factor: 11.361

8.  Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas.

Authors:  Nathan D Price; Jonathan Trent; Adel K El-Naggar; David Cogdell; Ellen Taylor; Kelly K Hunt; Raphael E Pollock; Leroy Hood; Ilya Shmulevich; Wei Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2007-02-21       Impact factor: 11.205

Review 9.  Transcriptional regulatory networks in haematopoiesis.

Authors:  Diego Miranda-Saavedra; Berthold Göttgens
Journal:  Curr Opin Genet Dev       Date:  2008-10-07       Impact factor: 5.578

10.  Progressive lineage analysis by cell sorting and culture identifies FLK1+VE-cadherin+ cells at a diverging point of endothelial and hemopoietic lineages.

Authors:  S I Nishikawa; S Nishikawa; M Hirashima; N Matsuyoshi; H Kodama
Journal:  Development       Date:  1998-05       Impact factor: 6.868
