The ability of a transcription factor (TF) to regulate its targets is modulated by a variety of genetic and epigenetic mechanisms, resulting in highly context-dependent regulatory networks. However, high-throughput methods for the identification of proteins that affect TF activity are still largely unavailable. Here we introduce an algorithm, modulator inference by network dynamics (MINDy), for the genome-wide identification of post-translational modulators of TF activity within a specific cellular context. When used to dissect the regulation of MYC activity in human B lymphocytes, the approach inferred novel modulators of MYC function, which act by distinct mechanisms, including protein turnover, transcription complex formation and selective enzyme recruitment. MINDy is generally applicable to study the post-translational modulation of mammalian TFs in any cellular context. As such it can be used to dissect context-specific signaling pathways and combinatorial transcriptional regulation.
The ability of a transcription factor (TF) to regulate its targets is modulated by a variety of genetic and epigenetic mechanisms, resulting in highly context-dependent regulatory networks. However, high-throughput methods for the identification of proteins that affect TF activity are still largely unavailable. Here we introduce an algorithm, modulator inference by network dynamics (MINDy), for the genome-wide identification of post-translational modulators of TF activity within a specific cellular context. When used to dissect the regulation of MYC activity in human B lymphocytes, the approach inferred novel modulators of MYC function, which act by distinct mechanisms, including protein turnover, transcription complex formation and selective enzyme recruitment. MINDy is generally applicable to study the post-translational modulation of mammalian TFs in any cellular context. As such it can be used to dissect context-specific signaling pathways and combinatorial transcriptional regulation.
Reverse engineering of cellular networks in prokaryotes and lower eukaryotes1–3, as well as in mammals4–6, has started to unravel the remarkable complexity of transcriptional programs. These programs, however, may change substantially as a function of the availability of proteins affecting their post-translational modification, such as phosphorylation, acetylation and ubiquitination enzymes7, as well as of those participating in transcription complexes (co-factors), thus making cellular networks highly context dependent.Although the large-scale reprogramming of the cell’s transcriptional logic was studied in yeast8, 9, identification of a repertoire of genes that effect these events remains elusive. Indeed, compared to tools such as ChIP-on-Chip or reverse engineering algorithms for the analysis of transcriptional networks4, 10, only one experimentally validated algorithm exists for the dissection of signaling networks in a mammalian context11, which inferred substrates of 73 kinases. Here we propose and experimentally validate MINDy (Modulator Inference by Network Dynamics), a gene expression profile based method for the systematic identification of genes that modulate a transcription factor’s (TF) transcriptional program at the post-translational level, i.e. those encoding proteins that affect the TF’s activity without changing its mRNA abundance. These proteins may post-translationally modify the TF (e.g., kinases), affect its cellular localization or turnover, be its cognate partners in transcriptional complexes, or compete for its DNA binding sites. They may also include proteins that do not physically interact with the TF, such as those in its upstream signaling pathways.
RESULTS
The MINDy algorithm
MINDy interrogates a large gene expression profile dataset to identify candidate modulator genes whose expression strongly correlates with changes in a TF’s transcriptional activity. As shown in Supplementary Information (SI) Section 1.1, this can be efficiently accomplished by computing an information theoretic measure known as the Conditional Mutual Information (CMI), I[T F; t| M] 12, between the expression profile of a transcription factor (TF) and one of its putative targets (t), given the expression of a modulator gene (M). Accurate estimation of the CMI requires exceedingly large datasets. Thus MINDy infers candidate modulators using a related, yet simpler estimator, which we denote as ΔI (see Methods). Briefly, the estimator assesses the statistical significance of the difference in Mutual Information (MI) between the TF and a target in two subsets, each including 35% of samples, in which the modulator is least and most expressed, respectively. The 35% parameter was determined empirically as the one optimizing the identification of proteins in the B-cell receptor signaling pathway as modulators of the MYC TF (see Methods).A schematic representation of the MINDy algorithm is provided in Figure 1A. MINDy takes four inputs: a gene expression profile dataset, a TF of interest, a list of potential modulator genes (M1, M2,…) and a list of potential TF targets (t1,t2,…). The ΔI estimator requires that the expression of the modulator and of the TF be statistically independent (independence constraint), and that the modulator expression have sufficient range (range constraint). Appropriate statistical tests for these constraints are discussed in the Methods. Candidate modulators may include all genes satisfying these constraints (unbiased analysis) or may be filtered by additional criteria, e.g., their molecular functions. Each possible triplet (TF, M, t) is then independently tested using the ΔI estimator. False positives are controlled using appropriate statistical thresholds (see Methods).
Figure 1
MINDy algorithm
(A) A gene expression profile dataset is represented as a matrix, where columns indicate different samples and rows indicate different genes. Given a transcription factor of interest, TF, a modulator and a target to be tested by MINDy (M and t respectively) are chosen among the remaining genes. Some modulator-target combinations may be eliminated a priori based on functional or statistical constraints (blue rectangle). For instance, one may want to consider only ubiquitin ligases as candidate modulators. All the samples are then sorted according to the expression of the selected modulator M. The set of samples (e.g., 35%) with the lowest and highest expression of the modulator are then selected. These sample sets are labeled M-low and M-high. In each of the two sample sets, samples are finally sorted according to the TF expression. Three cases are possible when comparing the TF-target correlation in M-high vs. M-low. (Positive Inferred Modulator): Significant increase in MI (i.e. more correlation in the M-high set than in the M-low set); (Negative Inferred Modulator): Significant decrease in MI (i.e. less correlation in the M-high set than in the M-low set); (Not a Modulator): No significant MI change is observed.
(B) Synthetic network analysis using MINDy. TF: the transcription factor of interest; K: a protein kinase; cTF: a co-transcription factor;
: activated TF through phosphorylation by K;
: transcriptionally active complex formed between TF and cTF; G, G and G are three downstream targets of TF. More details on the synthetic network are provided in the Methods.
(C) Networks reconstructed by ARACNe and MINDy. ARACNe (left) is based on pairwise MI, thus captures only the unconditional interaction between TF and G1. MINDy, on the other hand, infers three-way interactions using the CMI. When K is used as candidate modulator (middle), it correctly identified the conditional interaction between TF and G3; similarly, it also correctly inferred cTF as the modulator of TF-G2 interaction (right). Interactions are labeled with their respective p-values. Interactions that are statistically significant, after correction for multiple testing, are shown in black with red p-values.
A positive or negative Mode of Action is determined, depending on whether the TF-target MI increases or decreases as a function of the modulator abundance (Figure 1A). The Mode of Action, however, is not necessarily equivalent to the modulator’s Biological Activity as a TF activator or antagonist. For instance, a modulator may be such a strong TF-activator that the TF-target kinetics becomes saturated even when the TF is slightly activated. In that case, TF-target MI may decrease as a function of the modulator. Details on the CMI analysis and on how to assess both Mode of Action and Biological Activity of a modulator are provided in the Methods section.For illustrative purposes, we show a simple synthetic network (Figure 1B, see Methods), which explicitly models two post-translational modulation events (activation by phosphorylation and co-factor binding) differentially affecting a TF’s regulatory logic. Rather than representing a realistic case this model is only a conceptual tool to illustrate two alternative regulatory programs of a TF depending on its modulators (Figure 1C).
MINDy-based identification of MYC modulators
We applied MINDy to the genome-wide identification of modulators of the MYC proto-oncogene, using a previously assembled collection of 254 gene expression profiles13, 14, representing 17 distinct cellular phenotypes derived from normal and neoplastic human B lymphocytes (see Methods). MYC encodes a basic helix-loop-helix/leucine zipper (bHLH/Zip) TF, controlling many cellular processes, including cell growth, differentiation, apoptosis and DNA replication15, 16. It is implicated in the pathogenesis of several human cancers17 and can either activate or repress a large number of targets, depending on the cellular context (for a review see [18]).To study MINDy’s robustness and generality, we tested its performance using different MYC-target selections. First, we used 340 literature-validated targets from the MYC database19 (DB-targets). Then, to also test whether the algorithm may generalize to TFs whose targets are not characterized in the literature, we used 197 MYC targets inferred by ARACNe4 (AR-targets). Finally, see SI section 1.9, we considered all genes in the gene expression profile data as candidate targets (ALL-targets), representing cases when literature or computationally-inferred TF targets are not available.
MINDy identifies known MYC modulators
From a pool of 3,131 genes satisfying both independence and range constraints and using DB-targets, MINDy inferred 662 MYC modulators (Table S1) at a False Discovery Rate (FDR) of 4.5×10−3 (see Methods). Gene Ontology (GO) molecular function enrichment analysis revealed that the 20 most enriched categories by Fisher’s exact test (FET) include protein kinase activity (p = 0.002), transcription factor binding (p = 0.003), acetyltransferase activity (p = 0.004) and phosphoprotein phosphatase activity (p = 0.016). Thus, inferred modulators were enriched in categories known to include effectors of MYC activity20–22 (see Methods and Table S2 for details on the GO enrichment analysis).To test whether MINDy could recapitulate literature-based MYC modulators, we assembled an unbiased set of 233 MYC modulators, including both proteins physically interacting with MYC and indirect modulators (Table S3), using the Ingenuity software (Ingenuity® Systems, www.ingenuity.com). The assumption is that physical interactors are likely to affect MYC protein function, although there may be false positives. While not exhaustive, this provides an independently-established, literature-based dataset to assess algorithm performance (see SI Section 1.8 for details on inclusion criteria).From this set, 150 genes were excluded as not represented on the chip, not expressed in B cells or not satisfying the range or independence constraints. Of the remaining 83, 29 (35%) were inferred as MYC modulators by the algorithm (p = 0.0078 by FET). This suggests that the algorithm is effective in recapitulating known MYC modulators (especially since Ingenuity modulators may not be B cell specific) with recall comparable to high-throughput assays for protein-protein interactions, which on average detect 20% of known interactions23. We note that 54/83 proteins were reported by Ingenuity as MYC-modulators not supported by a direct physical interaction. Of these, MINDy identified 18 (33.3%, p = 0.041 by FET), suggesting that MINDy is useful in identifying both physically interacting and indirect TF-modulators. Indeed, almost twice as many modulators (18 vs. 11) were found by MINDy among proteins not known to interact directly with MYC.To focus our analysis on specific molecular functions, we restricted candidate modulators to 542 signaling proteins – including protein kinases, phosphatases, acetyltransferases and deacetylases – and to 598 TFs (see Methods). Among these, MINDy identified 91 signaling proteins (Table S4) and 99 TFs (Table S5), respectively, as MYC modulators (FDR = 0.0053). For each modulator, virtually 100% of the ΔI tests inferred the same Mode of Action (see columns “T+” and “T−” in Table S4 and S5) and fewer than 15% of the modulators had an ambiguous Biological Activity.To assess a lower bound on the fraction of true positives among inferred modulators (i.e. precision), we performed manual literature curation. Given the labor-intensive nature of this step, only 29 signaling proteins and 35 TFs affecting more than 30 MYC targets were considered. Among the former (Table 1), 11 appear as MYC modulators in published reports (precision = 37.9%). Similarly, among the latter (Table 1), 6 appear in published reports as MYC co-factors or antagonists (precision = 17.1%). For TFs with informative binding profiles in TRANSFAC24, we tested whether their binding sites were enriched in promoters of the modulated targets (see Methods). 14 of 35 TFs had appropriate binding profiles. Of these, 11 were highly enriched. Overall, 17 of 35 TFs (precision = 48.6%) were either literature-validated or had enriched binding sites (see Table 1 and Table S6). This suggests that MINDy’s precision may approach experimental assays23 when all modulators will be experimentally tested.
Table 1
Top modulators of MYC inferred by MINDy among signaling proteins and TFs.
Signaling Protein
Transcription Factor
Modulator
#T
MoA
BA
Evidence
Modulator
#T
MoA
BA
Evidence
BSE p-value
CSNK2A1
178
↑
+
Direct41, 42
SMAD3
159
↓
−
Direct43
-
HCK
111
↓
−
Pathway42, 44
AHR
156
↓
+
Direct45
-
PPAP2B
109
↓
−
CREM
125
↓
+
1×10−36
SAT
99
↓
−
DDIT3
96
↓
−
0.04
MAP4K4
83
↓
+
Pathway46, 47
DRAP1
92
↑
−
-
DUSP2
82
↓
−
Pathway48, 49
ZKSCAN1
87
↑
−
-
CSNK1D
80
↓
−
NR4A1
82
↓
−
-
PPM1A
80
↓
+
BHLHB2
80
↓
−
Validated
9×10−58
GCAT
78
↑
+
ATF3
77
↓
+
1×10−17
TRIO
74
↓
−
NR4A2
72
↓
−
-
STK38
60
↑
−
Validated
UBTF
71
↑
−
-
PRKCI
58
↑
+
NFKB2
66
↓
−
0.70
CDKN1A
58
↓
−
Direct50
HOXB7
65
↑
−
-
MTMR6
52
↓
+
BACH1
64
↓
−
5×10−9
PRKACB
50
↓
+
SOX5
64
↓
+
0.66
NEK9
47
↑
−
FOS
57
↓
−
2×10−3
MYST1
45
↑
−
ARNT
55
↑
−
Direct45
-
MAPK13
41
↑
−
Pathway51, 52
IRF1
55
↓
−
6×10−8
OXSR1
41
↓
+
ETV5
54
↓
+
-
DUSP4
40
↓
−
TCF12
51
↑
−
2×10−3
MAP2K3
39
↓
−
Pathway53
SMAD2
50
↑
−
Direct43
-
FYN
38
↓
−
Pathway42, 54
NFATC4
46
↑
−
-
PPP4R1
35
↓
+
E2F5
45
↑
−
Direct55
-
MAPK1
35
↑
−
Direct22, 56
JUN
45
↓
−
1×10−7
MAP4K1
34
↓
u
Pathway57
CUTL1
39
↓
+
-
CSNK1E
33
↑
−
ESR2
38
↓
+
-
NEK7
32
↑
−
ZNF354A
37
↑
−
-
CSNK2A2
31
↑
+
Direct41
MAF
37
↓
+
-
TRIB2
30
↓
+
SMARCB1
37
↑
−
Direct58
-
HDAC1
8
↑
+
Validated
BRD8
34
↑
+
-
DBP
34
↑
−
6×10−3
CBFA2T3
33
↑
−
-
ESR1
31
↑
−
0.04
RELB
31
↓
−
-
TFEC
30
↓
u
0.92
MEF2B
14
↑
+
Validated
Modulators on each side are sorted in decreasing order based on the number of MYC targets they modulate. #T: total number of MYC targets affected by the modulator; MoA - MINDy Mode of Action: MI increase (↑), MI decrease (↓); BA - Biological Activity: activator (+), antagonist (−), undetermined (u) (see Methods). Evidence: literature or experimental support: “Direct” (blue colored) denotes modulators physically interacting with MYC protein or MYC binding sites; “Pathway” (green colored) indicates literature validated role in pathway-mediated (e.g. B-cell receptor and mitogen-activated protein kinase pathways) modulation of MYC activity; “Validated” (red colored) indicates modulators experimentally validated in this study. BSE: Binding site enrichment analysis. P-values in red indicate statistical significance (p < 0.05) in BSE.
We then compared the performance of MINDy using literature-based targets (DB-targets) to that with targets computationally-inferred by ARACNe (AR-targets). Overlap between the two target sets was highly significant (p = 2.89×10−18) but relatively small (17%). Nonetheless, overlap between MINDy inferred modulators, using the two target sets, was almost complete: 93.8% (p = 3.10×10−27) among signaling proteins and 95.3% (p = 4.56×10−288) among TFs, respectively (See SI Section 1.9 for more details). This suggests that the method is highly robust to target selection and can be effectively generalized to TFs whose targets are not known from the literature but can be inferred computationally.
Experimental Validation
We selected four candidates among signaling genes and co-TFs for experimental validation, including a kinase (STK38), two TFs (BHLHB2 and MEF2B) and a de-acetylating enzyme (HDAC1). These genes were selected based on the availability of reagents allowing the thorough validation of their activity, and the diversity of the possible mechanisms by which they may modulate MYC activity. Since no single B-cell line expressed more than two of the four tested modulators, appropriate lines were selected for the silencing experiments among those with the highest modulator expression.
STK38
As a first candidate, we validated STK38, a Serine/Threonine kinase25 inferred by MINDy as a strong positive modulator of MYC activity, affecting 60 targets (Figure 2A, Table S4). We silenced STK38 by lentiviral vector-mediated shRNA expression in ST486 cells. While qRT-PCR analysis showed that MYC mRNA concentration was unchanged following STK38 silencing, MYC protein levels were significantly affected (Figure 2B), suggesting a post-translational modulation effect. We proceeded to test two MYC targets in MINDy predictions, BCL2 and NME1, which are normally repressed by MYC26, 27. Consistently with what MINDy predicted, both genes were up-regulated following STK38 silencing (Figure 2C). Additionally, to test whether STK38-mediated MYC modulation is target specific, we tested three additional MYC targets not inferred by MINDy. The first two, TERT and EBNA1BP2, which are known to be activated by MYC4, 28, were down-regulated following STK38 silencing, while the third one, p21CIP1, which is known to be repressed by MYC29, was up-regulated (Figure 2C). These results confirm the observed target-independent down-regulation of MYC at the protein level. Furthermore, STK38-mediated modulation of MYC affected its phosphorylation. Immunoblot analysis of MYC protein in ST486 cells, in the presence of inhibitor of proteasomal degradation (MG132), showed accumulation of phosphorylated MYC in cells following STK38 silencing compared to cells treated with control shRNA, suggesting that STK38 mediates MYC phosphorylation (Figure 2D). Finally, co-immunoprecipitation (co-IP) of epitope-tagged STK38 (HA-STK38) and MYC (FLAG-MYC), in 293T cells transfected with vectors expressing both proteins, showed that STK38 and MYC interacts at the protein level (Figure 2E). Immunoprecipitation of endogenous MYC using specific antibodies in Ramos cells confirmed that the two proteins can interact physiologically in native cells (Figure 2F). These results suggest that STK38 modulates MYC activity by directly affecting MYC protein stability.
Figure 2
STK38 regulates the stability of MYC protein
(A) Visualization of MINDy output. Gene expression profiles were displayed with genes on rows and samples on columns. Expression values for each gene are rank transformed, median centered and rescaled between [−1, 1]. Samples were partitioned based on the expression levels of STK38, and within each partition, sorted by the expression levels of MYC.
(B) ST486 cells were infected with lentivirus expressing shRNA specific to STK38 and collected 96 hours post-infection. Whole cell extracts were analyzed by Western blotting using anti-MYC, anti-STK38/NDR1, and anti-β-Actin antibodies. Representative results from two of five independent experiments are shown.
(C) qRT-PCR analysis of MYC and MYC target genes expression after silencing of STK38 in ST486 cells. Relative expression fold change between STK38 shRNA and control shRNA transduced cells was normalized to that of β-Actin housekeeping gene. Bars represent mean ± standard error (SE) of three different samples.
(D) STK38 mediates MYC phosphorylation. Top graph shows accumulation of phosphorylated MYC in STK38 silenced cells in the presence of proteasomal degradation inhibitor MG132 (48 hours post-infection). ST486 cells were infected with lentivirus expressing STK38 shRNA or control shRNA. The cells were treated with or without MG132 for 4 hours before harvesting. Whole cell extracts were analyzed by Western blotting using anti-STK38, anti-phospho-MYC (Thr58/Ser62), anti-MYC and anti-GAPDH antibodies. Bottom plot provides detailed densitometry histogram. Relative ph-MYC expressions (i.e. ratio of ph-MYC to total MYC) are normalized to that of cells transduced with control shRNA.
(E) HA-STK38 expression vector was transiently transfected into 293T cells with FLAG-MYC expression vector. At 24-hour post-transfection immunoprecipitation was performed using the anti-HA agarose beads. Whole cell extracts and immunoprecipitated proteins were analyzed by Western blotting with anti-FLAG and anti-HA antibodies.
(F) Nuclear extracts from Ramos cells were immunoprecipitated with anti-MYC antibody or mouse IgG as a control. The precipitates were analyzed by Western blotting with anti-STK38/NDR1, anti-MYC, and anti-MAX antibodies.
BHLHB2
MINDy infers this TF as a negative modulator of MYC activity, affecting the regulation of 80 targets (Figure 3A, Table S5). Indeed, BHLHB2 is a TF able to bind to E-boxes through its bHLH domain, and it has been proposed to act as a transcriptional repressor30, but not validated as a MYC antagonist. Thus, we tested whether BHLHB2 could antagonize MYC-mediated transcriptional activation of its target genes, by first investigating whether BHLHB2 could affect the transcriptional activation of TERT, a well characterized MYC target28. Transient co-transfection of a reporter gene driven by a segment of the humanTERT promoter region, carrying two E-boxes [TERT-Luc80028], and vectors encoding MYC or BHLHB2 in 293T cells showed that BHLHB2 represses MYC-mediated transcriptional activation on TERT in a dose-dependent manner. This effect is MYC-dependent since the basal transcriptional activity of the reporter gene is actually mildly increased by BHLHB2 (1.2 fold, Figure 3B). Thus BHLHB2 represses MYC-mediated regulation but is not a direct repressor of the TERT promoter. We next analyzed whether endogenous BHLHB2 molecules are physiologically bound to E-boxes within the promoter region of MYC target genes in vivo by quantitative chromatin immunoprecipitation (qChIP) assays in B-cells (ODH-III) using antibodies against MYC and BHLHB2. The results showed that the promoters of BOP1, ATIC, MRPL12, EBNA1BP2 and TERT were bound by both MYC and BHLHB2 (Figure 3C). In order to establish the functional significance of BHLHB2-mediated modulation of MYC transcriptional activity, we examined whether shRNA-mediated inhibition of BHLHB2 could affect the response of the 340 canonical MYC-target genes used in the MINDy analysis or, more specifically, of the MYC-target genes modulated by BHLHB2 as inferred by MINDy. The latter signature was used in case the effect may be highly target specific and thus only a subset of MYC-target may be affected by BHLHB2 silencing. To this end, ODH-III cells were transduced with lentiviral vectors expressing BHLHB2-specific or control shRNAs. Western blot analysis showed that BHLHB2 was effectively silenced, while MYC levels were not affected (Figure S5). We then performed gene expression profile analysis to assess the effect of BHLHB2 silencing on the expression of MYC targets. MYC is known to both positively and negatively regulate its targets31. Thus, without prior knowledge of MYC specific activity on each target, we used Gene Set Enrichment Analysis (GSEA)32 to assess whether MYC target signature genes are more differentially expressed than non-MYC target genes following BHLHB2 silencing (see Methods). The analysis confirmed a highly significant enrichment of canonical MYC targets within differentially expressed genes (p< 0.001) (Figure 3D). Among the 80 MINDy inferred BHLHB2-modulated MYC-targets, 30 were among the most differentially expressed genes. This constitutes approximately a 2-fold increase over their enrichment in non-differentially expressed genes (FET p= 8×10−5). MYC mRNA and protein levels were not affected (data not shown and Figure S5) indicating a post-translational effect. These results validate BHLHB2 as an antagonist of MYC activity.
Figure 3
BHLHB2 antagonizes MYC activity in B-cells
(A) Visualization of MINDy output. See Figure 2A for interpretation of this graph.
(B) TERT promoter-driven luciferase reporter construct, TERT-800Luc, was transiently co-transfected into 293T cells with MYC plasmid and three different doses of BHLHB2-FLAG plasmids. pRL-TK plasmid was used as internal control to monitor the transfection efficiency. The luciferase activities were measured 48 hours post transfection and normalized against the Renilla activity. Each experiment was done in duplicate and data were shown as mean ± SE of three independent experiments.
(C) qChIP assays on ODH-III cells were performed in parallel using equivalent number of cells. Chromatin was immunoprecipitated with anti-MYC and anti-BHLHB2/DEC1 antibodies or an irrelevant antibody (rabbit IgG) as a control. The precipitated DNA fragments were assessed by quantitative PCR and shown as mean ± SE.
(D) GSEA enrichment analysis for BHLHB2 silencing. Gene expression profiles were generated for 5 samples of ODH-III cells transduced with BHLHB2-specific or control shRNA using a lentiviral vector. The x-axis represents all probes on the microarray, ranked by their absolute differential expression in the cell transduced with the BHLHB2 shRNA vs. the control shRNA. Most differentially expressed genes are towards the left. Sorting was based on the value of −log10 of their differential expression p-values. Blue vertical bars represent all 340 genes in the MYC target signature, while the red bars represent the 80 MINDy-inferred, BHLHB2-modulated subset signature. Color intensity is proportional to the density of the bars. Rank of BHLHB2 among differentially expressed genes is shown by a tick mark. (See Methods for details on GSEA test and statistical significance assessment.)
: this TF was inferred as a positive modulator of MYC activity, affecting 14 MYC targets (Figure 4A, Table S5). MEF2B is a member of the MEF TF family, which interacts with the myogenic bHLH proteins MyoD and E12 to activate gene transcription through direct binding to E-boxes on target promoters33. Details on the validation assays are provided in SI Section 1.11. Briefly, similar to BHLHB2, we showed (a) that MEF2B physically interacts with MYC both exogenously in 293T cells (Figure 4B), and endogenously in P3HR1 and Ramos cells (Figure 4C), (b) that MYC and MEF2B can synergistically activate a TERT reporter gene (Figure 4D), and (c) that genes differentially expressed following shRNA-mediated silencing of MEF2B were highly enriched in MYC targets by GSEA (p< 0.001, Figure 4E), while MYCexpression was not affected (Figure S6B).
Figure 4
MEF2B enhances MYC activity via protein-protein interaction
(A) Visualization of MINDy output. See Figure 2A for interpretation of this graph.
(B) HA-MEF2B expression vector was transiently transfected into 293T cells with FLAG-MYC expression vector. At 48-hour post-transfection, immunoprecipitation was performed using the anti-FLAG agarose beads (M2). Whole cell extracts and immunoprecipitation eluates were analyzed by Western blotting with anti-FLAG, anti-HA, and anti-MAX antibodies.
(C) Nuclear extracts from P3HR1 cells were immunoprecipitated with anti-MEF2B serum or rabbit serum as a control. The precipitates were analyzed by Western blotting with anti-MEF2B and anti-MYC antibodies.
(D) TERT promoter-driven luciferase reporter construct, TERT-800Luc, was transiently co-transfected into 293T cells with MYC plasmid and three different amounts of HA-MEF2B plasmids. pRL-TK plasmid was used as internal control to monitor the transfection efficiency. The luciferase activities were measured 48 hours post-transfection and normalized against the Renilla activity. Each experiment was done in duplicate and data were shown as mean ± SE of three independent experiments.
(E) GSEA enrichment analysis for MEF2B silencing. Gene expression profiles were generated for 5 samples of P3HR1 cells transduced with MEF2B-specific or control shRNA using a lentiviral vector. See Figure 3D for further interpretation of this graph.
HDAC1
Finally, MINDy identified the histone deacetylase and well-known transcriptional co-repressor HDAC134, 35 as a modulator of MYC transcriptional activity on 8 MYC targets (Figure 5A, Table S4). Experiments (see SI Section 1.12 for more details) confirmed (a) that HDAC1 and MYC can interact in vivo, both exogenously in 293T cells (Figure 5B) as also reported in [36], and endogenously in Ramos and P3HR1 cells (Figure 5C), (b) that genes differentially expressed following shRNA-mediated silencing of HDAC1 were highly enriched in MYC targets by GSEA (p < 0.001, Figure 5D), while MYCexpression was not affected (Figure S7A), (c) that HDAC1 can deacetylate MYC in vitro, following its CBP-mediated acetylation (Figure 5E) which has been shown to increase MYC’s activity as a transcriptional activator, and (d) that, based on qChIP assays with anti-MYC and anti-HDAC1 specific antibodies, both MYC and HDAC1 bind to the promoters of p21CIP1 and CR2, which are repressed by MYC in B-cells (Figure 5F). These results suggested that MYC could recruit HDAC1 to repress transcription on selected target genes. Taken together, we have demonstrated that HDAC1 may modulate MYC activity both by de-acetylation of the MYC protein and by transcriptional repression of selected targets.
Figure 5
MYC selectively recruits HDAC1 to its targets as co-repressor
(A) Visualization of MINDy output. See Figure 2A for interpretation of this graph.
(B) FLAG-HDAC1 expression vector was transiently transfected into 293T cells with HA-MYC expression vector. At 48-hour post-transfection, immunoprecipitation was performed using the anti-FLAG agarose beads (M2). Whole cell extracts and immunoprecipitation eluates were analyzed by Western blotting with anti-FLAG and anti-HA antibodies.
(C) Nuclear extracts from Ramos and P3HR1 cells were immunoprecipitated with anti-MYC antibody or rabbit IgG as a control. The precipitates were analyzed by Western blotting with anti-HDAC1, anti-MYC and anti-MAX antibodies.
(D) GSEA enrichment analysis for HDAC1 silencing. Gene expression profiles were generated for 5 samples of P3HR1 cells transduced with HDAC1-specific or control shRNA using a lentiviral vector. See Figure 3D for further interpretation this graph.
(E) FLAG-MYC and CBP-HA expression vectors were transiently transfected into 293T cells to purify acetylated-MYC protein. HDAC1, which was also purified from transient transfected 293T cells, was incubated with acetylated-MYC protein in vitro. The signal intensities of acetylated-MYC and total amount of MYC were measured by the imageQuant 5.2 software. The densitometry histogram on the right shows the percent of acetylated-MYC normalized by total amount of MYC.
(F) qChIP assays on P3HR1 cells were performed in parallel using equivalent number of cells. Chromatin was immunoprecipitated with anti-MYC and anti-HDAC1 antibodies or an irrelevant antibody (rabbit IgG) as a control. The precipitated DNA fragments were assessed by quantitative PCR with CR2 and p21CIP1 primers and are shown as mean ± SE.
Extension to other Transcription Factors: to validate MINDy on a broader range of TFs, we used the algorithm to infer all TFs modulated by BHLHB2, MEF2B, HDAC1 and STK38, for which we had already collected gene expression profile data following shRNA-mediated silencing. Specifically, we tested whether their MINDy-inferred targets were enriched in differentially expressed genes (by GSEA), following lentivirus mediated shRNA silencing of the corresponding modulators. As shown in Table S13, 75% of the TFs inferred by MINDy as modulated by any of the four modulators with support greater than 100 targets (33 out of 44, p ≤ 5.1×10−34) could be experimentally validated by the analysis. Furthermore, as one may expect, validation rates increased – from 51% (87 out of 171), to 61% (59 out of 96), to 75% (33 out of 44) – when the minimum number of MINDy-inferred targets supporting the TF was increased from 25, to 50, to 100, respectively. In general, these results are consistent with the MYC analysis and suggest that MINDy is broadly applicable to the analysis of TFs other than MYC. (See SI Section 1.13 for further details).
DISCUSSION
We have introduced MINDy, a novel method for the identification of context-specific, post-translational modulators of TF activity. Literature-based and experimental validation suggests that MINDy can recapitulate a large fraction of known MYC modulators and infer novel, context-specific modulators, both of MYC and of other TF.For well studied TFs, targets for the analysis may be selected from the literature or by performing genome-wide ChIP assays37, 38. However, computationally-inferred targets performed as well or better than literature-based ones, likely due to their context-specific nature. About 269 TFs have more than 50 ARACNe-inferred targets using the B-cell profiles, and may thus be effectively analyzed by MINDy.Algorithm performance was remarkably robust to candidate target selection (DB-targets vs. AR-targets). Additionally, MINDy’s formulation is relatively simple, requiring only the availability of a large gene expression profile dataset (n ≥ 200 profiles) characterizing a sufficient variety of naturally occurring or experimentally perturbed cellular phenotypes. This suggests that MINDy can be used to analyze most TFs in a variety of cellular contexts.Several limitations should be noted. First, candidate modulators that do not satisfy the range constraint cannot be tested by the method. These, however, include mostly either genes that are not expressed or genes whose availability is so tightly regulated (e.g. housekeeping genes) that variability in the gene expression profiles is too limited to establish a low and a high range of expression. Second, candidate modulators that do not satisfy the independence constraint may not be tested using this approach. In practice, less than half (100/233 = 42.9%) of the Ingenuity modulators were in this category. This is not a theoretical limitation of the method but rather an assumption we use to increase its sensitivity. If desired, the more general test may be used without relying on the assumption I[TF; M] = 0 (see SI Section 1.1). Additionally, transcriptional modulators of MYC can be directly inferred by ARACNe and do not require MINDy. Third, in the rare event that the regulatory program of a TF changes from activation to repression for specific targets, as a function of a modulator, this may not be detected by the algorithm because the MI may not change substantially. In this case the multi-information could be used instead of the conditional.Comparison with existing methods reveals that MINDy is unique in its ability to discover large numbers of modulators of the same TF. Finding the optimal Bayesian Network structure, assuming arbitrary interactions among genes’ parents, for instance, is hyper-exponential in the number of parents. As a result, dissecting network topologies with large numbers (tens to hundreds) of upstream modulators, as is the case for MYC, may be difficult. Other methods39, while promising in a yeast context, have not yet been extended to mammalian networks. Finally, comparison with a recently introduced algorithm, NetworKIN11, shows that the latter is restricted to substrates of only 73 kinases, from 20 families, while MINDy can be used to dissect post-translational interactions of a much wider nature, including phosphorylation, acetylation, chromatin modification, formation of transcriptional complexes, and binding site antagonism, as shown for STK38, HDAC1, MEF2B and BHLHB2, just to mention those experimentally validated in this manuscript.The ability to infer direct and upstream modulators of a desired TF’s activity suggests that MINDy may provide highly specific pharmacological targets for the activation or repression of specific transcriptional programs, when modulators are restricted to druggable genes40. This could be valuable because TFs are generally considered difficult pharmacological target.Although preliminarily applied to the identification of modulators of MYC for experimental validation purposes, MINDy has already provided novel biological insights of general significance. First, the results indicate that not all modulators can influence the program of a TF in a global fashion, but may rather influence specific subsets of the target genes. This observation suggests that additional levels of regulation can influence the relationships between modulators and the TF they control in different cellular contexts or depending on different signals. Second, MINDy provided novel information about molecules that regulate the activity of the MYC protein. These mechanisms may be critically altered in tumors, thus modulating MYC’s established oncogenic activity. Finally, MINDy is not limited to dissecting post-translational interactions and may be applied without modification to identify TFs that are directly modulated by microRNAs or indirectly by genetic and epigenetic alterations.
Algorithm availability
At manuscript publication, the source code and executables for MINDy will be made available under the Open Source licensing agreement. Additionally, MINDy will be incorporated into the geWorkbench package which is distributed both by the NCI and by the National Center for Biomedical Computing at Columbia University (http://wiki.c2b2.columbia.edu/workbench/index.php/Home).
METHODS
Gene expression profile dataset
We used 254 gene expression profiles previously generated by our labs for several studies of normal and tumor related B-cell phenotypes using the Affymetrix HG-U95Av2 GeneChip® System (approximately 12,600 probes)13, 14. The gene expression profiles are available in GEO (series GSE2350, including samples GSM44075-44095, GSM44113-44302 and GSM65674-65716). Probe sets with expression mean μ < 50 and standard deviation σ< 0.3μ, were considered uninformative and were thus excluded, leaving 7907 probe sets for the analysis. Table S9 summarizes the 17 B-cell phenotypes included in this study.
Synthetic network
The synthetic network models two post-transcriptional modifications of a TF, affecting its regulatory behavior (Figure 1B). It includes the transcription factor (TF), an activating protein kinase (K), a co-factor (cTF) forming a transcriptionally active complex with the TF, and three downstream TF targets. The full kinetic model is described in Table S10. 250 synthetic expression profiles were generated from this model by: (a) randomly sampling the TF, K, and cTF mRNA concentration from independent Normal distributions (μ = 4 and σ =1), (b) simulating network dynamics until a steady state was reached, and (c) measuring the mRNA concentration of the represented species with a multiplicative Gaussian noise (μ = 0 and σ = 0.1μ).
Candidate modulators
The statistical test for the range constraint is defined in SI Section 1.3. The statistical significance test for the independence constraint is based on the Mutual Information, as described in [59] (see also SI Sections 1.5 and 1.6).For the category-specific analysis, we further selected 542 signaling proteins (GO molecular function: “protein kinase activity”, “phosphoprotein phosphatase activity”, “acetyltransferase activity” and “deacetylase activity”) and 598 TFs (GO molecular category: “transcription factor activity”) as candidate modulators.
MINDy inference
Given a triplet (TF, M, t), with t ≠ TF and t ≠ M, MINDy assesses whether the CMI, I[T F; t| M], is constant as a function of M. Assuming that the CMI is a monotonic function of M (see SI Section 1.10), this can be efficiently tested by measuring
, where
and
represent two subsets of the samples where M is respectively most and least expressed. For this dataset
is chosen to be 35% of the samples by optimizing the effect of B-cell receptor pathway genes (which is known to modulate MYC activity42) on regulating canonical MYC target genes (see SI Section 1.2). MI was computed using the Gaussian Kernel estimator of [59] (see also SI Section 1.5). The p-value corresponding to a specific ΔI is obtained by permutation tests (see SI Section 1.1), Bonferroni corrected for the total number of tested target-modulator pairs. If a candidate target list is not provided, triplets are further pruned if there exists a third gene, x, such that I[T F; x] ≥ I [T F; t] and I[t; x] ≥ I[T F; t] in both
, showing an indirect relationship between TF and t mediated by x, as suggested by the Data Processing Inequality59 (see SI Section 1.7 and Figure S8 for details).
Modulator minimum support
Once MINDy inference is made on individual (TF, M, t), modulators are selected based on its support, i.e. the number of TF targets it modulates. The minimum support is determined using a permutation test procedure where an identical MINDy run is performed on the same set of candidate modulators and candidate targets except that
and
are chosen at random. This produces a permutated set of MINDy inferences, based on which we can compute the modulator support under the null. The minimum support is then selected as the support that gives the smallest FDR (i.e. the ratio between the numbers of selected modulators in the permutated run vs. real run). For MINDy analysis based on DB-targets, fifteen was determined as the minimum support searching for modulators among all genes, and 7 if candidate modulators are restricted to only signaling proteins and TFs. For MINDy analysis based on AR-targets, the minimum support is determined to be 8 when candidate modulators are selected among signaling proteins and TFs,
Modulator Mode of Action and Biological Activity
For each significant triplet, we define M as a positive (negative) modulator of the T F→ t interaction if ΔI> 0 (ΔI< 0). This indicates only whether M increases or decreases the mutual information between TF and t, and does not necessarily correspond to the biological activity of the modulator (i.e. TF activator or repressor). The latter can be assessed for each tested triplet as:
where ρ is the Pearson correlation between the TF and the candidate target t, and
is the mean expression of t in
. We assess these differences using a two-sample Student t-test with 10% type-I error rate (two sided). For modulators affecting more than one MYC target, the biological mode is labeled as undetermined if the undetermined triplets are the majority (>50%) or if neither mode dominates the other by more than 30%. Otherwise it is assigned the dominant mode (See SI Section 1.4 and Table S11 for more description).
GO enrichment analysis
GO molecular function categories with fewer than 20 and more than 500 genes were excluded. The enrichment of each category was computed using the Fisher’s exact test And corrected for multiple testing using the method of Storey and Tibshirani60.
Promoter analysis
For TFs with an appropriate DNA binding profile in the TRANSFAC professional database (version 6.0)24, we determined the binding-site enrichment in the promoter regions (defined as 2Kb upstream and 2Kb downstream of the transcription initiation site, masked for repetitive elements) of the targets they modulate. Sequences were retrieved from the UCSC Golden Path database (build35, May 2004)61. The binding profile match threshold was calibrated for a FDR ≤ 5% per 1 Kb (in both directions). The binding-site enrichment, compared to a 13,000 random human promoter5 background, was computed by Fisher’s exact test.
Cell lines and cell culture
The human embryonic kidney 293T cells were maintained in DMEM supplemented with 10% FBS and antibiotics. The Burkitt lymphoma cell lines, Ramos, P3HR1, ST486, and ODH-III were maintained in IMDM supplemented with 10% FBS and antibiotics.
Plasmids
The mammalianexpression vectors encoding MYC and TERT-800Luc have been previously described28. The mammalianexpression vectors encoding BHLHB2/Stra13-FLAG, HDAC1-FLAG, and HA-MEF2B were kindly provided by Dr R. Taneja (Mount Sinai School of Medicine, New York, NY), Dr S.L. Schreiber (Harvard University, Cambridge, MA), and Dr R. Prywes (Columbia University, New York, NY), respectively. HA-STK38 (pcDNA3.1-NDR1-wt) expression vector was provided by Dr. B. Hemmings (FMI, Basel, Switzerland). MYC-HA and FLAG-MYC plasmids were constructed by subcloning the corresponding human cDNA amplified by PCR into the pcDNA3 (Invitrogen) and pCMV-Tag2A (STRATAGENE) vectors, respectively.
Transient transfection and reporter assays
293T cells were transiently transfected by using the calcium-phosphate precipitation method and luciferase reporter assays were performed as previously described62, 63. Each transfection was done in duplicate and luciferase activities were measured 48h post-transfection using dual-luciferase reporter assay kit (Promega) according to the manufacturer’s protocol.
Co-immunoprecipitation assay and Western blot analysis
Nuclear cell extracts and whole cell lysates were prepared as previously described64. Proteins were analyzed by SDS-PAGE and subsequently by Western blot using the following antibodies: anti-MYC (C33 and N262), anti-HDAC1 (N-19), and anti-NDR1 (STK38) (G15) (Santa Cruz Biotechnology); anti-BHLHB2/DEC1 (BL2928) (BETHYL); anti-MEF2B (ab33540) (Abcam); anti-STK38 (2G8-1F3) (Novus Biologicals); Flag M2 and anti-HA beads (Sigma); and hemagglutinin (Roche).
ChIP assays were performed as previously described65, 66. Antibodies used were anti-MYC (N-262, Santa Cruz), anti-BHLHB2/DEC1 (BL2928, Bethyl) and anti-HDAC1 (AB7028, Abcam). The immunoprecipitated chromatin fragments from two independent experiments were pooled together and the amounts of sample immunoprecipitated by individual antibody were assessed by quantitative real-time PCR. The oligonucleotide pairs are listed in Table S12.
In vitro de-acetylation assay
FLAG-MYC and CBP-HA expression vectors were transiently transfected into 293T cells. Acetylated-FLAG-MYC was purified with FLAG-beads. FLAG-HDAC1 was purified by FLAG-beads from 293T cells transiently transfected with FLAG-HDAC1expression vector. Acetylated-FLAG-MYC and FLAG-HDAC1 were incubated at 30°C for 2hrs in a buffer containing 50mM Tris-HCl, 50mM NaCl, 4mM MgCl2, 0.5mM DTT, 0.2mM PMSF, 0.02% NP-40 and 5% Glycerol with or without 2μM TSA for inhibition of HDAC1 activity.
shRNA and lentiviral infections
Lentiviral vectors for BHLHB2 shRNA (TRCN0000013249), MEF2B shRNA (TRCN0000015739), HDAC1 shRNA (TRCN0000004818), STK38 shRNA (TRCN0000010216) and Non-Target control shRNA (SHC002) were purchased from Sigma. Lentiviral supernatants were produced by transiently co-transfecting the lentiviral vectors, the packaging vector delta 8.9, and the VSV-G envelope glycoprotein vector as previously described67, 68. For infection, ODH-III, P3HR1 and ST486 cells (2×106 cells/ml) were mixed with viral supernatants, supplemented with 8μg/ml polybrene and centrifuged for 120 min at 450g. The ODH-III and ST486 cells were collected for analysis 96h and 60h post-infection, respectively. The lentiviral infected P3HR1 cells were selected with puromycin (0.5 μg/ml) for 5 days and collected for analysis.
qRT-PCR analysis
Polymerase chain reaction with reverse transcription (RT-PCR) analysis was performed using FastLane Cell cDNA Kit (Qiagen) according to the manufacturer’s instructions. qRT-PCR was performed with Quanti Tect SYBR Green PCR kit (Qiagen) using the 7300 Real Time PCR systems (Applied Biosystems) according to manufacturer’s instructions. The oligonucleotide primers are described in Table S12.
Gene-expression profiling after lentiviral mediated silencing of the modulator gene
For each modulator of interest, 5 samples infected with the modulator-specific shRNA and 5 with non-target control shRNA, were obtained. Total RNA was extracted with Trizol reagent (Invitrogen) and purified using the RNeasy kit (Qiagen). Biotinylated cRNA was produced according to the manufacturer instructions, starting with 6μg of total RNA and following the one-cycle cDNA synthesis protocol (Affymetrix, 701025 Rev.6). 15 ug of fragmented cRNA were hybridized to HG-U95Av2 microarrays (Affymetrix).
Expression profile data analysis
We determined the gene expression values by MAS5.0 normalization method provided in Affymetrix’s GeneChip Operating Software (GCOS). Differential expression between modulator shRNA and control shRNA samples was analyzed using two-sample t-test. For the GSEA analysis probe sets were sorted in decreasing order by the −log10 of their t-test p-values. Readers are referred to [32] for details on the GSEA algorithm. In the GSEA plot specific targets (MYC- or MINDy-signatures) are shown as vertical bars against the background of all B cell expressed genes in the expression profiles, sorted from the most to the least differentially expressed following silencing of the candidate modulator. The curve represents a random walk where the value on the y-axis is increased proportionally each time the gene on the x-axis is one of the selected targets and decreased if it is part of the background. Weights are chosen proportionally to the statistical significance of the differential expression and to the relative number of targets in the signatures vs. the background list, such that the curve starts and ends at y = 0. The statistical significance of the GSEA statistics (i.e., the maximum height of the curve, called Enrichment Score, ES) was determined by permutation test where the ranks of the probe sets were randomly shuffled 1000 times. To determine the enrichment of MINDy predicted modulator-specific targets of MYC among the differentially expressed genes, probe sets that rank before the GSEA leading edge (i.e. the increasing phase of GSEA profile) were determined to be significantly differentially expressed, and the enrichment was calculated using the Fisher’s exact test.
Authors: Jagruti H Patel; Yanping Du; Penny G Ard; Charles Phillips; Beth Carella; Chi-Ju Chen; Carrie Rakowski; Chandrima Chatterjee; Paul M Lieberman; William S Lane; Gerd A Blobel; Steven B McMahon Journal: Mol Cell Biol Date: 2004-12 Impact factor: 4.272
Authors: Haiyuan Yu; Pascal Braun; Muhammed A Yildirim; Irma Lemmens; Kavitha Venkatesan; Julie Sahalie; Tomoko Hirozane-Kishikawa; Fana Gebreab; Na Li; Nicolas Simonis; Tong Hao; Jean-François Rual; Amélie Dricot; Alexei Vazquez; Ryan R Murray; Christophe Simon; Leah Tardivo; Stanley Tam; Nenad Svrzikapa; Changyu Fan; Anne-Sophie de Smet; Adriana Motyl; Michael E Hudson; Juyong Park; Xiaofeng Xin; Michael E Cusick; Troy Moore; Charlie Boone; Michael Snyder; Frederick P Roth; Albert-László Barabási; Jan Tavernier; David E Hill; Marc Vidal Journal: Science Date: 2008-08-21 Impact factor: 47.728
Authors: Andrei A Ivanov; Brian Revennaugh; Lauren Rusnak; Valentina Gonzalez-Pecchi; Xiulei Mo; Margaret A Johns; Yuhong Du; Lee A D Cooper; Carlos S Moreno; Fadlo R Khuri; Haian Fu Journal: Bioinformatics Date: 2018-04-01 Impact factor: 6.937