Konrad J Karczewski, Michael Snyder, Russ B Altman, Nicholas P Tatonetti.
Abstract
Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.
Year: 2014 PMID: 24516403 PMCID: PMC3916285 DOI: 10.1371/journal.pgen.1004122
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Independent Component Analysis (ICA) can be used to identify transcriptional modules from gene expression data.
(A): The classical example of ICA is the “cocktail party problem,” where a number of microphones are placed in a room, capturing a mixture of conversations. Source separation methods such as ICA attempt to deconvolve the recorded mixed signals into their separate source signals (individual conversations). (B): An analogous application involves identifying source signals of transcriptional regulators from complex gene expression measurements.
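The cocktail party analogy can be illustrated with a small two-microphone example. The sketch below is not the pipeline used in the study: it is a minimal kurtosis-based ICA for exactly two mixed signals, written with the Python standard library only. The function names, the toy "speakers" and the mixing weights are all hypothetical. It whitens the two recordings and then searches for the rotation that makes the outputs as non-Gaussian as possible.

```python
import math
import random

def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def whiten(x1, x2):
    """Rotate and rescale two centered signals so their covariance is the identity."""
    n = len(x1)
    a = sum(u * u for u in x1) / n               # var(x1)
    b = sum(u * v for u, v in zip(x1, x2)) / n   # cov(x1, x2)
    d = sum(v * v for v in x2) / n               # var(x2)
    # Angle of the principal axes of the 2x2 covariance matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    p1 = [c * u + s * v for u, v in zip(x1, x2)]
    p2 = [-s * u + c * v for u, v in zip(x1, x2)]
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / s1 for u in p1], [v / s2 for v in p2]

def excess_kurtosis(y):
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0

def unmix_two(x1, x2, steps=180):
    """Grid-search the rotation of the whitened signals that maximizes non-Gaussianity."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def abs_corr(x, y):
    x, y = center(x), center(y)
    num = sum(u * v for u, v in zip(x, y))
    den = math.sqrt(sum(u * u for u in x) * sum(v * v for v in y))
    return abs(num / den)

# Hypothetical sources ("conversations") and mixing weights ("microphones").
random.seed(0)
n = 1000
speaker_a = [math.sin(0.05 * t) for t in range(n)]
speaker_b = [random.uniform(-1.0, 1.0) for _ in range(n)]
mic_1 = [0.6 * a + 0.4 * b for a, b in zip(speaker_a, speaker_b)]
mic_2 = [0.3 * a + 0.7 * b for a, b in zip(speaker_a, speaker_b)]

est_1, est_2 = unmix_two(mic_1, mic_2)
# ICA recovers sources only up to order and sign, so compare by absolute correlation.
match_a = max(abs_corr(est_1, speaker_a), abs_corr(est_2, speaker_a))
match_b = max(abs_corr(est_1, speaker_b), abs_corr(est_2, speaker_b))
print(round(match_a, 3), round(match_b, 3))
```

In the expression setting of panel B, the "microphones" would correspond to expression experiments and the recovered sources to modules. A real analysis would rely on an established ICA implementation rather than this two-signal grid search.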
Figure 2. Association of TFs to expression modules.
(A): A TF is associated with a module if its targets are significantly enriched in that module. TFs are connected to their targets using ChIP-Seq data, and these targets may (solid) or may not (dashed) be contained within an expression module. GO annotations (colored blue/yellow) are used in enrichment analysis to associate modules and their factors to functional pathways. (B): We evaluated the quality of TFICA-derived TF targets based on the hypothesis that if a TF does regulate a target, then it is more likely that the TF and the target will share a functional annotation. Across ChIP-Seq scores, TFICA outperforms the naive method, and this performance is further increased when only considering high and medium-confidence modules (see text).
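The enrichment idea in panel A can be sketched as a 2x2 contingency test. The captions do not state which test or multiple-testing correction was used, so the code below assumes a one-sided Fisher's exact test (upper hypergeometric tail) and reports an unadjusted P value. The function name is hypothetical, and the background size and total number of bound genes in the example are invented for illustration. Only the counts 30 and 119 are taken from the SREBP2 row of the table.

```python
from math import comb

def tf_module_enrichment(bound_in_module, module_size, bound_total, universe):
    """Odds ratio and one-sided P value for enrichment of TF targets in a module.

    Assumes a 2x2 table over a gene universe:
                     in module   not in module
        TF-bound         a            c
        not bound        b            d
    """
    a = bound_in_module
    b = module_size - a
    c = bound_total - a
    d = universe - module_size - c
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # Upper tail of the hypergeometric distribution: P(X >= a).
    upper = min(module_size, bound_total)
    tail = sum(
        comb(bound_total, k) * comb(universe - bound_total, module_size - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(universe, module_size)
    return odds_ratio, p_value

# 30 bound genes in a 119-gene module (from the table); the other two
# numbers are hypothetical placeholders, not values from the study.
odds_ratio, p_value = tf_module_enrichment(
    bound_in_module=30, module_size=119, bound_total=400, universe=20000
)
print(odds_ratio, p_value)
```

Because the background counts are placeholders, the printed odds ratio is not expected to reproduce the value reported in the table.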
Functional modules recapitulate known transcription factor biology.
| Rank | Transcription Factor (name) | Functional Module ID | Number of genes in module bound by TF | Number of genes in module | Odds Ratio | Adjusted P Value |
| 1 | SREBP2 (Sterol regulatory element-binding protein) | 158 | 30 | 119 | 45.2 | 1E-31 |
| Module Enriched GO Terms (P<0.05) | ||||||
| Module Enriched KEGG Pathways (P<0.05) | ||||||
| 2 | GCN5 (Histone acetyltransferase) | 104 | 8 | 69 | 33 | 5.13E-08 |
| Module Enriched GO Terms (P<0.05) | ||||||
| Module Enriched KEGG Pathways (P<0.05) | ||||||
| 3 | GCN5 (Histone acetyltransferase) | 62 | 13 | 183 | 20.5 | 2E-10 |
| Module Enriched GO Terms (P<0.05) | ||||||
| Module Enriched KEGG Pathways (P<0.05) | ||||||
| 4 | NELFe (Negative elongation factor E) | 104 | 19 | 69 | 19.5 | 2.85E-14 |
| See annotations for #2 | ||||||
| 5 | ZNF274 (zinc finger protein 274) | 111 | 71 | 196 | 18.6 | 7.2E-50 |
| Module Enriched GO Terms (P<0.05) | ||||||
| Module Enriched Interpro Terms (P<0.05) | ||||||
| 71 | NFKB | 8 | 217 | 257 | 4.6 | 1.8E-21 |
| Module Enriched GO Terms (P<0.05) | ||||||
| Various | Module significantly associated with 121 Factors | 57 | Various | 159 | Various | Various |
| Module Enriched GO Terms (P<0.05) | ||||||
The top 5 TF-module associations, as well as NFκB and a general transcription module (associated with 141 different TFs), are shown.
Figure 3. Predicting TF-TF interactions using shared modules as a measure of shared function.
(A): Prediction of (i) gene expression correlation, (ii) literature mentions, and (iii) shared functional annotations using a Naive approach, shared TFICA modules, and weighted TFICA modules. The Naive approach (“Naive”) links TFs to TFs by the similarity of their ChIP-Seq targets, “TFICA” links TFs to TFs by the similarity of their significantly associated modules, and weighted TFICA weights these modules in the similarity by their confidence. β coefficients in a linear model are shown with 95% confidence intervals. In each case, TFICA and weighted TFICA significantly outperform the Naive approach. In addition, we used permutation testing to validate these results. In each case (expression, literature, function), the β coefficient for the permuted model was not significant (βexp = 0.16, 95% CI −0.02–0.34; βlit = −0.02, 95% CI −0.08–0.05; βfun = −0.04, 95% CI −0.14–0.06; P>0.05 for each; data not drawn). (B): The top 30 highest-scoring pairs are shown, as measured by target module similarity, 14 of which are known associations (solid lines). Many of these factors form a tight sub-network of activators and repressors.
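The module-based TF-TF similarity described in panel A can be illustrated as follows. The caption does not specify the similarity measure, so this sketch assumes a Jaccard index over each TF's set of associated modules, plus a confidence-weighted variant. The function names and the confidence weights are hypothetical. The module IDs for GCN5 and NELFe come from the table of top TF-module associations and are unlikely to be their complete module sets.

```python
def jaccard(modules_a, modules_b):
    """Unweighted overlap between two TFs' sets of associated modules."""
    a, b = set(modules_a), set(modules_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(modules_a, modules_b, confidence):
    """Same overlap, but each module counts in proportion to its confidence weight."""
    a, b = set(modules_a), set(modules_b)
    shared = sum(confidence.get(m, 1.0) for m in a & b)
    total = sum(confidence.get(m, 1.0) for m in a | b)
    return shared / total if total else 0.0

# Module IDs taken from the top TF-module associations (illustrative subset only).
tf_modules = {
    "GCN5": {104, 62},
    "NELFe": {104},
}
# Hypothetical confidence weights per module (not values from the study).
module_confidence = {104: 1.0, 62: 0.5}

plain = jaccard(tf_modules["GCN5"], tf_modules["NELFe"])
weighted = weighted_jaccard(tf_modules["GCN5"], tf_modules["NELFe"], module_confidence)
print(plain, weighted)
```

Down-weighting the lower-confidence module raises the similarity of the pair, since the module they share then carries more of the total weight.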
Figure 4. Transcription factor interaction network reveals functional and disease sub-networks.
Transcription factors are connected solely on the basis of the similarity of the modules that they regulate. Transcription factors are colored according to a selection of diseases; (A, green): AIDS; (B, blue): arrhythmia; (C, pink): breast cancer; (D, red): hemorrhage. Nodes are annotated with strong (dashed black borders) and weak (solid grey borders) literature support. See Table 2 for details.
Transcription factor-disease relationships derived from shared modules.
| Disease | Transcription Factor (alt names) | TF-Module Odds Ratio | TF-Module P-Value | Disease genes in module/Total disease genes | Relevant Literature Annotations (Pubmed ID) | Literature Support of Association |
| Arrhythmogenic Right Ventricular Dysplasia | ER-α (ESR1) | 2.71 (2.09, 3.51) | 3.1E-12 | 5/8 | Atrial Fibrillation (19860128, 19860128) | strong |
| | | | | | Coronary Heart Disease (20153472) | |
| | | | | | Sudden cardiac death (21658281) | |
| | GR (NR3C1) | 2.39 (1.84, 3.11) | 1.03E-11 | 5/8 | Coronary Heart Disease (19783104) | weak |
| | P300 | 1.75 (1.31, 2.37) | 5.9E-05 | 5/8 | Myocardial Infarction (21737953) | weak |
| | GATA3 | 2.02 (1.56, 2.63) | 3.89E-08 | 5/8 | Heart Development (18955134) | weak |
| | c-Jun (JNK) | 1.74 (1.32, 2.31) | 2.96E-05 | 5/8 | Myocardial Infarction via K+ channel regulation (20518594) | strong |
| | | | | | Myocardial Infarction (21324895) | |
| | CEBPB | 1.92 (1.34, 2.83) | 0.000141 | 5/8 | Chronic Heart Failure (12601168) | weak |
| | STAT3 | 2.41 (1.79, 3.27) | 2.96E-10 | 5/8 | Ventricular Arrhythmias and Contractile Dysfunction (22082679) | strong |
| | | | | | Atrial Fibrillation (18774104) | |
| | JunD | 1.74 (1.29, 2.37) | 0.00011 | 5/8 | Cardiac hypertrophy and Heart failure (15655111) | weak |
| | | | | | Chronic Heart Disease (9136081) | |
| | STAT1 | 1.68 (1.29, 2.19) | 9.01E-05 | 5/8 | Atrial Fibrillation (18774104) | strong |
| | | | | | Long QT Syndrome (17490620) | |
| Breast Cancer | E2F6 | 2.29 (1.76, 2.98) | 1.21E-10 | 62/608 | Regulation of Tumor Suppressor (ARHI) in BC | weak |
| | | 1.67 (1.31, 2.13) | 1.54E-05 | 26/608 | | |
| | | 1.75 (1.41, 2.18) | 1.91E-07 | 28/608 | | |
| | CHD2 | 2.23 (1.72, 2.88) | 6.7E-10 | 62/608 | Mammary tumor modifier (17557176) | strong |
| | | 1.82 (1.43, 2.32) | 8.28E-07 | 26/608 | Gastric and Colorectal Cancer (21447119) | |
| | | | | | Colon Cancer Biomarker (17390049) | |
| | NFYA | 2.78 (2.14, 3.60) | 9.24E-15 | 62/608 | E2F-1 Regulation (12697671) | weak |
| | | | | | Estrogen Regulation (15224348) | |
| | IRF1 | 2.34 (1.8, 3.05) | 3.57E-11 | 62/608 | Resistance to endocrine therapy in Breast Cancer Treatment (22295238) | strong |
| | | 2.91 (2.33, 3.65) | 8.9E-23 | 28/608 | Therapy resistant breast tumors (20457620) | |
| | | | | | Commonly mutated/rearranged in breast cancers (19697121, 17498560) | |
| | | | | | Differential expression in breast tissue (16241857) | |
| | HEY1 | 2.17 (1.63, 2.92) | 1.7E-08 | 62/608 | Target of Notch signaling (18469855) | weak |
| | | 2.5 (1.9, 3.33) | 1.39E-12 | 26/608 | | |
| | | 2.2 (1.73, 2.81) | 8.77E-12 | 28/608 | | |
| | E2F1 | 3.31 (2.53, 4.31) | 8.36E-18 | 62/608 | Tumor cell growth (22205655) | strong |
| | | 2.58 (1.99, 3.33) | 1.27E-12 | 26/608 | E2F1-dependent drug efficacy in breast cancer treatment (22185819, 20215421) | |
| | | 3.78 (2.62, 5.43) | 1.61E-12 | 26/608 | Breast cancer treatment (21573702, 21479363) | |
| | | | | | Prognostic breast cancer marker (21453498, 20410059) | |
| Acquired Immunodeficiency Syndrome | BATF | 3.11 (2.24, 4.35) | 4.23E-13 | 8/40 | Inhibits T cell function in HIV (20890291) | strong |
| | NFKB | 2.53 (1.77, 3.67) | 3.04E-08 | 8/40 | HIV use of NFKB pathway (11160127) | strong |
| | | | | | NFKB binds HIV TAR-RNA (22352910) | |
| | BCL11A | 2.72 (1.99, 3.72) | 2.02E-10 | 8/40 | Represses HIV-1 gene transcription (15849318) | weak |
| | MEF2C | 2.19 (1.55, 3.07) | 8.2E-06 | 8/40 | Misregulated in HIV-associated dementia (21170291) | weak |
| | IRF4 | 2.63 (1.92, 3.61) | 4.01E-10 | 8/40 | Regulates anti-HIV gene (21078663) | strong |
| | | | | | Expression associated in HIV-related lymphomas (11157493) | |
| Thrombocytopenia | p300 (CREBBP) | 2.29 (1.5, 3.38) | 8.9E-05 | 5/36 | Polymorphisms in p300 associated with thrombocytopenia (18684867) | strong |
| | Fos | 2.02 (1.47, 2.75) | 1.03E-05 | 5/36 | PDGF (c-Fos regulator) knockouts induce thrombocytopenia (12670444) | weak |
| | GATA1 | 1.87 (1.38, 2.53) | 2.51E-05 | 5/36 | GATA1 knockout mice develop thrombocytopenia (10216081) | strong |
Figure 5. Regulatory network of human disease.
Transcription factors (blue) are connected to diseases (red) through modules in this bipartite graph. Prominent clusters of diseases are highlighted, as well as some highly-connected transcription factors. Importantly, STAT3 is connected to many fibrotic diseases, while E2F1 and E2F4 are connected to breast and ovarian cancer.
(A): Expression of MEF2A and the projection of module 262 are significantly predictive of disease state. Individuals are ranked by their combined score (sum of normalized expression and module projection). (B): ROC curve for prediction of Crohn's disease from MEF2A expression, module 262 projection, and combined metric.
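The combined score and ROC analysis described in panels A and B can be sketched as below. The caption says only that the score is the sum of normalized expression and module projection. This sketch assumes that normalization means z-scoring and that higher scores indicate disease. The function names and all sample values are hypothetical and are not data from the validation cohort.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def combined_score(expression, projection):
    """Sum of normalized expression and normalized module projection per individual."""
    return [e + p for e, p in zip(zscores(expression), zscores(projection))]

def roc_auc(scores, labels):
    """Area under the ROC curve: the chance that a case outscores a control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical per-individual measurements (1 = disease, 0 = control).
mef2a_expression = [2.1, 1.8, 2.5, 1.9, 1.0, 1.2, 0.9, 1.9]
module_projection = [0.8, 0.9, 0.4, 0.7, 0.1, 0.3, 0.5, 0.2]
disease_state = [1, 1, 1, 1, 0, 0, 0, 0]

scores = combined_score(mef2a_expression, module_projection)
auc_expression = roc_auc(mef2a_expression, disease_state)
auc_projection = roc_auc(module_projection, disease_state)
auc_combined = roc_auc(scores, disease_state)
print(auc_expression, auc_projection, auc_combined)
```

Comparing the three AUC values mirrors the comparison in panel B between MEF2A expression alone, the module projection alone, and the combined metric.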