Yong Fuga Li, Russ B Altman.
Abstract
BACKGROUND: Transcription factors (TFs), the key players in transcriptional regulation, have attracted great experimental attention, yet the functions of most human TFs remain poorly understood. Recent capabilities in genome-wide protein binding profiling have stimulated systematic studies of the hierarchical organization of the human gene regulatory network and the DNA-binding specificity of TFs, shedding light on combinatorial gene regulation. We show here that these data also enable a systematic annotation of the biological functions and functional diversity of TFs.
RESULTS: We compiled a human gene regulatory network for 384 TFs covering 146,096 TF-target gene (TF-TG) relationships, extracted from over 850 ChIP-seq experiments as well as the literature. By integrating this network of TF-TF and TF-TG relationships with 3715 functional concepts from six sources of gene function annotations, we obtained over 9000 confident functional annotations for 279 TFs. We observe extensive connectivity between TFs and Mendelian diseases, GWAS phenotypes, and pharmacogenetic pathways. Further, we show that TFs link apparently unrelated functions, even when the two functions do not share common genes. Finally, we analyze the pleiotropic functions of TFs and suggest that the increased number of upstream regulators contributes to the functional pleiotropy of TFs.
Keywords: Co-regulation; Database; Function enrichment; Functional pleiotropy; Gene function annotation; Regulator diversity; Regulatory network; Target gene; Transcription factor
Year: 2018 PMID: 29325558 PMCID: PMC5795274 DOI: 10.1186/s12915-017-0469-0
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 An outline of the workflow for regulatory network-based annotation of transcription factor functions.
Fig. 2 Presence of gene function signals in the TFTG data. The scatter plot shows the P values of function–TF associations obtained using the real TFTG compendium (y-axis) and a fake TFTG compendium (x-axis). Each dot corresponds to a pair of P values for a TF–function pair. The inset shows the number of significant TF–target function relationships at varying P value cutoffs for the real TFTG data (y-axis) against the number for the fake TFTG data (x-axis). P values were obtained by G-tests and are shown on a log base 10 scale.
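The caption states only that P values were obtained by G-tests. As a minimal sketch of what such a test computes, assuming a 2×2 contingency table (for example, genes that are or are not targets of a TF against genes that are or are not members of a function), the statistic is G = 2 Σ O ln(O/E). The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math

def g_test_2x2(a, b, c, d):
    """G-test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (G, p), where p is the upper-tail probability of a
    chi-square distribution with 1 degree of freedom.
    Assumes every row and column total is positive.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n  # expected count under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g += o * math.log(o / e)
    g *= 2.0
    # For 1 degree of freedom, P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p

# Hypothetical counts: 30 genes are both targets and function members,
# 10 are targets only, 10 are members only, 30 are neither.
g_stat, p_value = g_test_2x2(30, 10, 10, 30)
```

A table whose cells match the independence expectation gives G = 0 and p = 1, which is the behavior a shuffled ("fake") compendium should approach.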
Fig. 3 a Global view of the transcription factors (TFs) and their target functions; 311 TFs and 1420 annotations with one or more significant associations at FDR 0.1 were retained. Red indicates positive associations, green indicates negative associations, and white indicates FDR > 0.1. Color intensity corresponds to the significance levels FDR 0.1, 0.05, and 0.01. The TF and target function clustering shown on the left and top was performed based on the TF–target function association phi coefficient matrix. We used the literature-rich gene universe for the association analysis, except for the TF–GWAS phenotype association, for which the coding gene universe was used. b The network visualization [148] of TF–target function and TF–known function relationships. Edges are colored red or green in the same way as in (a). A solid edge links a TF with a significant target function that is not a known function. A dashed edge links a TF with a known function. A colored dashed edge links a TF with a known function that is also a significant target function, while a grey dashed edge links a TF with a known function that is not a significant target function. Node colors and shapes correspond to function types: purple circles, TFs; grey rectangles, Reactome pathways; blue triangles, GO molecular functions; white diamonds, GO biological processes; red rhomboids, PharmGKB PK and PD pathways; yellow hexagons, Mendelian diseases; green octagons, GWAS phenotypes. c–g Local regions of the TF–function networks selected from (b).
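The phi coefficient named in the caption is the Pearson correlation between two binary membership indicators. The sketch below shows one way to compute it for two gene sets within a gene universe. The function name, the gene identifiers, and the choice to return 0 when a margin is empty are illustrative assumptions and not the authors' implementation.

```python
import math

def phi_coefficient(set_a, set_b, universe):
    """Pearson's phi between membership in set_a and membership in set_b.

    Genes outside `universe` are ignored. Returns 0.0 when phi is
    undefined (one of the margins is empty); this is an assumed convention.
    """
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    n11 = len(a & b)                        # in both sets
    n10 = len(a - b)                        # in set_a only
    n01 = len(b - a)                        # in set_b only
    n00 = len(universe) - n11 - n10 - n01   # in neither set
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical gene identifiers, for illustration only.
genes = ["g%d" % i for i in range(10)]
tf_targets = {"g0", "g1", "g2"}
function_members = {"g0", "g1", "g2"}
phi = phi_coefficient(tf_targets, function_members, genes)
```

Positive values indicate more overlap than expected under independence and negative values indicate less, which matches the red and green coloring described in the caption.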
Top 20 TF-target disease associations. The “Literature Rich” gene universe was used for association detection.
| TF | Target disease | log2(OR)^a | P value^b | Evidence^c |
|---|---|---|---|---|
| ATF3 | Lysosomal storage disease | 4.5 | 3.9 × 10^-09 | – |
| BRCA1 | Mitochondrial metabolism disease | 3.1 | 3.7 × 10^-09 | – |
| CTBP2 | Heart septal defect | 6.9 | 4.4 × 10^-10 | Mouse^f [ |
| | Congenital heart disease | 6.5 | 1.6 × 10^-09 | Mouse^f [ |
| ETS1 | Organ system cancer | 2.6 | 1.2 × 10^-08 | Mutation^d [ |
| GATA1 | Acute porphyria | 7.4 | 9.8 × 10^-09 | Mutation^d [ |
| HNF4A | Mitochondrial metabolism disease | 3.0 | 5.3 × 10^-09 | Mutation in MODY1^d [ |
| NFE2 | Lysosomal storage disease | 5.7 | 4.8 × 10^-12 | – |
| | Lipid storage disease | 6.0 | 5.8 × 10^-09 | – |
| RFX2 | Bardet-Biedl syndrome | 5.5 | 1.1 × 10^-12 | Mouse^f [ |
| SOX10 | Waardenburg’s syndrome | 11.4 | 2.0 × 10^-09 | Mutation^d [ |
| SUZ12 | Heart septal defect | 6.2 | 7.4 × 10^-09 | Mouse^f [ |
| TP53 | Organ system cancer | 3.5 | 1.5 × 10^-19 | Mutations in multiple cancer types^d [ |
| | Cancer | 3.5 | 3.6 × 10^-19 | |
| | Disease of cellular proliferation | 3.4 | 1.7 × 10^-18 | |
| | Reproductive organ cancer | 4.8 | 1.2 × 10^-08 | |
| USF1 | Disease of metabolism | 2.2 | 1.3 × 10^-10 | Association with FCHL^e [ |
| | Inherited metabolic disorder | 2.2 | 7.5 × 10^-09 | |
| USF2 | Lysosomal storage disease | 4.1 | 1.3 × 10^-09 | – |
| | Disease of metabolism | 2.3 | 2.8 × 10^-09 | – |
^a log2(OR): log2-transformed odds ratio
^b P value from a one-tailed Fisher’s exact test for odds ratio > 1
^c Evidence: published genetic evidence that directly supports the association of the TF with the disease
^d Mutation: mutations of the TF are observed in the disease or closely related diseases
^e Association: the TF gene locus is genetically associated with the disease or related diseases
^f Mouse: a mouse model shows phenotypes directly related to the disease
Non-genetic evidence in the literature is not considered.
MODY1: maturity-onset diabetes of the young; FCHL: familial combined hyperlipidemia
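The two statistics reported in the table, log2(OR) and a one-tailed Fisher's exact P value for odds ratio > 1, can both be derived from a 2×2 table of gene counts. The following is a minimal sketch using only the standard library. The function names and the example counts are hypothetical, and how the paper handles tables containing zero cells is not stated, so this sketch simply assumes that cells b and c are nonzero.

```python
import math

def log2_odds_ratio(a, b, c, d):
    """log2 of the sample odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Assumes b and c are nonzero (no zero-cell correction is applied).
    """
    return math.log2((a * d) / (b * c))

def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact P value for odds ratio > 1.

    Sums the hypergeometric probabilities of all tables with the same
    margins whose top-left cell is at least as large as `a`.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / math.comb(n, col1)

# Hypothetical counts: a = TF targets annotated to the disease,
# b = TF targets not annotated, c = non-targets annotated,
# d = non-targets not annotated.
effect = log2_odds_ratio(3, 1, 1, 3)
p_value = fisher_exact_greater(3, 1, 1, 3)
```

For these counts the odds ratio is 9, so log2(OR) is about 3.17, and the tail probability is 17/70. With real gene universes the counts are much larger, which is how P values as small as those in the table can arise.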
Discordance between transcription factors’ target function similarity and target gene similarity.
| TF pair classification | Target-function sharing | Target-gene sharing | Counts | Significant sharing of known functions^c | P value, OR |
|---|---|---|---|---|---|
| Unexpected target function similarity | Significant^a | Low | 329 | 117 (35.6%) | 0.0010, OR = 1.45 |
| Other pairs with low target gene sharing | Not significant | Low | 42,373 | 11,704 (27.6%) | |
| Expected target function similarity | Significant^a | High^b | 4583 | 1772 (38.7%) | 7.7 × 10^-13, OR = 1.27 |
| Other pairs with high target gene sharing | Not significant | High^b | 26,251 | 8727 (33.2%) | |
^a Significant target-function sharing: target function overlap significantly higher than expected by chance (FDR ≤ 0.05)
^b High sharing of target genes: odds ratio of target gene sharing between a pair of TFs is ≥ 1
^c Significant sharing of known functions: known function overlap significantly higher than expected by chance (FDR ≤ 0.01; see Additional file 1: Table S9 for results at threshold 0.05)
OR: odds ratio
Fig. 4 Transcription factor sharing among apparently unrelated functional concepts. a Two functional concepts with high member gene overlap always have similar regulators, but b two functional concepts with nearly identical regulators do not always have high member gene sharing. c A Venn diagram for three functional concepts, in which shared transcription factors are identified for functions without gene overlap. The arrows connect the significant regulators for the functions. Note that Iron Uptake and Transport and Insulin Receptor Recycling do share member genes significantly, but neither of them shares member genes with Lipid Storage Disease.
Functional pleiotropy and regulator diversity of selected transcription factors (TFs), including the two most functionally pleiotropic TFs, BRCA1 and ZNF143; the two TFs with the highest upstream regulatory diversity, MYC and TP53; and three TFs with lower functional pleiotropy, HNF1A, NFKB1, and SUZ12.
| TF | Function pleiotropy: target function | Function pleiotropy: effective target function | Function pleiotropy: known function | Function pleiotropy: effective known function | Regulator diversity: upstream regulator | Regulator diversity: effective upstream regulator |
|---|---|---|---|---|---|---|
| BRCA1 | 272 | 45.5 | 101 | 14.5 | 33 | 15.9 |
| ZNF143 | 242 | 35.4 | 22 | 2.8 | 53 | 22.1 |
| MYC | 159 | 24.2 | 68 | 12.4 | 50 | 25.1 |
| TP53 | 175 | 26.8 | 166 | 25.4 | 49 | 23.0 |
| HNF1A | 30 | 6.3 | 58 | 11.2 | 9 | 3.8 |
| NFKB1 | 143 | 23.7 | 34 | 6.2 | 26 | 11.7 |
| SUZ12 | 48 | 8.3 | 4 | 1.0 | 11 | 4.9 |
Fig. 5 The relationship between functional diversity and regulator diversity of transcription factors (TFs). a The target functions of TF HNF1A form three major clusters based on similarities (member gene sharing) among the functions, while the upstream regulators of HNF1A form clusters based on the functional similarities (target gene overlaps) among these regulators. The regulator and functional diversities of a gene measure the effective number of regulators and the effective number of functions for that gene. The color scheme is the same as in Fig. 3, and the clustering of TFs and functions is based on the TFs’ target gene overlaps and the functions’ member gene overlaps. b Significant associations exist between the regulator diversity and target function diversity of TFs for each of the six types of function annotations.
Complete target gene-based annotations for two example transcription factors (TFs): (A) NFKB1 and (B) SUZ12. Three types of information are provided: (1) the top TF neighbors obtained by TF distance (target-gene overlap measured by Pearson’s phi coefficient) < 0.8, (2) the target functions in six categories, and (3) the functional diversities in six categories and the total diversity. See Additional file 1: Figure S9 for a visualization of the regulator and target function networks surrounding NFKB1.
(A) NFKB1

| Function category | Target functions | Functional diversity |
|---|---|---|
| GWAS phenotype | Arthritis, rheumatoid (8.4); Psoriasis (6.95); Colitis, ulcerative (6.29) | 2.0 |
| Mendelian disease | Disease of cellular proliferation (7); Organ system cancer (6.84); Cancer (6.64); + 10 more | 2.5 |
| PD/PK pathway | EGFR inhibitor pathway (PD) (6.13); Doxorubicin pathway (cancer cell) (PD) (5.47) | 1.2 |
| Signaling/metabolic pathway | Immune system (17.11); Cytokine signaling in immune system (14.62); Interferon gamma signaling (10.89); + 46 more | 13.2 |
| Biological process (GO) | Adaptive immune response (5.86); Regulation of protein metabolic process (5.81); Regulation of cytokine production (5.72); + 71 more | 10.2 |
| Molecular function (GO) | Cytokine activity (14.33); Receptor binding (8.3); Chemokine activity (7.22); + 4 more | 3.2 |
| Total diversity | | 23.7 |

(B) SUZ12

| Function category | Target functions | Functional diversity |
|---|---|---|
| GWAS phenotype | – | 0 |
| Mendelian disease | Heart septal defect (8.13); Congenital heart disease (7.57); Disease (7.14); + 6 more | 1.8 |
| PD/PK pathway | – | 0 |
| Signaling/metabolic pathway | Regulation of beta cell development (12.19); Regulation of gene expression in beta cells (6.26); Class B/2 (secretin family receptors) (4.08) | 1.2 |
| Biological process (GO) | Anatomical structure development (40.03); Multicellular organismal development (33.22); System development (32.56); + 29 more | 5.8 |
| Molecular function (GO) | Transcription factor activity (36.68); DNA binding (33.55); RNA polymerase II transcription factor activity (11.83); + 1 more | 2.2 |
| Total diversity | | 8.3 |