Zihua Hu, Boyu Hu, James F Collins.
Abstract
BACKGROUND: Previous methods employed for the identification of synergistic transcription factors (TFs) are based on either TF enrichment from co-regulated genes or phylogenetic footprinting. Despite the success of these methods, both have limitations.
Year: 2007 PMID: 18053230 PMCID: PMC2246259 DOI: 10.1186/gb-2007-8-12-r257
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flowchart of analysis procedures.
Figure 2. Distribution of LODco and LODfc from different distance constraints. (a) LODco distribution of 234 TFBSs from 9 selected distance constraints (for example, D20 stands for a between-TFBS distance of 20 bp) and the one without a distance constraint (None) for human (hs). (b) LODco distribution of 234 TFBSs from 9 selected distance constraints and the one without a distance constraint for mouse (mm). (c) LODfc distribution of 234 TFBSs from 9 selected distance constraints and the one without a distance constraint. (d) Median LODco scores for both human (hs_LODco) and mouse (mm_LODco) and median LODfc scores.
Figure 3. Distribution and frequency of LODco and LODfc correlations from 19 distance constraints for individual TFBSs. (a) Distribution and frequency of correlation for all 243 TFBSs (grey) and for the 51 selected TFBSs (blue) from human (hs). (b) Distribution and frequency of correlation for all 243 TFBSs (grey) and for the 51 selected TFBSs (blue) from mouse (mm).
Correlations (R) and p values (P) from both human (hs) and mouse (mm) for the 51 homotypic TF combinations
| TFs | R (hs) | P (hs) | R (mm) | P (mm) |
| FAC1 | 0.98 | <0.0001 | 0.98 | <0.0001 |
| MAZ | 0.98 | <0.0001 | 0.98 | <0.0001 |
| GC | 0.97 | <0.0001 | 0.99 | <0.0001 |
| ZF5 | 0.97 | <0.0001 | 0.97 | <0.0001 |
| EGR | 0.97 | <0.0001 | 0.99 | <0.0001 |
| TBP | 0.95 | <0.0001 | 0.95 | <0.0001 |
| SP1 | 0.93 | <0.0001 | 0.95 | <0.0001 |
| NFAT | 0.93 | <0.0001 | 0.91 | <0.0001 |
| ETF | 0.92 | <0.0001 | 0.92 | <0.0001 |
| KROX | 0.90 | <0.0001 | 0.90 | <0.0001 |
| XVENT1 | 0.90 | <0.0001 | 0.86 | <0.0001 |
| ZIC3 | 0.90 | <0.0001 | 0.91 | <0.0001 |
| CETS168 | 0.88 | <0.0001 | 0.90 | <0.0001 |
| MZF1 | 0.88 | <0.0001 | 0.89 | <0.0001 |
| PAX4 | 0.88 | <0.0001 | 0.88 | <0.0001 |
| LDSPOLYA | 0.87 | <0.0001 | 0.84 | <0.0001 |
| FREAC7 | 0.87 | <0.0001 | 0.84 | <0.0001 |
| OCT1 | 0.86 | <0.0001 | 0.85 | <0.0001 |
| MMEF2 | 0.83 | <0.0001 | 0.57 | 0.0007 |
| CACBINDING PROTEIN | 0.82 | <0.0001 | 0.85 | <0.0001 |
| DEAF1 | 0.82 | <0.0001 | 0.73 | <0.0001 |
| MINI19 | 0.78 | <0.0001 | 0.56 | 0.0032 |
| E12 | 0.78 | 0.0001 | 0.83 | 0.0001 |
| CEBPB | 0.77 | <0.0001 | 0.80 | 0.0001 |
| PU1 | 0.77 | <0.0001 | 0.82 | <0.0001 |
| FOX | 0.76 | 0.0001 | 0.72 | <0.0001 |
| IRF7 | 0.75 | <0.0001 | 0.78 | <0.0001 |
| HNF1 | 0.75 | 0.0014 | 0.86 | <0.0001 |
| CETS1P54 | 0.74 | <0.0001 | 0.74 | <0.0001 |
| LBP1 | 0.73 | <0.0001 | 0.77 | <0.0001 |
| HNF3B | 0.73 | 0.0006 | 0.67 | 0.0005 |
| OSF2 | 0.72 | 0.0019 | 0.69 | 0.0005 |
| CP2 | 0.71 | 0.0001 | 0.82 | <0.0001 |
| LEF1TCF1 | 0.70 | 0.0003 | 0.72 | <0.0001 |
| NRF2 | 0.68 | 0.0011 | 0.70 | 0.0008 |
| TFIII | 0.68 | 0.0005 | 0.65 | 0.0007 |
| DBP | 0.67 | <0.0001 | 0.77 | <0.0001 |
| GATA1 | 0.66 | 0.0002 | 0.81 | <0.0001 |
| PIT1 | 0.66 | <0.0001 | 0.67 | <0.0001 |
| HELIOSA | 0.66 | 0.0026 | 0.65 | 0.0022 |
| MYCMAX | 0.66 | 0.0004 | 0.77 | 0.0001 |
| LFA1 | 0.66 | <0.0001 | 0.81 | <0.0001 |
| SRY | 0.654 | 0.0012 | 0.69 | 0.0007 |
| CREB | 0.64 | 0.0003 | 0.55 | 0.0020 |
| AP3 | 0.63 | 0.0007 | 0.62 | 0.0012 |
| DELTAEF1 | 0.61 | 0.0016 | 0.52 | 0.0019 |
| CAAT | 0.57 | 0.0004 | 0.52 | 0.0030 |
| S8 | 0.57 | 0.0004 | 0.64 | <0.0001 |
| E2F1 | 0.60 | 0.0001 | 0.67 | <0.0001 |
| NMYC | 0.54 | 0.0005 | 0.58 | 0.0002 |
| SRF | 0.45 | 0.0001 | 0.35 | 0.0006 |
Correlations between the 19 LODco scores and their corresponding LODfc scores for each of the 51 homotypic TF combinations are listed. Also listed are the statistical significances of the correlations, computed from permutation tests using randomly paired LODco and LODfc scores.
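The permutation test mentioned in the table note can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a Pearson correlation and a one-sided p value (the fraction of random pairings whose correlation is at least as large as the observed one), neither of which is stated in the note. The function names `pearson_r` and `permutation_p_value` and the parameters `n_perm` and `seed` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(scores_a, scores_b, n_perm=10_000, seed=0):
    """Return (observed r, one-sided permutation p value).

    scores_a and scores_b hold one LOD score per distance constraint
    (19 values each in the table above). The pairing is broken by
    shuffling scores_b; this is an assumed reading of "randomly paired".
    """
    rng = random.Random(seed)
    observed = pearson_r(scores_a, scores_b)
    shuffled = list(scores_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(scores_a, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate from being exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 10,000 permutations the smallest reportable p value is about 0.0001, which matches the resolution of the "<0.0001" entries in the table; whether the authors used that number of permutations for this table is not stated.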
Figure 4. Distribution of LOD scores for selected TFBSs from all distance constraints. (a) LODco scores of both human (hs_LODco) and mouse (mm_LODco) and LODfc scores for E2F1. Also shown are the correlations of LODco and LODfc for human (R_hs) and mouse (R_mm) and corresponding p values. (b) LODco scores of both human (hs_LODco) and mouse (mm_LODco) and LODfc scores for MYOGENIN. Also shown are the correlations of LODco and LODfc for human (R_hs) and mouse (R_mm) and corresponding p values.
Enriched GO biological process categories for self-synergistic E2F1 and NFAT from between-TFBS distance 20 bp to 90 bp
| | E2F1 | | NFAT | |
| Distance | No. of genes | Function categories | No. of genes | Function categories |
| D20 | 16 | Cell cycle (0.07/0.09) | 72 | Homophilic cell adhesion (0.03/0.01) |
| D30 | 31 | Sterol metabolism (0.004/0.004) | 119 | Homophilic cell adhesion (0.02/0.001) |
| | | | | Immune response (0.06/0.003) |
| | | | | Response to biotic stimulus (0.06/0.01) |
| | | | | Regulation of T cell activation (0.04/0.02) |
| | | | | Regulation of lymphocyte activation (0.07/0.002) |
| D40 | 49 | Cell cycle (0.02/0.007) | 166 | Homophilic cell adhesion (0.04/0.006) |
| | | Sterol metabolism (0.04/0.01) | | Immune response (0.04/0.008) |
| | | Nucleotide and nucleic acid metabolism (0.04/0.07) | | Response to biotic stimulus (0.06/0.03) |
| | | | | Regulation of T cell activation (0.07/0.03) |
| D50 | 64 | Sterol metabolism (0.01/0.02) | 205 | Homophilic cell adhesion (0.01/0.0006) |
| | | Cell cycle (0.005/0.02) | | Immune response (0.08/0.03) |
| | | Nucleotide and nucleic acid metabolism (0.01/0.02) | | |
| D60 | 72 | Cell cycle (0.002/0.009) | 255 | Immune response (0.03/0.002) |
| | | Sterol metabolism (0.001/0.01) | | Homophilic cell adhesion (0.03/0.002) |
| | | Nucleotide and nucleic acid metabolism (0.008/0.02) | | Response to biotic stimulus (0.09/0.01) |
| | | | | Regulation of lymphocyte activation (0.06/0.02) |
| | | | | Regulation of T cell activation (0.03/0.07) |
| | | | | Cell-substrate adhesion (0.005/0.01) |
| D70 | 83 | Cellular physiological process (0.002/0.02) | 300 | Homophilic cell adhesion (0.002/0.0005) |
| | | Cell cycle (0.005/0.02) | | Immune response (0.01/0.01) |
| | | Nucleotide and nucleic acid metabolism (0.02/0.04) | | Response to biotic stimulus (0.05/0.08) |
| | | Sterol metabolism (0.002/0.003) | | Regulation of lymphocyte activation (0.05/0.02) |
| | | | | Cell-substrate adhesion (0.008/0.02) |
| | | | | Regulation of T cell activation (0.04/0.009) |
| D80 | 99 | Nucleotide and nucleic acid metabolism (0.006/0.008) | 341 | Homophilic cell adhesion (0.0009/0.00001) |
| | | Cell cycle (0.001/0.03) | | Immune response (0.02/0.0004) |
| | | Sterol metabolism (0.003/0.004) | | Response to biotic stimulus (0.07/0.004) |
| | | Cellular physiological process (0.001/0.01) | | Regulation of lymphocyte activation (0.03/0.03) |
| | | | | Cell-substrate adhesion (0.01/0.02) |
| D90 | 107 | Nucleotide and nucleic acid metabolism (0.003/0.003) | 392 | Homophilic cell adhesion (0.0001/0.000001) |
| | | Cell cycle (0.002/0.06) | | Immune response (0.04/0.0008) |
| | | Sterol metabolism (0.004/0.006) | | Regulation of lymphocyte activation (0.05/0.04) |
| | | Cellular physiological process (0.001/0.006) | | Response to biotic stimulus (0.06/0.009) |
| | | | | Cell-substrate adhesion (0.02/0.04) |
The number of overlapping orthologous human and mouse genes whose promoters have at least two TF binding sites within certain distance constraints (for example, D20 for a between-TFBS distance of 20 bp) is listed under "No. of genes". The statistical significances of commonly enriched biological process categories from both human and mouse genes are listed in parentheses (p value mouse/p value human).
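The gene counts in this table depend on a simple filter: a promoter qualifies when at least two binding sites fall within the distance constraint, and a gene is counted only when both the human promoter and its mouse ortholog qualify. A minimal sketch of that filter follows. The data layout (dictionaries of site positions keyed by gene name, plus an ortholog map), the function names, and the choice to measure distance between site positions with an inclusive `<=` are all assumptions for illustration; the note does not specify them.

```python
def has_pair_within(positions, max_distance):
    """True if any two binding-site positions lie within max_distance bp.

    After sorting, the closest pair is always an adjacent pair, so only
    neighbours need to be compared.
    """
    ordered = sorted(positions)
    return any(b - a <= max_distance for a, b in zip(ordered, ordered[1:]))


def overlapping_genes(human_sites, mouse_sites, ortholog_of, max_distance):
    """Human genes that pass the constraint and whose mouse ortholog also passes.

    human_sites / mouse_sites: dict mapping gene name -> list of site positions
    ortholog_of: dict mapping human gene name -> mouse gene name (hypothetical)
    """
    result = []
    for gene, positions in human_sites.items():
        mouse_gene = ortholog_of.get(gene)
        if mouse_gene is None:
            continue  # no ortholog known, so the gene cannot overlap
        mouse_positions = mouse_sites.get(mouse_gene, [])
        if has_pair_within(positions, max_distance) and has_pair_within(
            mouse_positions, max_distance
        ):
            result.append(gene)
    return sorted(result)
```

For example, with made-up positions `{"E2F1": [-120, -105, -30]}` for human and `{"E2f1": [-118, -101]}` for mouse, the gene passes a 20 bp constraint in both species and would be counted under D20.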
Figure 5. LODco and LODfc correlation of 51 selected TFBSs from each distance constraint. (a) Correlation of LODco and LODfc for all individual distance constraints for both human (R_hs) and mouse (R_mm). (b) The distribution of correlation coefficients from 100,000 permuted pairs of LODco with LODfc scores from the between-TFBS distance of 30 bp from human. The relative locations for correlation coefficients from D30 and D90 are also shown.
Function annotation for 51 homotypic TF combinations
| TFs | GO biological process categories |
| AP3 | Cell adhesion; cellular localization; cellular process; extracellular matrix organization and biogenesis; innate immune response; intracellular transport; second-messenger-mediated signaling |
| CAAT* | Cell cycle; cell division [36]; cell organization and biogenesis; chromosome organization and biogenesis; DNA-dependent DNA replication; nucleobase, nucleoside, nucleotide and nucleic acid metabolism; protein localization; steroid biosynthesis |
| CACBINDING PROTEIN | Calcium ion transport; cellular process; intracellular signaling cascade; morphogenesis; nervous system development; organ development; regulation of signal transduction |
| CEBPB | Cellular carbohydrate metabolism |
| CETS1P54 | Cell organization and biogenesis; cellular localization; cellular process; organelle organization and biogenesis; protein localization; ribosome biogenesis; ubiquitin cycle; vesicle-mediated transport |
| CETS168 | Cellular physiological process |
| CP2 | N/A |
| CREB | N/A |
| DBP | Apoptosis; cell adhesion; endocytosis; innate immune response; intracellular signaling cascade; lipid metabolism; phosphate transport; protein kinase cascade; response to endogenous stimulus; RNA processing |
| DEAF1 | N/A |
| DELTAEF1 | Cell adhesion |
| E12 | N/A |
| E2F1* | Cell cycle [25-28]; cholesterol metabolism; nucleobase, nucleoside, nucleotide and nucleic acid metabolism; sterol metabolism |
| EGR* | Apoptosis; brain development; cell cycle; cell proliferation; central nervous system development; development [49]; endocytosis; enzyme linked receptor protein signaling pathway; galactose metabolism; intracellular signaling cascade; metal ion transport; nervous system development [49]; protein amino acid phosphorylation; protein kinase cascade; small GTPase mediated signal transduction; synaptic transmission; transcription [37]; ubiquitin cycle |
| ETF* | Cell cycle; cell proliferation; cellular lipid metabolism; central nervous system development; dephosphorylation; endocytosis; enzyme linked receptor protein signaling pathway; gluconeogenesis; heart development; hexose biosynthesis; intracellular signaling cascade; muscle development; neurite morphogenesis [38]; programmed cell death; protein kinase cascade; regulation of nucleocytoplasmic transport; response to DNA damage stimulus; small GTPase mediated signal transduction |
| FAC1 | Cell adhesion; cell cycle; cellular lipid metabolism; endocytosis; I-kappaB kinase/NF-kappaB cascade; intracellular signaling cascade, via spliceosome; proteolysis; secretion; carbohydrate metabolism; DNA repair; nuclear import; protein kinase cascade; protein localization; response to endogenous stimulus; RNA splicing; ubiquitin cycle; vesicle-mediated transport; cytoplasm organization and biogenesis; innate immune response; endoplasmic reticulum to Golgi vesicle-mediated transport; microtubule cytoskeleton organization and biogenesis; wound healing; protein amino acid glycosylation |
| FOX | N/A |
| FREAC7 | N/A |
| GATA1 | N/A |
| GC* | Apoptosis [35]; cell proliferation [35]; actin cytoskeleton organization and biogenesis; cell cycle; cellular lipid metabolism; central nervous system development; endocytosis; enzyme linked receptor protein signaling pathway; endoplasmic reticulum to Golgi vesicle-mediated transport; gluconeogenesis; hexose biosynthesis; muscle development; nervous system development; Notch signaling pathway; nucleocytoplasmic transport; protein kinase cascade; small GTPase mediated signal transduction; synaptic transmission; transmembrane receptor protein tyrosine kinase signaling pathway; ubiquitin cycle; vesicle-mediated transport |
| HELIOSA | Cellular physiological process; development; homophilic cell adhesion; regulation of metabolism |
| HNF1* | Organic anion transport [39]; innate immune response |
| HNF3B | Lipid metabolism; DNA metabolism |
| IRF7* | Immune response [40] |
| KROX* | Actin cytoskeleton organization and biogenesis; cell cycle; enzyme linked receptor protein signaling pathway; intracellular signaling cascade; nervous system development [50]; phosphate metabolism; regulation of neurotransmitter levels; small GTPase mediated signal transduction; system development; ubiquitin cycle |
| LBP1* | Apoptosis [51]; cellular process; intracellular signaling cascade; protein amino acid phosphorylation; protein kinase cascade |
| LDSPOLYA | Development; aromatic amino acid family metabolism; intracellular signaling cascade |
| LEF1TCF1 | N/A |
| LFA1 | N/A |
| MAZ* | Apoptosis; brain development; cell adhesion; cell cycle; cell differentiation; endocytosis; enzyme linked receptor protein signaling pathway; intracellular signaling cascade; muscle development [41]; nervous system development; development [41]; protein amino acid phosphorylation; protein kinase cascade; regulation of actin filament length; small GTPase mediated signal transduction; Wnt receptor signaling |
| MINI19 | N/A |
| MMEF2 | N/A |
| MYCMAX | Cellular metabolism; macromolecule metabolism |
| MZF1* | Cell proliferation [52]; cell adhesion; cell cycle; cell differentiation; cell-cell signaling; enzyme linked receptor protein signaling pathway; hemopoiesis [52]; metal ion transport; nervous system development; neurotransmitter secretion; organ development; regulated secretory pathway; regulation of transcription, DNA-dependent; skeletal development; synaptic transmission; Wnt receptor signaling pathway |
| NFAT* | Immune response [29-31]; homophilic cell adhesion; organ development |
| NMYC | Cellular physiological process |
| NRF2* | Organelle organization and biogenesis [42]; cellular physiological process; protein transport |
| OCT1 | Apoptosis; cell-cell adhesion; cellular physiological process; protein transport |
| OSF2 | N/A |
| PAX4 | Cell proliferation; enzyme linked receptor protein signaling pathway; gamma-aminobutyric acid signaling pathway; inflammatory response; programmed cell death; regulation of kinase activity |
| PIT1 | Proteolysis |
| PU1 | Cell adhesion; regulation of kinase activity; regulation of transferase activity |
| S8* | Development [46] |
| SP1* | Cell differentiation [53]; cell proliferation [53]; apoptosis [35]; cell adhesion; cell cycle; cell-cell signaling; cellular lipid metabolism; central nervous system development; endocytosis; nervous system development; neurogenesis; nucleocytoplasmic transport; organelle organization and biogenesis; phosphate metabolism; protein kinase cascade; response to endogenous stimulus; Rho protein signal transduction; small GTPase mediated signal transduction; synaptic transmission; transcription from RNA polymerase II promoter; transmembrane receptor protein tyrosine kinase signaling pathway; ubiquitin cycle; vesicle-mediated transport |
| SRF | N/A |
| SRY | Cell adhesion; cellular process; intracellular signaling cascade; mRNA processing; organic acid metabolism; response to DNA damage stimulus; RNA metabolism; RNA splicing; steroid metabolism |
| TBP | Protein transport; establishment of protein localization; RNA processing |
| TFIII* | Cell adhesion; cell differentiation; cell organization and biogenesis; chromatin modification; enzyme linked receptor protein signaling pathway; intracellular signaling cascade; nervous system development; organ development; protein kinase cascade; protein modification; transcription, DNA-dependent [54] |
| XVENT1 | Cell cycle; cell growth; cell proliferation; cellular biosynthesis; establishment of cellular localization; inflammatory response; innate immune response; intracellular signaling cascade; lipid metabolism; mitochondrion organization and biogenesis; protein complex assembly; protein kinase cascade; response to endogenous stimulus; response to oxidative stress; RNA processing; RNA splicing; secretion; transcription from RNA polymerase II promoter |
| ZF5* | Actin polymerization and/or depolymerization; cell cycle; cell proliferation; cellular lipid metabolism; endocytosis; enzyme linked receptor protein signaling pathway; endoplasmic reticulum to Golgi vesicle-mediated transport; glycoprotein biosynthesis; hexose metabolism; induction of programmed cell death [47]; intracellular signaling cascade; JNK cascade; MAPKKK cascade; neurogenesis; phospholipid biosynthesis; protein amino acid glycosylation; protein kinase cascade; RNA splicing, via transesterification reactions; small GTPase mediated signal transduction; stress-activated protein kinase signaling pathway; ubiquitin cycle; vesicle-mediated transport; transcription from RNA polymerase II promoter [48] |
| ZIC3* | Cell adhesion; cell cycle; apoptosis; cell proliferation; cell-cell signaling; chromatin modification; cytoskeleton organization and biogenesis; development [56]; endocytosis; enzyme linked receptor protein signaling pathway; hexose metabolism; MAPKKK cascade; nervous system development; neurogenesis [55]; protein kinase cascade; small GTPase mediated signal transduction; striated muscle development; transmembrane receptor protein tyrosine kinase signaling pathway; vesicle-mediated transport |
*TFs have corresponding functions proven by experiments from previous studies. N/A stands for no enriched or conserved biological function categories from this study.
Figure 6. Functionally conserved E2F1 binding sites in human and mouse genes. (a) Schematic alignment of functionally conserved E2F1 binding sites between human (hs) and mouse (mm) promoter sequences from a between-TFBS distance of 20 bp. Also listed are the numbers of conserved E2F1 binding site(s) detected by phylogenetic footprinting (PF). Asterisks indicate promoters of genes with experimentally proven E2F1 binding sites. (b) Sequence alignment of synergistic E2F1 binding sites from the E2F1 gene and two E2F1 binding site clusters from the ACVR1 gene. Core motifs are shown in upper case letters, and the distances between adjacent binding sites are shown in brackets. Also shown are the locations of each binding site in relation to the transcription start site.
Significance of E2F1 synergy for different distance constraints and sensitivity/specificity for detecting synergistic E2F1 combinations by function conservation, phylogenetic footprinting, and EEL algorithm from experimentally proven E2F1 binding human promoters
| | P(synergy/no. of TFBSs) | | | | | | | | | |
| Distance | Real sequences | Randomized sequences | p value | No. of genes* | PRF† | FPRF‡ | PRPF§ | FPRPF¶ | PREEL¥ | FPREEL# |
| D10 | 0.039 | 0.027 | 1.6E-04 | 48 | 10.4% | 0.0% | 2.1% | 0.0% | 6.3% | 0.0% |
| D20 | 0.077 | 0.041 | 3.0E-17 | 92 | 8.7% | 0.0% | 3.3% | 0.0% | 3.3% | 0.0% |
| D30 | 0.109 | 0.064 | 1.1E-18 | 125 | 8.8% | 0.0% | 2.4% | 0.0% | 5.6% | 0.0% |
| D40 | 0.139 | 0.084 | 2.2E-21 | 159 | 11.9% | 0.0% | 3.1% | 0.0% | 5.7% | 0.0% |
| D50 | 0.171 | 0.098 | 1.6E-31 | 192 | 12.5% | 0.0% | 3.1% | 0.0% | 6.8% | 0.0% |
| D60 | 0.186 | 0.119 | 2.4E-24 | 209 | 12.0% | 0.0% | 3.3% | 0.0% | 8.1% | 0.0% |
| D70 | 0.215 | 0.141 | 6.0E-26 | 244 | 12.7% | 0.0% | 2.9% | 0.0% | 7.8% | 0.0% |
| D80 | 0.243 | 0.154 | 6.0E-34 | 272 | 14.7% | 0.0% | 2.9% | 0.0% | 8.1% | 0.4% |
| D90 | 0.265 | 0.167 | 7.1E-38 | 293 | 14.3% | 0.0% | 3.1% | 0.0% | 7.8% | 0.7% |
| D100 | 0.280 | 0.177 | 4.6E-40 | 309 | 14.2% | 0.0% | 3.2% | 0.0% | 8.1% | 0.6% |
| D200 | 0.400 | 0.312 | 1.2E-22 | 419 | 17.7% | 0.0% | 2.6% | 0.0% | 7.6% | 1.9% |
| D300 | 0.477 | 0.393 | 2.6E-19 | 487 | 18.1% | 0.0% | 2.5% | 0.0% | 7.2% | 2.7% |
| D400 | 0.524 | 0.459 | 4.9E-12 | 527 | 19.7% | 0.0% | 2.3% | 0.0% | 7.4% | 3.2% |
| D500 | 0.559 | 0.505 | 5.3E-15 | 557 | 20.3% | 0.0% | 2.2% | 0.0% | 7.7% | 3.2% |
| D600 | 0.579 | 0.547 | 5.0E-04 | 575 | 20.2% | 0.0% | 2.1% | 0.0% | 8.0% | 3.3% |
| D700 | 0.600 | 0.578 | 1.1E-02 | - | - | - | - | - | - | - |
| D800 | 0.612 | 0.604 | 1.9E-01 | - | - | - | - | - | - | - |
| D900 | 0.618 | 0.617 | 4.9E-01 | - | - | - | - | - | - | - |
| None | 0.618 | 0.619 | 5.2E-01 | - | - | - | - | - | - | - |
*The number of genes whose promoters have at least two E2F1 binding sites. †PRF: positive rate from our function conservation approach. ‡FPRF: false positive rate from our function conservation approach. §PRPF: positive rate from phylogenetic footprinting approach. ¶FPRPF: false positive rate from phylogenetic footprinting approach. ¥PREEL: positive rate from EEL algorithm. #FPREEL: false positive rate from EEL algorithm.
Figure 7. Topology of TF-TF interaction network. TF-TF relationships based on 78 synergistic TF combinations from different PWMs. Also shown are representative motif logos from both the small and the large clusters.
Comparison of function conservation approach with EEL algorithm
| | Function conservation | EEL algorithm |
| TFBS detection | Finding all potential TFBSs | Finding all potential TFBSs |
| Distance constraint used | Yes | Yes |
| Alignment technique used | None | Non-direct DNA sequence alignment |
| Distance between TFBSs | Any | Relatively close |
| Number of genes compared | Identification of conserved TFBSs at multiple gene level | Identification of conserved TFBSs at single gene level |
| Parameters used for predicting interacting TFs | Conserved TFBS with function conservation of TFs at multiple gene (genome scale) level | Conserved TFBS with TF binding affinity at single gene level |
| Sensitivity* | Higher | Lower |
| Specificity* | Higher | Lower |
*Relative comparison results between the function conservation approach and EEL algorithm from this study.