Chia-Chun Yang, Min-Hsuan Chen, Sheng-Yi Lin, Erik H Andrews, Chao Cheng, Chun-Chi Liu, Jeremy J W Chen.
Abstract
BACKGROUND: Transcription factors (TFs) often interact with one another to form TF complexes that bind DNA and regulate gene expression. Many databases have been created to describe known TF complexes identified by either mammalian two-hybrid experiments or data mining. Recently, a wealth of ChIP-seq data on human TFs under different experimental conditions has become available, making it possible to investigate condition-specific (cell type and/or physiological state) TF complexes and their target genes.
Keywords: ChIP-seq; Condition-specific target; Database; TF-TF complexes; Transcription factor
Year: 2017 PMID: 28068916 PMCID: PMC5223348 DOI: 10.1186/s12864-016-3450-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overview of the CST pipeline. (a) Given a ChIP-seq sample, primary TF target genes are identified using the TIP algorithm. (b) For motif discovery, the binding peaks on target gene promoters are first identified using the narrow peaks located in the putative promoters of the TIP-predicted target genes. (c) The selected binding peaks on the target genes are passed to MEME to discover primary TF binding motifs. (d) FIMO is used to locate the primary TF binding motifs in the binding peaks of the primary TF target genes. (e) Using the binding motifs generated by MEME for all TFs, together with motifs from the JASPAR database, SpaMo is used to search for binding motifs of potential partner TFs and to assess their statistical significance based on their spacing. (f) The resulting predicted TF complexes and their target genes are reported with GO enrichment results. Target genes are stratified into high- and low-confidence groups based on the SpaMo-calculated statistical significance of their TF complex binding motif spacing.
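The spacing analysis in step (e) can be illustrated with a much-simplified sketch. This is not SpaMo's implementation: the hit format `(peak_id, start, end)`, the `max_gap` window, and the uniform-null binomial tail are all assumptions made here for illustration only.

```python
from collections import Counter
from math import comb


def spacing_counts(primary_hits, partner_hits, max_gap=150):
    """Tally gaps (in bp) between primary and partner motif hits in the same peak.

    Each hit is a hypothetical (peak_id, start, end) tuple with 0-based,
    end-exclusive coordinates. Overlapping hits are skipped.
    """
    partner_by_peak = {}
    for peak_id, start, end in partner_hits:
        partner_by_peak.setdefault(peak_id, []).append((start, end))

    counts = Counter()
    for peak_id, p_start, p_end in primary_hits:
        for s_start, s_end in partner_by_peak.get(peak_id, []):
            if s_start >= p_end:        # partner lies downstream of the primary motif
                gap = s_start - p_end
            elif s_end <= p_start:      # partner lies upstream of the primary motif
                gap = p_start - s_end
            else:                       # motifs overlap: no spacing defined
                continue
            if gap <= max_gap:
                counts[gap] += 1
    return counts


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); a crude stand-in for a spacing test."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy data (invented): three of four peaks share a 9 bp gap.
primary = [("pk1", 100, 106), ("pk2", 50, 56), ("pk3", 200, 206), ("pk4", 10, 16)]
partner = [("pk1", 115, 120), ("pk2", 65, 70), ("pk3", 186, 191), ("pk4", 40, 45)]
counts = spacing_counts(primary, partner)
best_gap, best_n = counts.most_common(1)[0]
# Assumed null: every gap in 0..150 is equally likely.
p_value = binomial_tail(best_n, sum(counts.values()), 1 / 151)
```

A recurring gap that is far more frequent than the assumed null predicts is the kind of signal used to rank candidate partner motifs; the real tool handles strand, orientation, and multiple testing, which this sketch ignores.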
Fig. 2 Comparison and validation of CST-predicted TF complexes. In (a) and (b), we compared the presence of CST-predicted TF complexes relative to SpaMo-predicted TF complexes in an external, experimentally derived database of TF complexes to demonstrate the performance of CST. (a) The x-axis represents the TF complexes ordered by their SpaMo-calculated p-values (from most to least significant), and the y-axis represents the enrichment ratio. The best enrichment ratios of CST and SpaMo were approximately 32 and 18, respectively. CST has greater enrichment than SpaMo across all p-values. The enrichment ratio was calculated as the ratio of predicted TF complexes found in the database relative to the number of 1000 randomly generated TF complexes found in the database. (b) Similar to (a), but using the top N TF complexes ranked by p-value. The best enrichment ratios of CST and SpaMo were approximately 32 and 14, respectively. CST demonstrated greater enrichment than SpaMo across the entire N range. In (c) and (d), we validated the condition-specific TF-TF interactions using TRMs to demonstrate the condition-specific accuracy. The nodes are TFs, and the edges indicate interactions. GATA2 and TAL1 (grey colour) are present in both TRM and ENCODE ChIP-seq data. Combined GATA2 and TAL1 TRMs in HSCs contained 16 TF-TF interactions (c), whereas 10 predicted TF-TF interactions were identified in CST using GATA2 and TAL1 ChIP-seq data in K562 cells (d). The bold edges indicate the four TF-TF interactions common to TRMs and CST; this overlap is significant (P = 3 × 10⁻⁴; Fisher's exact test), suggesting that TRM and CST are consistent.
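The enrichment ratio described in the caption can be sketched as follows. The caption does not spell out the normalisation, so dividing the hit fraction of the predicted pairs by the hit fraction of the random pairs is an assumption here, as are the function names and the toy data.

```python
import random


def hit_fraction(pairs, reference):
    """Fraction of unordered TF pairs that appear in the reference database."""
    ref = {frozenset(p) for p in reference}
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(frozenset(p) in ref for p in pairs) / len(pairs)


def random_pairs(tfs, n, rng):
    """Draw n random pairs of distinct TFs from a list of TF names."""
    return [tuple(rng.sample(tfs, 2)) for _ in range(n)]


def enrichment_ratio(predicted, reference, tfs, n_random=1000, seed=0):
    """Hit fraction of predicted pairs over hit fraction of random pairs (assumed form)."""
    rng = random.Random(seed)
    background = hit_fraction(random_pairs(sorted(tfs), n_random, rng), reference)
    if background == 0:
        return float("inf")
    return hit_fraction(predicted, reference) / background


# Toy data (invented): ten TFs, five known pairs, three predictions that are all known.
tfs = [f"TF{i}" for i in range(10)]
known = [("TF0", "TF1"), ("TF2", "TF3"), ("TF4", "TF5"), ("TF6", "TF7"), ("TF8", "TF9")]
predicted = [("TF1", "TF0"), ("TF2", "TF3"), ("TF5", "TF4")]
ratio = enrichment_ratio(predicted, known, tfs)
```

Treating pairs as unordered sets means `("TF1", "TF0")` matches `("TF0", "TF1")`, which suits undirected TF-TF complexes.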
Eight high-confidence target genes of the USF2-NFYA complex derived from K562 USF2 ENCODE ChIP-seq data
| Location of motif pairs (a) | Target gene (b) | Motif spacing (c) |
|---|---|---|
| Chr14: 20923275-20923304 | OSGEP, APEX1 | 9 bp |
| Chr7: 108210264-108210293 | THAP5, DNAJB9 | 9 bp |
| Chr4: 99850329-99850358 | EIF4E | 9 bp |
| Chr16: 4897410-4897439 | GLYR1, UBN1 | 9 bp |
| Chr12: 104359548-104359577 | TDG | 9 bp |
(a) The location of the predicted USF2 and NFYA motif pair from K562 USF2 ChIP-seq data and the motif database
(b) The target genes for which the motif pairs occur in their putative promoters and which are TIP-derived target genes of USF2 (the primary TF)
(c) The spacing of the USF2-NFYA motif pairs on the putative promoters of the target genes
Fig. 3 Validation of predicted targets of the USF2-NFYA complex using ChIP-qPCR and RT-PCR. (a) ChIP-qPCR with an NFYA pull-down and qPCR amplification against CST NFYA-USF2-predicted target genes. Genomic DNA from K562 cells (left panel) and HeLa cells (right panel) that immunoprecipitated with NFYA and nonspecific IgG antibodies was used for qPCR to assess the fold enrichment of the respective gene promoters in NFYA-IP DNA over IgG-IP for each gene. The fold enrichments are the averages of three independent experiments, and the data are presented as means ± standard errors. HoxB4 was used as a positive control (see Methods). (b) Same as (a) with a USF2 pull-down. (c) The expression level of USF2 in HeLa cells with USF2 silencing by siRNA. Upper panel: Western blot; β-tubulin: internal control. Lower panel: real-time RT-PCR; TBP: internal control. (d) The expression levels of three downstream genes of USF2 in HeLa cells with USF2 silencing, as determined by real-time RT-PCR. *P < 0.05, compared with scramble control.
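The "fold enrichment over IgG, mean ± standard error of three experiments" summary in panels (a) and (b) can be sketched as below. The caption does not give the formula, so the common 2^(Ct_IgG − Ct_IP) form (which assumes roughly 100% amplification efficiency) is an assumption here, and the Ct values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def fold_enrichment(ct_ip, ct_igg):
    """Fold enrichment of specific IP over IgG IP, assuming 2^(Ct_IgG - Ct_IP)."""
    return 2.0 ** (ct_igg - ct_ip)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    values = list(values)
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical Ct values for one promoter: (Ct of NFYA-IP, Ct of IgG-IP) per experiment.
replicates = [(25.0, 28.0), (25.4, 28.1), (24.8, 28.0)]
folds = [fold_enrichment(ip, igg) for ip, igg in replicates]
avg, sem = mean_and_sem(folds)
```

A lower Ct in the specific IP than in the IgG control gives a fold enrichment above 1, which is what a bound promoter is expected to show.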
A partial list of the GO analysis results of target genes predicted by two different putative complexes using the K562 USF2 ChIP-seq data
| Rank | Enriched GO term | P-value |
|---|---|---|
| Predicted USF2-IRF1 complex (a) | | |
| 1 | GO:0032870 cellular response to hormone stimulus | 2.15E-07 |
| 2 | GO:0033572 transferrin transport | 7.16E-07 |
| 3 | GO:0015682 ferric iron transport | 7.16E-07 |
| 4 | GO:0071495 cellular response to endogenous stimulus | 1.20E-06 |
| 5 | GO:0071375 cellular response to peptide hormone stimulus | 1.55E-06 |
| 6 | GO:0015031 protein transport | 2.94E-06 |
| 7 | GO:0044437 vacuolar part | 3.21E-06 |
| 8 | GO:0005654 nucleoplasm | 3.82E-06 |
| 9 | GO:0043434 response to peptide hormone stimulus | 7.11E-06 |
| 10 | GO:0006826 iron ion transport | 7.92E-06 |
| Predicted USF2-NFYA complex (b) | | |
| 1 | GO:0004536 deoxyribonuclease activity | 1.53E-04 |
| 2 | GO:0016798 hydrolase activity, acting on glycosyl bonds | 2.58E-04 |
| 3 | GO:0019104 DNA N-glycosylase activity | 3.79E-04 |
| 4 | GO:0006886 intracellular protein transport | 5.00E-04 |
| 5 | GO:0044419 interspecies interaction between organisms | 5.43E-04 |
| 6 | GO:0006308 DNA catabolic process | 6.97E-04 |
| 7 | GO:0044265 cellular macromolecule catabolic process | 7.91E-04 |
| 8 | GO:0016799 hydrolase activity, hydrolysing N-glycosyl compounds | 1.16E-03 |
| 9 | GO:0060674 placenta blood vessel development | 1.29E-03 |
| 10 | GO:0032507 maintenance of protein location in cell | 1.42E-03 |
(a) GO enrichment results for targets of the predicted USF2-IRF1 complex. The second, third and tenth GO terms are related to iron transport. A previous study reported that USF2 and IRF1 co-regulate β2-microglobulin, which regulates iron metabolism and transport
(b) GO enrichment results for targets of the predicted USF2-NFYA complex. The top ten GO terms are associated with DNA catabolism and clearly differ from the results for the predicted USF2-IRF1 complex
Literature support for the TF complexes predicted from NFYA ChIP-seq data in HeLa S3 cells
| Partner binding motif (a) | Predicted partner (b) | SpaMo p-value (c) | Reference (d) |
|---|---|---|---|
| E HeLa S3/FOS | FOS | 5e-06 | Fleming et al. |
| E Sknsh/RFX5 | RFX5 | 3.6e-05 | Jabrane-Ferrat et al. |
| E Hep G2/SREBP2 | SREBP2 | 0.0015 | Dooley et al. |
| J NFIC | NFIC | 0.0015 | NA |
| J TBP | TBP | 0.0076 | Lee et al. |
| E GM12878/TBLR1 | SP1 | 0.014 | Ravasi et al. |
| E GM12878/CDP | SP1 | 0.015 | Ravasi et al. |
| E H1hesc/Rad21 | Rad21 | 0.018 | NA |
| J SP1 | SP1 | 0.021 | Ravasi et al. |
| E K562/GTF2B | TBP | 0.022 | Lee et al. |
| E Hep G2/MAZ | SP1 | 0.023 | Ravasi et al. |
(a) The source of the partner binding motif. Summary names are used in the first column, in which “E HeLa S3/FOS” indicates the secondary motif from the ENCODE FOS ChIP-seq sample in the HeLa S3 cell line, and “J NFIC” indicates the motif from the JASPAR NFIC motif
(b) The list of the NFYA-partner TF complexes
(c) The p-value for the significant spacing of the binding motifs from SpaMo
(d) The external studies that support the existence of the TF complex
NA: reference is not available