Guillaume Devailly, Anagha Joshi.
Abstract
BACKGROUND: Transcription regulation is a major controller of gene expression dynamics during development and disease, where transcription factors (TFs) modulate the expression of genes through direct or indirect DNA interaction. ChIP sequencing has become the most widely used technique for obtaining a genome-wide view of TF occupancy in a cell type of interest, mainly owing to established standard protocols and a rapid decrease in the cost of sequencing. The number of ChIP sequencing data sets available in the public domain is therefore ever increasing, including data generated by individual labs as well as by consortia such as the ENCODE project.
Keywords: ChIP seq; Data integration; Transcription control; Transcription factors; Transcriptional regulation
Year: 2018 PMID: 30453943 PMCID: PMC6245581 DOI: 10.1186/s12859-018-2377-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Summary of the analyses performed. Each blue point indicates that the corresponding dataset was used to perform the analysis.
Fig. 2 Correlation between ChIP-seq experiments in four datasets. a-d Correlation heatmaps of all ChIP-seq experiments included in four datasets: a: ENCODE human, b: ENCODE mouse, c: CODEX human, d: CODEX mouse. Specific clusters are highlighted with annotation after hierarchical clustering of the correlation matrices. TSS locations were added as a track in each heatmap and are shown in green. e A subset of panel a restricted to samples from HUVEC. f A subset of the ENCODE mouse dataset showing that while CTCF and RNAPII ChIP-seq experiments clustered together in heart and liver tissues, EP300 ChIP-seq experiments clustered by tissue. g A subset of the CODEX mouse dataset including all ChIP-seq experiments done in mouse embryonic stem cells. Major clusters have been annotated.
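The correlation matrices behind such heatmaps can be illustrated with a minimal sketch. It assumes that each experiment is summarised as a binary vector (1 if a peak is present in a genomic region, 0 otherwise) and that Pearson correlation is used. Neither assumption is confirmed by the caption, and the function names `pearson` and `correlation_matrix` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(occupancy):
    """Pairwise correlations between experiments.

    occupancy maps an experiment name to its 0/1 peak-presence vector
    (an assumed representation, one entry per genomic region).
    """
    names = sorted(occupancy)
    return {a: {b: pearson(occupancy[a], occupancy[b]) for b in names}
            for a in names}


# Toy data: experiments A and B share all peaks, C has the complementary set.
toy = {"A": [1, 0, 1, 0], "B": [1, 0, 1, 0], "C": [0, 1, 0, 1]}
matrix = correlation_matrix(toy)
```

The resulting nested dictionary could then be passed to any hierarchical clustering routine to order rows and columns before plotting.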
Fig. 3 Distribution of peak distances from the nearest TSS. a and b Stack view of all TF ChIP-seq experiments in the human (a) and mouse (b) ENCODE datasets. For each ChIP-seq experiment, the fraction of peaks overlapping a TSS (white), upstream (blue) or downstream (red) of the nearest TSS was computed. Experiments were sorted according to the fraction of peaks overlapping a TSS, excluding experiments showing more than 50% of peaks upstream (pink side bar) or downstream (green side bar) of the nearest TSS. c and g A focus on the experiments showing more than 50% of peaks upstream (pink side bar) or downstream (green side bar) of the nearest TSS. c: in human. g: in mouse. d MAFK ChIP-seq experiments have few peaks overlapping a TSS. e BRCA1 ChIP-seq experiments have most of their peaks overlapping a TSS. f C-Fos ChIP-seq experiments have a variable fraction of peaks overlapping a TSS. h Fraction of peaks overlapping a TSS was compared between mouse and human for each TF present in both datasets. x-axis: fraction of peaks overlapping a TSS in human. y-axis: fraction of peaks overlapping a TSS in mouse. Grey cross: range of the Median Absolute Deviation (MAD) of the fraction of peaks overlapping a TSS in cases where several experiments were done for a given TF. Blue line: linear regression. Dotted red line: x = y line. cor: Pearson correlation coefficient.
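The per-experiment fractions described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: peaks are `(start, end)` intervals on a single chromosome, each TSS is a `(position, strand)` pair, the nearest TSS is taken relative to the peak midpoint, and upstream or downstream is defined relative to the strand of that TSS. The names `classify_peak` and `peak_fractions` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def classify_peak(start, end, tss_positions, tss_strands):
    """Label a peak 'overlap', 'upstream' or 'downstream' of its nearest TSS.

    tss_positions must be sorted and non-empty; tss_strands[i] is '+' or '-'
    for the TSS at tss_positions[i].
    """
    mid = (start + end) // 2
    i = bisect_left(tss_positions, mid)
    # Only the TSSs immediately left and right of the midpoint can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    nearest = min(candidates, key=lambda j: abs(tss_positions[j] - mid))
    tss = tss_positions[nearest]
    if start <= tss <= end:
        return "overlap"
    peak_is_left = end < tss
    # On the '+' strand smaller coordinates are upstream; on '-' the reverse.
    if (tss_strands[nearest] == "+") == peak_is_left:
        return "upstream"
    return "downstream"


def peak_fractions(peaks, tss):
    """Fraction of peaks in each class for one experiment (peaks non-empty)."""
    tss = sorted(tss)
    positions = [pos for pos, _ in tss]
    strands = [strand for _, strand in tss]
    counts = Counter(classify_peak(s, e, positions, strands) for s, e in peaks)
    return {label: counts[label] / len(peaks)
            for label in ("overlap", "upstream", "downstream")}


# Toy data: one '+' strand TSS at 1000 and one '-' strand TSS at 5000.
toy_tss = [(1000, "+"), (5000, "-")]
toy_peaks = [(900, 1100), (200, 300), (5200, 5300), (4000, 4100)]
fractions = peak_fractions(toy_peaks, toy_tss)
```

In the toy example, one peak overlaps the TSS at 1000, two peaks lie upstream of their nearest TSS (one on each strand), and one lies downstream of the minus-strand TSS.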
Fig. 4 TF peaks farther from a TSS are more cell type-specific than peaks overlapping a TSS. a-d Correlation heatmaps of TF ChIP-seq experiments in human (a and b) or mouse (c and d) datasets, computed from only the peaks less than 1 kb from the nearest TSS (a and c) or from only the peaks 10 kb or more from the nearest TSS, upstream or downstream (b and d). Colour side bars indicate the cell of origin of the ChIP-seq experiments. Only experiments performed in cell types with more than 30 ChIP-seq experiments are shown. e and f K-means clustering of highly bound regions in human (e) and mouse (f). TF density: proportion of ChIP-seq experiments with a peak at a given location. Top colour bars: proportion of peaks upstream of (blue), at (white), or downstream of (red) the nearest TSS. Very highly bound regions in all cells tend to mostly overlap a TSS. Cell-specific highly bound regions tend to be more distal. A large cluster of genomic regions with low TF binding was removed from the figure for clarity in both human and mouse.
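The "TF density" defined in this caption, the proportion of ChIP-seq experiments with a peak at a given location, can be written out as a short sketch. The input layout (a mapping from a region label to one 0/1 peak call per experiment), the `threshold` parameter and the function names are assumptions made for illustration only.

```python
def tf_density(peak_calls):
    """Proportion of experiments with a peak in each region.

    peak_calls maps a region label to a non-empty list of 0/1 values,
    one per ChIP-seq experiment (an assumed representation).
    """
    return {region: sum(calls) / len(calls)
            for region, calls in peak_calls.items()}


def highly_bound(peak_calls, threshold):
    """Regions whose TF density is at least `threshold` (hypothetical cut-off)."""
    density = tf_density(peak_calls)
    return sorted(region for region, value in density.items()
                  if value >= threshold)


# Toy data: four experiments, two regions.
toy_calls = {
    "chr1:1000-1500": [1, 1, 1, 1],   # bound in every experiment
    "chr1:9000-9500": [1, 0, 0, 0],   # bound in a single experiment
}
density = tf_density(toy_calls)
selected = highly_bound(toy_calls, threshold=0.5)
```

Computing the density separately for each cell type would give one value per region and cell type, which is the kind of matrix a k-means clustering step could take as input.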
Top 3 significant associations between factors in mouse and human cell types
| Cell type | Combination 1 | Combination 2 | Combination 3 |
|---|---|---|---|
| B cells (M) | E2A, FoxO1, Pax5 | E2A, Ebf1, Oct2 | Pax5, Smad3, FoxO1A |
| T cells (M) | Stat3, Stat4, Stat5, Stat6 | Stat5a, Stat5b, Stat5 | Fli1, Gata3 |
| Dendritic cells (M) | Hif1a, Irf1, Maff, Relb, Stat3, Rel, Irf2, Irf4 | Hif1a, Irf1, Maff, Relb, Rel, Irf2, Irf4 | - |
| Macrophages (M) | CEBPA, CEBPB | CEBPA, CEBPB, PU1, STAT1 | CEBPA, CEBPB, PPARG, PU1, STAT1 |
| Erythroid cells (M) | ETO2, GATA1, LDB1, MTGR1, SCL | GATA1, LDB1, MTGR1, SCL | - |
| MK progenitors (M) | CBFB, GATA1, GATA2, RING1B, RUNX1 | CBFB, ETS1, RING1B, RUNX1 | - |
| MEL (M) | JunD, SMC3 | GATA1, SCL | CMYC, MAX, MXI1, NELFE, SCL, TBP |
| ES cells (M) | Suz12, SOX2 | E2F1, nMYC | E2F1, KLF4, nMYC |
| A549 (H) | HDAC6, P300, ELF1, ETS1, GABP | ATF3, BRF1 | RNA polII, CTCFL |
| GM12878 (H) | SAP30, TAF7 | STAT5A, BRG1 | P300, ETS1 |
| H1-hESC (H) | RAD21, ZNF143 | BACH1, MAFK | USF1, USF2 |
| HeLa-S3 (H) | EZH2, RNA polII, SIN3A, CJUN, CMYC | GTF2B, NR2F2 | RNAPII, RBBP5 |
| HepG2 (H) | TAF1, TAF7, TEAD4 | SAP30, ATF1, ATF3 | PU1, STAT5A |
| K562 (H) | GTF2F1, CTCF | HDAC1, CJUN | E2F6, CTCF |
All associations were predicted at very high significance (all P-values <1e-256). M - mouse, H - human