Lu-Qiang Zhang, Qian-Zhong Li.
Abstract
Transcription factors and histone modifications are vital for the regulation of gene expression. To estimate the effects of transcription factor binding and histone modifications on gene expression, we construct a statistical model from genome-wide binding data for 15 transcription factors, profiles of 10 histone modifications, and DNase-I hypersensitivity data in three mammalian cell lines. Our results show that POLR2A and H3K36me3 predict gene expression strongly and consistently in all three cell lines, and that H3K4me3, H3K27me3 and H3K9ac are more reliable predictors than the other histone modifications in human embryonic stem cells. Moreover, genome-wide statistical redundancies exist within and between transcription factors and histone modifications, and these redundancies may be caused by the regulation mechanism. In further study, we find that although transcription factors and histone modifications have similar effects on the expression levels of genes genome-wide, their predictive abilities differ for genes in independent biological processes.
Keywords: Chromosome Section; DNase-I hypersensitivity; histone modifications; regulation mechanism; statistical redundancy; transcription factors
Year: 2017 PMID: 28454114 PMCID: PMC5522221 DOI: 10.18632/oncotarget.16988
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. List of the TFs involved in the current study for H1, Gm12878 and K562
Prediction accuracy of the log-linear regression and SVR models
| Model | TFs | HMs+DNase | TFs+HMs+DNase |
|---|---|---|---|
| log-linear regression | 0.404 | 0.529 | 0.555 |
| SVR | 0.544 | 0.594 | 0.635 |
| log-linear regression | 0.495 | 0.668 | 0.649 |
| SVR | 0.617 | 0.719 | 0.730 |
| log-linear regression | 0.527 | 0.641 | 0.633 |
| SVR | 0.627 | 0.690 | 0.688 |
CV-R2 is the average R2 over the 10-fold cross-validation.
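The CV-R2 summary above can be illustrated with a minimal sketch. It assumes a log-linear model of the form "log expression is a linear function of log-transformed factor signals", fitted by ordinary least squares; the pseudocount, the synthetic data and the function name `cv_r2` are illustrative assumptions, not the authors' code. An SVR could be scored with the same fold loop.

```python
import numpy as np

def cv_r2(X, y, n_folds=10, seed=0):
    """Average held-out R2 of a least-squares fit over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Add an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        pred = A_test @ coef
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))

# Synthetic stand-in for per-gene factor signals (hypothetical data).
rng = np.random.default_rng(1)
signals = rng.gamma(shape=2.0, scale=1.0, size=(500, 3))
X = np.log2(signals + 1.0)  # pseudocount of 1 is an assumption
log_expr = X @ np.array([0.8, 0.5, -0.3]) + rng.normal(0.0, 0.3, size=500)

score = cv_r2(X, log_expr)
print(f"CV-R2 = {score:.3f}")
```

Each fold's R2 is computed only on genes held out from the fit, so the average reflects predictive rather than in-sample accuracy.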
Figure 2. The PCC distributions for all combinations of 15 TFs or 10 HMs and DNase
A, B: H1; C, D: Gm12878; E, F: K562 cell line. The x-axis represents combinations of c kinds of HMs and DNase (choose c out of 10 HMs and DNase, c = 1, 2, …, 11) or d kinds of TFs (choose d out of 15 TFs, d = 1, 2, …, 15). The black curves represent the maximum PCC among the combinations of c HMs and DNase or among the combinations of d TFs.
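The combination scan described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: each combination is scored by the PCC between observed values and an in-sample least-squares prediction, the factor names `A` to `D` and the data are synthetic, and `best_combinations` is a hypothetical helper rather than the authors' procedure.

```python
from itertools import combinations
import numpy as np

def fit_pcc(X, y):
    """PCC between y and its least-squares prediction from the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.corrcoef(A @ coef, y)[0, 1])

def best_combinations(X, y, names):
    """For each size c, return (max PCC, factor names) over all c-subsets."""
    best = {}
    for c in range(1, len(names) + 1):
        scored = [
            (fit_pcc(X[:, list(cols)], y), cols)
            for cols in combinations(range(len(names)), c)
        ]
        pcc, cols = max(scored)
        best[c] = (pcc, tuple(names[i] for i in cols))
    return best

# Synthetic example: only factors A and B carry signal (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.2, size=300)

best = best_combinations(X, y, ["A", "B", "C", "D"])
for c, (pcc, combo) in best.items():
    print(c, round(pcc, 3), combo)
```

With 15 TFs the scan covers 2^15 - 1 = 32767 subsets, which is still cheap for a linear fit. The maximum PCC per subset size corresponds to the black curve in the figure.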
The combination modes with the maximum prediction accuracy for four factors
| cell line | factor | components of the combination | PCC |
|---|---|---|---|
| | TFs | POLR2A, SIX5, MAX, SUZ12 | 0.725 |
| | HMs+DNase | H3K36me3, H3K27me3, H3K4me3, H3K9me3 | 0.763 |
| | TFs | GABPA, NFATC1, POLR2A, TCF3 | 0.789 |
| | HMs+DNase | H3K79me2, H3K36me3, H3K27me3, H3K4me3 | 0.845 |
| | TFs | ELF1, PML, POLR2A, ZBTB7A | 0.791 |
| | HMs+DNase | H3K36me3, H3K79me2, H3K9me3, H3K27me3 | 0.830 |
Figure 3. The appearance frequency of each HM in the studied modes
A. The frequency of each HM in the H1 cell line, where the integer gives the number of occurrences in the studied modes. B. Venn diagram showing the co-occurrence counts of the four important HMs. C and D. The frequency of each HM in Gm12878 and K562.
Figure 4. Heatmaps of PCC both within TFs (HMs) and between TFs and HMs for the three cell lines
A, B and C represent the H1, Gm12878 and K562 cell lines, respectively.
Figure 5. Venn diagrams showing the number of co-regulated and solo-regulated genes within and between TFs and HMs
Blue depicts the co-regulated target genes; pink and purple represent genes solo-regulated by the respective factors labeled on the charts.
Figure 6. The interaction network among TFs, HMs and gene expression for the H1 cell line
In the network, nodes represent TFs, HMs and gene expression. Edges show the partial correlation coefficient between each pair of factors, where dashed lines represent negative correlations and solid lines represent positive correlations. The bolder the line, the stronger the correlation it represents.
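Partial correlation coefficients of the kind drawn as edges here can be obtained from the inverse covariance (precision) matrix. The sketch below is an illustration under assumptions: it uses the standard identity that the partial correlation of variables i and j given all others is -P_ij / sqrt(P_ii P_jj), the data are synthetic, and `partial_correlations` is a hypothetical helper. The record does not state how the authors computed their coefficients.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix for data of shape (n_samples, n_variables).

    Entry (i, j) is the correlation of variables i and j after
    conditioning on all remaining variables.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Synthetic example: x and y are both driven by z and are otherwise independent.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
data = np.column_stack([x, y, z])

pcorr = partial_correlations(data)
print("plain corr(x, y):  ", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("partial corr(x, y):", round(float(pcorr[0, 1]), 3))
```

In this toy case x and y are clearly correlated, yet their partial correlation is close to zero once z is conditioned on. A network built this way therefore separates direct associations from ones explained by a shared third factor.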
List of three random GO-IDs for each ratio range in the three cell lines
| Cell line | GO-ID | GO term | TF_PCC | HM_PCC | Ratio |
|---|---|---|---|---|---|
| | GO:0010212 | response to ionizing radiation | 0.591 | 0.897 | 0.659 |
| | GO:0046777 | protein autophosphorylation | 0.690 | 0.897 | 0.770 |
| | GO:0016569 | covalent chromatin modification | 0.612 | 0.750 | 0.816 |
| | GO:0006323 | DNA packaging | 0.775 | 0.834 | 0.929 |
| | GO:0023061 | signal release | 0.890 | 0.926 | 0.961 |
| | GO:0007409 | axonogenesis | 0.869 | 0.838 | 1.037 |
| | GO:0007010 | cytoskeleton organization | 0.818 | 0.659 | 1.240 |
| | GO:0006508 | proteolysis | 0.716 | 0.568 | 1.260 |
| | GO:0030163 | protein catabolic process | 0.845 | 0.429 | 1.970 |
| | GO:0009117 | nucleotide metabolic process | 0.442 | 0.702 | 0.630 |
| | GO:0040007 | growth | 0.630 | 0.869 | 0.725 |
| | GO:0006875 | cellular metal ion homeostasis | 0.666 | 0.879 | 0.757 |
| | GO:0065007 | biological regulation | 0.629 | 0.691 | 0.910 |
| | GO:0016192 | vesicle-mediated transport | 0.781 | 0.805 | 0.970 |
| | GO:0006325 | chromatin organization | 0.836 | 0.803 | 1.041 |
| | GO:0045786 | negative regulation of cell cycle | 0.898 | 0.725 | 1.238 |
| | GO:0006629 | lipid metabolic process | 0.853 | 0.654 | 1.304 |
| | GO:0043087 | regulation of GTPase activity | 0.962 | 0.636 | 1.513 |
| | GO:0023061 | signal release | 0.749 | 0.906 | 0.828 |
| | GO:0007009 | plasma membrane organization | 0.820 | 0.933 | 0.879 |
| | GO:0006396 | RNA processing | 0.583 | 0.651 | 0.894 |
| | GO:0007155 | cell adhesion | 0.780 | 0.853 | 0.915 |
| | GO:0030097 | hemopoiesis | 0.844 | 0.866 | 0.975 |
| | GO:0030162 | regulation of proteolysis | 0.821 | 0.796 | 1.032 |
| | GO:0006952 | defense response | 0.746 | 0.669 | 1.114 |
| | GO:0045087 | innate immune response | 0.813 | 0.709 | 1.147 |
| | GO:0051049 | regulation of transport | 0.807 | 0.608 | 1.329 |
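The Ratio column is numerically consistent with TF_PCC divided by HM_PCC. That reading is inferred from the tabulated values and is not stated in this record. A short check on three rows copied from the table, with `pcc_ratio` as a hypothetical helper name:

```python
def pcc_ratio(tf_pcc, hm_pcc):
    """Ratio of TF-model PCC to HM-model PCC (assumed definition)."""
    return tf_pcc / hm_pcc

# (GO-ID, TF_PCC, HM_PCC, reported Ratio), copied from the table above.
rows = [
    ("GO:0010212", 0.591, 0.897, 0.659),
    ("GO:0006325", 0.836, 0.803, 1.041),
    ("GO:0030163", 0.845, 0.429, 1.970),
]

# Differences stay within rounding of the three-decimal inputs.
max_diff = max(abs(pcc_ratio(tf, hm) - reported) for _, tf, hm, reported in rows)
for go_id, tf, hm, reported in rows:
    print(go_id, round(pcc_ratio(tf, hm), 3), reported)
```

Under this reading, a ratio below 1 marks a biological process where the HM model predicts expression better than the TF model, and a ratio above 1 marks the reverse.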
List of five random GO-IDs for which the TF and HM models show distinct PCCs for the same biological process in the different cell lines
| GO-ID | GO term | H1-TFs | H1-HMs | Gm12878-TFs | Gm12878-HMs | K562-TFs | K562-HMs |
|---|---|---|---|---|---|---|---|
| | negative regulation of phosphorylation | 0.922 | 0.806 | 0.569 | 0.933 | 0.846 | 0.796 |
| | negative regulation of signal transduction | 0.767 | 0.689 | 0.651 | 0.860 | 0.738 | 0.723 |
| | cellular ion homeostasis | 0.942 | 0.866 | 0.668 | 0.897 | 0.693 | 0.815 |
| | cellular cation homeostasis | 0.939 | 0.874 | 0.668 | 0.897 | 0.692 | 0.830 |
| | cation homeostasis | 0.903 | 0.877 | 0.658 | 0.903 | 0.704 | 0.822 |
Comparison of the predictive results with other studies
| cell line | factors | CV-R2 | method |
|---|---|---|---|
| Gm12878 | c-FOS, | 0.390 | SVR |
| Gm12878 | | 0.617 | SVR |
| Gm12878 | | 0.412 | log-linear regression |
| Gm12878 | H3K27ac, | 0.719 | SVR |
| H1 | H2AZ, | 0.79 | two-step |
| H1 | | 0.79 | SVR |
Bold indicates the co-factors in the comparison.