Kyoung-Jae Won, Xian Zhang, Tao Wang, Bo Ding, Debasish Raha, Michael Snyder, Bing Ren, Wei Wang.
Abstract
Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements (ENCODE) cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and identified enhancers more accurately than state-of-the-art methods. In addition, because the model has a fixed number of epigenetic states, ChroModule allows straightforward illustration of epigenetic variability across multiple cell types. Using this feature, we found that epigenetic states that are invariable and variable across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity: similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors that play critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in the comparative analysis and interpretation of multiple epigenomes.
Year: 2013 PMID: 23482391 PMCID: PMC3632130 DOI: 10.1093/nar/gkt143
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
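Because ChroModule uses the same fixed set of states in every cell type, state labels can be compared bin by bin across cell types. The Python sketch below illustrates that comparison under stated assumptions. It assumes each cell type has already been segmented into the same fixed-size bins with one state label per bin; the labels `"P"`, `"E"`, `"T"` and `"R"` and the function name `variability_blocks` are hypothetical. A bin is called invariable only when every cell type has the same state there, which may be stricter or looser than the paper's actual criterion. Consecutive bins with the same outcome are merged into blocks.

```python
from itertools import groupby


def variability_blocks(states_by_cell):
    """Split the genome into invariable and variable blocks.

    states_by_cell maps a cell-type name to a list of per-bin state labels
    (hypothetical labels; any hashable value works). Returns tuples of
    (start_bin, end_bin_exclusive, kind, state), where kind is "invariable"
    (all cell types agree and state is the shared label) or "variable"
    (at least two cell types disagree and state is None).
    """
    tracks = list(states_by_cell.values())
    if not tracks:
        raise ValueError("no cell types given")
    n_bins = len(tracks[0])
    if any(len(track) != n_bins for track in tracks):
        raise ValueError("all cell types must cover the same bins")

    # Classify each bin by comparing its state across all cell types.
    per_bin = []
    for column in zip(*tracks):
        if len(set(column)) == 1:
            per_bin.append(("invariable", column[0]))
        else:
            per_bin.append(("variable", None))

    # Merge runs of identically classified bins into blocks.
    blocks = []
    start = 0
    for (kind, state), run in groupby(per_bin):
        length = sum(1 for _ in run)
        blocks.append((start, start + length, kind, state))
        start += length
    return blocks


states = {
    "H1": ["P", "P", "E", "T"],
    "K562": ["P", "P", "R", "T"],
    "GM12878": ["P", "P", "E", "T"],
}
print(variability_blocks(states))
# [(0, 2, 'invariable', 'P'), (2, 3, 'variable', None), (3, 4, 'invariable', 'T')]
```

Requiring unanimous agreement keeps the definition simple; a real analysis might tolerate one dissenting cell type or require a minimum block length.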
Figure 1. (A) The structure of ChroModule. ChroModule has six modules: forward promoter, backward promoter, enhancer, H3K36me3-enriched region (transcribed region), H3K27me3-enriched region (repressed region) and background. Each module has a left–right structure, i.e. each state transitions either to itself or to a state located to its right (12,15). (B) Emission probabilities of the five-state HMM for the promoter module. The fourth state represents an open chromatin region depleted of H3K4me1/2/3 and enriched for DNase I signal. (C) Example ChroModule annotation and the epigenomic data in K562 cells. (D) Example ChroModule annotation of the eight cell types. The probability of each HMM state and the EVS are shown for each cell type in the STAR browser (http://wanglab.ucsd.edu/star/browser).
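The left–right constraint in panel (A) can be expressed as an upper-triangular transition matrix. The Python sketch below builds one for a single, isolated module. The 0.9 self-transition and the even split of the remaining probability are arbitrary starting values (in a trained model these are learned), and `left_right_transitions` is a hypothetical name. In the full ChroModule model, a module's exit would connect to other modules and the background, which this sketch omits. As the caption allows, a state may jump to any state on its right, not only the adjacent one.

```python
def left_right_transitions(n_states, self_prob=0.9):
    """Transition matrix for one isolated left-right HMM module.

    State i may stay in place (probability self_prob) or move to any state
    j > i, with the remaining mass split evenly; moves to j < i are zero.
    The rightmost state can only stay, so its self-transition is 1.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 < self_prob <= 1.0:
        raise ValueError("self_prob must be in (0, 1]")
    matrix = []
    for i in range(n_states):
        row = [0.0] * n_states
        n_right = n_states - 1 - i
        if n_right == 0:
            row[i] = 1.0
        else:
            row[i] = self_prob
            for j in range(i + 1, n_states):
                row[j] = (1.0 - self_prob) / n_right
        matrix.append(row)
    return matrix


# Five states, matching the size of the promoter module in panel (B).
for row in left_right_transitions(5):
    print(" ".join(f"{p:.3f}" for p in row))
```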
Figure 4. Enriched motifs found by HOMER (29).
Figure 2. Evaluation of ChroModule performance on (A) promoters (assessed using RefSeq TSSs). ChromHMM results (promoters and strong promoters) were obtained from ENCODE (36). (B) Assessment of the enhancers predicted by ChroModule and ChromHMM using p300-binding sites that are distal (>2.5 kb) from RefSeq TSSs. ChroModule outperformed ChromHMM in all the cell types. ChromHMM results (enhancer, strong enhancer) were downloaded from the study of Ernst et al. (36). Supplementary Figure S6 shows the comparison for H1, K562 and GM12878. (C) Assessment of the enhancers predicted by ChroModule and ChromHMM using TF-binding sites in GM12878 and K562 cells. (D) Comparison of ChroModule models trained independently in Huvec and GM12878 (V2). Receiver operating characteristic (ROC) curves generated using RefSeq promoters were used to evaluate promoter prediction.
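Selecting p300-binding sites that are distal (>2.5 kb) from RefSeq TSSs amounts to a nearest-TSS distance filter. The Python sketch below shows one way to do it. It assumes each site is reduced to a single coordinate (for example a peak summit), that TSS positions are given per chromosome as plain integers, and that ">2.5 kb" means strictly more than 2500 bp. The function name `distal_sites` is hypothetical.

```python
import bisect


def distal_sites(sites, tss_by_chrom, min_distance=2500):
    """Keep binding sites more than min_distance bp from every TSS.

    sites: list of (chrom, position) tuples, e.g. p300 peak summits.
    tss_by_chrom: dict mapping chrom -> list of TSS positions.
    Sites on a chromosome with no annotated TSS are kept.
    """
    sorted_tss = {chrom: sorted(positions) for chrom, positions in tss_by_chrom.items()}
    kept = []
    for chrom, pos in sites:
        tss = sorted_tss.get(chrom, [])
        i = bisect.bisect_left(tss, pos)
        # The nearest TSS is either just before or just after the insertion point.
        distances = [abs(tss[k] - pos) for k in (i - 1, i) if 0 <= k < len(tss)]
        if not distances or min(distances) > min_distance:
            kept.append((chrom, pos))
    return kept


tss = {"chr1": [10_000, 50_000]}
sites = [("chr1", 11_000), ("chr1", 20_000), ("chr1", 52_500), ("chr1", 52_501), ("chr2", 100)]
print(distal_sites(sites, tss))
# [('chr1', 20000), ('chr1', 52501), ('chr2', 100)]
```

Sorting the TSS list once per chromosome and using binary search keeps each lookup logarithmic, which matters when filtering tens of thousands of peaks against the whole RefSeq annotation.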
Performance of ChroModule and ChromHMM on predicting promoters and enhancers evaluated using RefSeq promoters and distal p300-binding sites, respectively
| Cell | Promoters: ChroModule | Promoters: ChromHMM | Distal p300 (enhancers): ChroModule | Distal p300 (enhancers): ChromHMM |
|---|---|---|---|---|
| H1 | 0.62 | 0.53 | 0.77 | 0.62 |
| GM12878 | 0.55 | 0.46 | 0.71 | 0.65 |
| K562 | 0.54 | 0.42 | 0.84 | 0.62 |
| Hmec | 0.58 | 0.53 | – | – |
| Hsmm | 0.54 | 0.56 | – | – |
| Huvec | 0.57 | 0.50 | – | – |
| Nhek | 0.57 | 0.52 | – | – |
| Nhlf | 0.57 | 0.52 | – | – |
Values are the area under the ROC curve (AUC), scaled to the maximum value.
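The AUC values in the table summarize each ROC curve as a single number. The Python sketch below computes an AUC using its probabilistic interpretation: the chance that a randomly chosen positive (for example, a bin overlapping a RefSeq promoter) scores higher than a randomly chosen negative, with ties counted as one half. It does not reproduce the "scaled to the maximum value" normalization, whose exact definition is not given here. The function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    labels holds 1 for positives and 0 for negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only for clarity; for genome-wide evaluations, a rank-sum (Mann–Whitney U) formulation gives the same value in O(n log n).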
Functions of the genes in the epigenetically invariable and variable regions
| Region | Number of blocks | Genes | GO terms (number of genes; adjusted P-value) |
|---|---|---|---|
| Invariable promoters | 9422 | 8540 | RNA processing (429; 1.4e-65)^a; Cellular macromolecule catabolic process (517; 4.0e-56); Cell cycle (536; 2.0e-51) |
| Invariable transcribed region | 983 | 668 | RNA processing (78; 3.3e-21)^a; Translation (54; 6.8e-17) |
| Invariable enhancers | 271 | 238 | Cell death (25; 1.8e-3)^a; Regulation of apoptosis (24; 1.4e-2) |
| Invariable repressed region | 216 | 58 | Neuron differentiation (14; 5.5e-8)^a |
| Variable region | 16 876 | 1319 | Cell adhesion (86; 3.1e-5)^a; Cell–cell signalling (73; 1.8e-4) |
GO analysis was performed with DAVID (33). Each enhancer was assigned to its closest gene, so multiple enhancers can be assigned to a single gene. The values in parentheses are the number of genes associated with each term and the Benjamini–Hochberg adjusted P-value.
^a The most significant biological process.
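DAVID reports Benjamini–Hochberg adjusted P-values directly. For readers unfamiliar with the adjustment, the Python sketch below shows the standard step-up procedure. It illustrates the method and is not DAVID's own code. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Each sorted P-value p_(k) is scaled to p_(k) * m / k, then a running
    minimum taken from the largest rank downwards keeps the adjusted values
    monotone; starting that minimum at 1.0 also caps them at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```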
Figure 3. Hierarchical clustering of cell types by epigenetic distance, calculated from the enhancer segmentation using the pvclust R package (41). Clusters with approximately unbiased (AU) P > 0.95 are indicated by rectangles. See Supplementary Figure S8 for other clusters.
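The clustering in Figure 3 can be approximated by computing a distance between cell types and then clustering them hierarchically. The Python sketch below uses correlation distance and average linkage, which we understand to be pvclust's defaults. The settings actually used for the figure are not stated here, so the encoding of the enhancer segmentation as 0/1 calls per bin is an assumption. The cell-type names `"A"`, `"B"`, `"C"` and both function names are hypothetical. pvclust additionally runs multiscale bootstrap resampling over bins to obtain AU P-values, which this sketch does not attempt.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(calls):
    """Agglomerative clustering of cell types with average linkage.

    calls maps a cell-type name to its per-bin enhancer calls (1 = enhancer).
    Returns the merge history as (left_members, right_members, height).
    """
    names = sorted(calls)
    dist = {(a, b): correlation_distance(calls[a], calls[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


calls = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 0],
}
for left, right, height in average_linkage(calls):
    print(left, right, round(height, 3))
```

Cell types that share most of their enhancer calls merge first at a small height, which matches the observation that similar cell types share common enhancers.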
Cell-type–specific enhancers and the functions of the closest genes
| Type | Number of enhancers | Number of assigned genes | GO terms (number of genes; adjusted P-value) |
|---|---|---|---|
| Common enhancers | 522 | 435 | Cell death (43; 2.2E-5)^a; Apoptosis (38; 1.2E-5) |
| H1 specific (human embryonic stem cell) | 21 353 | 8274 | Neuron differentiation (276; 1.7E-23)^a; Cell morphogenesis involved in differentiation (169; 7.0E-20) |
| GM12878 specific (lymphoblastoid) | 19 430 | 7928 | Regulation of lymphocyte activation (107; 9.5E-14)^a; Regulation of leucocyte activation (116; 1.2E-13); Regulation of T cell activation (85; 3.5E-11) |
| Hmec specific (human mammary epithelial cell) | 10 224 | 5159 | Cell motion (173; 3.9E-6)^a; Cell adhesion (236; 3.9E-6) |
| Hsmm specific (normal human skeletal muscle myoblasts) | 10 934 | 5684 | Skeletal system development (144; 8.0E-11)^a |
| Huvec specific (human umbilical vein endothelial cell) | 11 383 | 5492 | Enzyme-linked receptor protein signalling pathway (145; 2.8E-8)^a; Blood vessel development (100; 9.8E-5) |
| K562 specific (leukaemia) | 15 827 | 7287 | Positive regulation of leucocyte proliferation (41; 9.4E-5)^a; Positive regulation of lymphocyte proliferation (40; 9.9E-5) |
| Nhek specific (normal human epidermal keratinocytes) | 8356 | 4959 | Cell morphogenesis involved in differentiation (104; 1.1E-7)^a; Neuron projection morphogenesis (94; 5.5E-8) |
| Nhlf specific (normal human lung fibroblasts) | 16 691 | 6377 | Cell motion (219; 8.9E-11)^a; Lung development (53; 1.5E-4) |
Because multiple enhancers can be assigned to the same gene, the number of assigned genes is smaller than the number of enhancers. GO analysis was performed with DAVID (33). The values in parentheses are the number of genes in each term and the Benjamini–Hochberg adjusted P-value. We selected up to three biological processes from the significant categories.
^a The most significantly enriched biological processes.
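Assigning each enhancer to its closest gene, as in the GO analyses above, can be sketched as a nearest-TSS lookup. In the Python example below, each enhancer is reduced to a single coordinate and each gene to one TSS; these are simplifications, since real annotations may list several TSSs per gene. The function name `assign_closest_gene` and the gene names `G1` to `G3` are hypothetical. Ties go to the upstream gene, and enhancers on chromosomes without genes are left unassigned. The example also shows why the number of assigned genes is smaller than the number of enhancers.

```python
import bisect
from collections import defaultdict


def assign_closest_gene(enhancers, genes):
    """Map each (chrom, position) enhancer to the gene with the nearest TSS.

    genes: list of (gene_name, chrom, tss_position) tuples.
    """
    tss_by_chrom = defaultdict(list)
    for name, chrom, tss in genes:
        tss_by_chrom[chrom].append((tss, name))
    positions = {}
    for chrom, entries in tss_by_chrom.items():
        entries.sort()
        positions[chrom] = [tss for tss, _ in entries]

    assignment = {}
    for chrom, pos in enhancers:
        if chrom not in positions:
            continue  # no annotated gene on this chromosome
        tss_list = positions[chrom]
        i = bisect.bisect_left(tss_list, pos)
        # Nearest TSS is the neighbour just before or just after pos.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(tss_list)]
        best = min(candidates, key=lambda k: abs(tss_list[k] - pos))
        assignment[(chrom, pos)] = tss_by_chrom[chrom][best][1]
    return assignment


genes = [("G1", "chr1", 1000), ("G2", "chr1", 5000), ("G3", "chr2", 300)]
enhancers = [("chr1", 1200), ("chr1", 1500), ("chr1", 4000), ("chr2", 10), ("chr3", 50)]
assignment = assign_closest_gene(enhancers, genes)
print(len(assignment), "enhancers assigned to", len(set(assignment.values())), "genes")
```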