Sung Hee Park, Sun-Min Lee, Young-Joon Kim, Sangsoo Kim.
Abstract
BACKGROUND: Various chromatin modifications, identified in large-scale epigenomic analyses, are associated with distinct phenotypes of different cells and disease phases. To improve our understanding of these variations, many computational methods have been developed to discover novel sites and cell-specific chromatin modifications. Despite the availability of existing methods, there is still room for further improvement when they are applied to resolve the histone code hypothesis. Hence, we aim to investigate the development of a computational method to provide new insights into de novo combinatorial pattern discovery of chromatin modifications to characterize epigenetic variations in distinct phenotypes of different cells.
Keywords: Association rule mining; Chromatin signature; Combinatorial histone modifications; Differential modifications; Hepatitis B virus X (HBx)-transgenic mice; Hepatocellular carcinoma
Year: 2016 PMID: 28105934 PMCID: PMC5249029 DOI: 10.1186/s12859-016-1307-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 A workflow diagram of the ChARM method
Representative association rules
| No | Rule description for promoter | Supp [a] | Conf [b] | Lift | Annotation [c] |
|---|---|---|---|---|---|
| 1 | p.h3k27 = 5, p.h3k36 = 5 => p.h3k4 = 1 | 0.018 | 0.35 | 1.87 | P155 |
| 2 | p.h3k4 = 1, p.h3k36 = 5 => p.h3k27 = 5 | 0.018 | 0.34 | 1.74 | P155 |
| 3 | p.h3k4 = 1, p.h3k27 = 5 => p.h3k36 = 5 | 0.018 | 0.33 | 1.80 | P155 |
| 4 | p.h3k27 = 5, p.h3k36 = 5, p.pol2s5 = 1 => p.h3k4 = 1 | 0.005 | 0.41 | 2.17 | Super set & highest lift |
| 5 | p.h3k4 = 1, p.h3k27 = 5, p.h3k36 = 5 => p.met = 2 | 0.007 | 0.39 | 1.05 | Super set & Lowest lift |
| 6 | p.h3k27 = 5, p.h3k36 = 5, p.met = 2 => p.h3k4 = 1 | 0.007 | 0.36 | 1.89 | Super set & Top 5 lift |
| 7 | p.h3k4 = 1, p.h3k27 = 5, p.met = 2 => p.h3k36 = 5 | 0.007 | 0.33 | 1.8 | Super set & Top 5 lift |
| 8 | p.h3k4 = 1, p.h3k36 = 5, p.pol2s5 = 1 => p.h3k27 = 5 | 0.005 | 0.34 | 1.7 | Super set & Top 10 lift |
| 9 | p.h3k4 = 1, p.h3k27 = 5, p.pol2s5 = 1 => p.h3k36 = 5 | 0.005 | 0.33 | 1.77 | Super set & Top 10 lift |
| 10 | p.h3k4 = 4, p.h3 = 3, p.h3k27 = 2, p.h3k36 = 2, p.met = 2 => p.pol2s5 = 3 | 0.008 | 0.80 | 4.46 | Top 5 lift |
| 11 | p.h3k4 = 4, p.h3 = 3, p.h3k27 = 2, p.h3k36 = 2 => p.pol2s5 = 3 | 0.010 | 0.73 | 4.09 | Top 5 lift |
| 12 | p.met = 2 | 0.373 | 0.37 | 1 | Top 5 support |
| 13 | p.pol2s5 = 1 => p.met = 2 | 0.084 | 0.4 | 1.07 | Top 5 support |
| 14 | p.h3k27 = 3 => p.met = 2 | 0.083 | 0.36 | 0.98 | Top 5 support |
| | Rule description for gene body | | | | |
| 15 | g.h3k27 = 5, g.h3k36 = 5 => g.h3k4 = 1 | 0.048 | 0.56 | 2.58 | G155 |
| 16 | g.h3k4 = 1, g.h3k27 = 5 => g.h3k36 = 5 | 0.048 | 0.54 | 2.88 | G155 |
| 17 | g.h3k4 = 1, g.h3k36 = 5 => g.h3k27 = 5 | 0.048 | 0.53 | 2.86 | G155 |
| 18 | g.h3k4 = 1, g.h3k27 = 5, g.pol2s2 = 1, g.met = 5 => g.h3k36 = 5 | 0.006 | 0.66 | 3.54 | Super set & Top 5 lift |
| 19 | g.h3k4 = 1, g.h3 = 5, g.h3k36 = 5, g.met = 1 => g.h3k27 = 5 | 0.005 | 0.65 | 3.49 | Super set & Top 5 lift |
| 20 | g.h3k4 = 1, g.h3k36 = 5, g.met = 1 => g.h3k27 = 5 | 0.017 | 0.64 | 3.45 | Super set & Top 5 lift |
| 21 | g.h3k4 = 1, g.h3k36 = 5, g.pol2s2 = 1, g.met = 1 => g.h3k27 = 5 | 0.007 | 0.648 | 3.42 | Super set & Top 5 lift |
| 22 | g.h3k4 = 1, g.h3 = 1, g.h3k36 = 5, g.met = 1 => g.h3k27 = 5 | 0.006 | 0.63 | 3.4 | Super set & Top 5 lift |
| 23 | g.h3k4 = 1, g.h3k27 = 5, g.h3k36 = 5, g.met = 1 => g.h3 = 5 | 0.0053 | 0.328 | 1.55 | Super set & the lowest lift |
| 24 | g.h3 = 3, g.h3k27 = 3, g.pol2s2 = 4, g.met = 2 => g.h3k36 = 4 | 0.008 | 0.79 | 3.86 | Top 5 lift |
| 25 | g.h3 = 3, g.h3k27 = 3, g.h3k36 = 4, g.pol2s2 = 4, g.met = 2 => g.h3k4 = 4 | 0.005 | 0.64 | 3.68 | Top 5 lift |
| 26 | g.h3k36 = 5 => g.h3k4 = 1 | 0.089 | 0.487 | 2.19 | Top 5 support |
| 27 | g.h3k4 = 1 => g.h3k36 = 5 | 0.089 | 0.41 | 2.19 | Top 5 support |
| 28 | g.h3k27 = 5 => g.h3k4 = 1 | 0.088 | 0.47 | 2.18 | Top 5 support |
| 29 | g.h3k4 = 1 => g.h3k27 = 5 | 0.088 | 0.41 | 2.18 | Top 5 support |
| 30 | g.h3k27 = 5 => g.h3k36 = 5 | 0.085 | 0.45 | 2.43 | Top 5 support |
| 31 | g.h3k36 = 5 => g.h3k27 = 5 | 0.085 | 0.45 | 2.43 | Top 5 support |
ARM discovered 556 rules for promoters and 1853 rules for gene bodies. From these, we selected as representative examples the rules encoding Pattern 155 (Rules 1–3 and Rules 15–17) and those of its supersets whose lift values ranked within the top 5 or top 10 of all rules. The table also reports the rules with the top 5 support values.
[a] Supp: support of a rule
[b] Conf: confidence of a rule
[c] Annotation: annotation of the rules corresponding to their categories
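The three measures reported in the table can be illustrated with a minimal sketch that uses the standard association-rule definitions: support is the fraction of records matching both sides of a rule, confidence is that count divided by the number of records matching the left-hand side, and lift is confidence divided by the support of the right-hand side. The function names and the four toy records below are hypothetical and only borrow the mark labels from the table; they are not the data or the implementation used in the paper.

```python
from typing import Dict, List

Record = Dict[str, int]  # mark name -> discretised state (toy values)


def matches(record: Record, items: Dict[str, int]) -> bool:
    """True if the record carries every (mark, state) pair in `items`."""
    return all(record.get(mark) == state for mark, state in items.items())


def rule_metrics(records: List[Record], lhs: Dict[str, int], rhs: Dict[str, int]) -> Dict[str, float]:
    """Support, confidence and lift of the rule lhs => rhs (standard definitions)."""
    n = len(records)
    n_lhs = sum(matches(r, lhs) for r in records)
    n_rhs = sum(matches(r, rhs) for r in records)
    n_both = sum(matches(r, lhs) and matches(r, rhs) for r in records)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy records, NOT taken from the study.
toy = [
    {"p.h3k4": 1, "p.h3k27": 5, "p.h3k36": 5},
    {"p.h3k4": 1, "p.h3k27": 5, "p.h3k36": 3},
    {"p.h3k4": 3, "p.h3k27": 5, "p.h3k36": 5},
    {"p.h3k4": 3, "p.h3k27": 3, "p.h3k36": 3},
]

metrics = rule_metrics(toy, {"p.h3k27": 5, "p.h3k36": 5}, {"p.h3k4": 1})
print(metrics)
```

Under these definitions a rule with an empty left-hand side has confidence equal to its support and a lift of 1, which is consistent with the values listed for Rule 12.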
Fig. 2 A global view of the chromatin modification patterns encoded in association rules. (a) All epigenetic signatures of chromatin modifications and Pattern 155. Each row corresponds to an association rule (i.e. a pattern or a combination of chromatin modifications) and each column to a chromatin modification mark. Rules (556 for promoters and 1852 for gene bodies) are clustered by chromatin modification marks. The colour in each cell indicates the differential change of marks in the livers of normal and HBx-transgenic mice. Light green and light red represent the extreme chromatin modification changes, e.g. hypo- or hypermethylation of histones, respectively. The epigenetic signatures were tightly clustered into two groups, representing the modified and unmodified states of chromatin. Association rules in the yellow rectangles represent the epigenetic signatures of Pattern 155, which is the combination of the loss of H3K4Me3 and the gains of H3K27Me3 and H3K36Me3. (b) Plots for the support and lift of the association rules. The grey scale represents the confidence levels and the coloured rectangles correspond to supersets of Pattern 155, which contained the three modified states of Pattern 155 as well as other chromatin marks. The rule length, which corresponds to the number of modified states, is > 3. Red, yellow, and blue rectangles correspond to rule length 3, 4, and 5, respectively
Fig. 3 Correlation network of epigenetic modifications. The correlation networks were generated from correlations (r ≥ 0.2) between chromatin modification marks of transcripts in (a) P155, the promoter pattern for the HBx TG livers, (b) normal cells, and (c) HBx TG liver cells. Each node represents a chromatin modification mark and each edge width was weighted by the Pearson correlation value (r). Green edges represent negative correlations and grey edges represent positive correlations. Each node name is the abbreviated chromatin mark name: h3k4, h3k27, h3k36, pol2s5, met, h3, hx_exn, and pCpG_ratio denote H3K4Me3, H3K27Me3, H3K36Me3, Pol II S5, DNA methylation, H3, expression in HBx, and CpG ratio, respectively
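The construction of such a correlation network can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes the 0.2 cut-off is applied to the absolute value of r (the caption gives r ≥ 0.2 but also describes negative edges), and the function names and toy values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_edges(marks: Dict[str, List[float]], threshold: float = 0.2) -> List[Tuple[str, str, float, str]]:
    """Return (mark_a, mark_b, r, colour) for every pair with |r| >= threshold.

    Assumption: the cut-off applies to |r|. Colours follow the caption:
    green for negative and grey for positive correlations.
    """
    names = list(marks)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(marks[a], marks[b])
            if abs(r) >= threshold:
                edges.append((a, b, r, "green" if r < 0 else "grey"))
    return edges


# Hypothetical per-transcript values for three marks, NOT study data.
toy_marks = {
    "h3k4": [1, 2, 3, 4, 5],
    "h3k27": [5, 4, 3, 2, 1],
    "h3k36": [2, 1, 2, 1, 2],
}

edges = correlation_edges(toy_marks)
print(edges)
```

In a plot, the absolute value of r in each tuple would set the edge width.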
Enrichment of functional elements in the patterns
| Functional elements | Mouse genome (MG) BP [a] | Ratio (MG) [b] | Patterns (P:P0) BP [c] | Ratio (P) [d] | Odds (MG) [e] | Odds (P) [f] | Odds ratio (P/MG) [g] |
|---|---|---|---|---|---|---|---|
| Mouse genome | 2,725,765,481 | | 12,537,400 | | | | |
| Non-gene | 1,687,863,859 | 0.619 | 5,268,326 | 0.420 | 1.626 | 0.725 | 0.446 |
| Promoter | 60,956,000 | 0.022 | 636,400 | 0.051 | 0.023 | 0.053 | 2.338 |
| Genes | 976,945,622 | 0.358 | 6,936,145 | 0.553 | 0.559 | 1.238 | 2.217 |
| Introns | 917,470,255 | 0.337 | 6,319,444 | 0.504 | 0.507 | 1.016 | 2.003 |
| Exons | 63,877,330 | 0.023 | 1,841,546 | 0.147 | 0.024 | 0.172 | 7.175 |
| Coding exons | 34,016,873 | 0.012 | 1,424,442 | 0.114 | 0.013 | 0.128 | 10.143 |
| 5′-UTR | 6,222,075 | 0.002 | 211,139 | 0.017 | 0.002 | 0.017 | 7.487 |
| 3′-UTR | 24,574,772 | 0.009 | 389,841 | 0.031 | 0.009 | 0.032 | 3.527 |
All of the 200 base pair intervals (62,687 intervals identified by a genome-wide scan) that met the conditions of the P155 pattern for promoters were mapped to the functional elements of the mouse genome
[a, b] Base pairs of functional elements in the mouse genome and their ratio over the mouse genome
[c, d] Base pairs of functional elements overlapping with the 200 base pair intervals in the pattern and their ratio over the pattern
[e, f] Odds for each functional element in the mouse genome and Pattern 155, calculated by Eq. 1
[g] Odds ratio for each functional element between the pattern and the mouse genome, representing functional element enrichment in the pattern in comparison to the mouse genome
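The odds and odds-ratio columns can be recomputed from the base-pair counts with a short sketch. Eq. 1 is not reproduced in this section, so the code assumes it is the usual odds, p / (1 − p), where p is the fraction of base pairs covered by the functional element; under that assumption the results agree with the tabulated values to three decimals. The function names are hypothetical.

```python
# Totals from the "Mouse genome" row of the table.
MG_TOTAL_BP = 2_725_765_481  # base pairs in the mouse genome
P_TOTAL_BP = 12_537_400      # base pairs covered by the pattern (62,687 intervals x 200 bp)


def odds(element_bp: int, total_bp: int) -> float:
    """Odds p / (1 - p), with p the fraction of base pairs in the element.

    Assumption: this is the form of Eq. 1, which is not shown in this section.
    """
    p = element_bp / total_bp
    return p / (1 - p)


def odds_ratio(mg_bp: int, pattern_bp: int) -> float:
    """Enrichment of an element in the pattern relative to the genome."""
    return odds(pattern_bp, P_TOTAL_BP) / odds(mg_bp, MG_TOTAL_BP)


non_gene = odds_ratio(1_687_863_859, 5_268_326)   # table reports 0.446
coding_exons = odds_ratio(34_016_873, 1_424_442)  # table reports 10.143

print(round(non_gene, 3), round(coding_exons, 3))
```

An odds ratio above 1 indicates that the element is over-represented among the pattern intervals relative to the genome, and a value below 1 indicates depletion.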
Fig. 4 Epigenetic profiles of P155. (a) Differential changes of histone modifications between HBx TG and normal livers in Pattern 155. The plotted data are the dRES values summed over the member genes of Pattern 155 (50-bp interval). Promoter regions are divided into three regions relative to the TSS: proximal (P: −200 to 500 bp), intermediate (I: −1000 to −200 bp) and distal (D: −1500 to −1000 bp). Vertical grey lines in each figure mark the three promoter regions. (b–d) A comparison between HBx and normal livers for (b) H3K4Me3, (c) H3K27Me3, and (d) H3K36Me3. Panel (b) shows that H3K4Me3 was hypermethylated near the TSS regions in normal livers, whereas it underwent demethylation in HBx, displaying a strong negative peak in (a). (e, f) The changes in histone modification for (e) HCPs (242 transcripts) and (f) LCPs (43 transcripts) in Pattern 155. Blue bars represent regions matched with Pattern 155. dRES changes of H3K4Me3, H3K27Me3 and H3K36Me3 in (a), (e) and (f) are coloured green, red, and purple, respectively. dk4, dk27 and dk36 stand for H3K4Me3, H3K27Me3 and H3K36Me3, respectively
Fig. 5 CpG ratio bias in Pattern 155. CpG ratio distributions in promoters and gene bodies for P155 (red), G155 (green), and all transcripts (grey). (a) In promoters, a high peak for the CpG ratio in P155 was observed where the CpG ratio was > 0.6, whereas two peaks were found for all transcripts, one at a low CpG ratio and one at a high CpG ratio. (b) The CpG ratio distribution in gene bodies: 57 % of G155 shows high CpG content (CpG ratio > 0.5). All transcripts and P155 show high peaks at a low CpG ratio (< 0.4). (c, d) CpG ratio distributions alongside promoter regions for all transcripts: (c) HCPs (Additional file 3: Figure S4) and (d) LCPs of P155. (e) The proportion of 200 base pair intervals matched to Pattern 155 that corresponds to HCPs, ICPs, and LCPs alongside promoter regions. HCPs are more likely to match in intermediate or distal promoter regions, whereas LCPs are more likely to match in proximal promoter regions around the TSS
Fig. 6 Relationships between histone methylations, PolII, DNA methylation, and gene expression. (a) Relationships between PolIIS5 changes and each of three histone methylation marks in the promoters. (b) Relationships between DNA methylation and histone methylation in gene bodies. In (a) and (b), a green background represents the changes of all transcripts, while red (H3K4Me3), blue (H3K27Me3), and purple (H3K36Me3) rectangles represent the changes of (a) P155s and (b) G155s. Yellow rectangles in each plot correspond to the genes for which (a) PolIIS5 or (b) DNA methylation decreased (dRES < −0.5). (c) Gene expression in normal and HBx TG mouse livers, as measured by RNA-seq and exon array (Additional file 4: Figure S3). The majority of genes in P155 were not differentially expressed (grey dots). Among the differentially expressed genes in P155, substantially more genes were down-regulated (dark green dots) than up-regulated (red dots)
Enriched functional terms and canonical pathways, identified using DAVID and ingenuity pathway analysis (IPA)
| Category | Term or pathway | P value |
|---|---|---|
| Promoter | | |
| SP_PIR_KEYWORDS | Transcription regulation | 2.00E-08 |
| GOTERM_MF | Transcription regulator activity | 6.72E-06 |
| GOTERM_BP | Regulation of transcription from RNA polymerase II promoter | 8.83E-06 |
| GOTERM_MF | Transcription factor activity | 2.56E-05 |
| SP_PIR_KEYWORDS | Phosphoprotein | 5.31E-05 |
| SP_PIR_KEYWORDS | DNA-binding | 9.99E-05 |
| GOTERM_BP | Regulation of RNA metabolic process | 1.31E-04 |
| GOTERM_BP | Positive regulation of transcription | 1.43E-04 |
| SP_PIR_KEYWORDS | Developmental protein | 1.56E-04 |
| SP_PIR_KEYWORDS | Activator | 1.85E-04 |
| SP_PIR_KEYWORDS | Repressor | 2.61E-04 |
| Canonical pathway [a] | Role of NFAT in cardiac Hypertrophy | 4.36E-06 |
| | Wnt/β-catenin signalling | 2.42E-04 |
| | Molecular Mechanisms of Cancer | 3.91E-04 |
| | cAMP-mediated signalling | 5.60E-04 |
| | Dopamine-DARPP32 Feedback in cAMP signalling | 6.05E-04 |
| Gene body | | |
| GOTERM_MF | DNA binding | 1.82E-07 |
| INTERPRO | IPR001766:Transcription factor, fork head | 3.34E-06 |
| GOTERM_MF | Sequence-specific DNA binding | 9.11E-06 |
| SP_PIR_KEYWORDS | Developmental protein | 9.66E-06 |
| GOTERM_MF | Transcription regulator activity | 4.65E-05 |
| GOTERM_MF | Transcription factor activity | 9.25E-05 |
| SP_PIR_KEYWORDS | Transcription regulation | 1.62E-04 |
| Canonical pathway [a] | Notch signalling | 4.89E-05 |
Only annotations with P < 0.02 after Benjamini-Hochberg correction for multiple hypothesis testing are presented. Full lists and more details are provided in Additional file 5: Tables S1 and S2
[a] Canonical pathways were outputs from IPA analysis; other significant functional annotation terms were obtained from DAVID analysis
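The Benjamini-Hochberg correction mentioned in the table note can be sketched as below. This is a generic implementation of the standard step-up procedure, not the code used by DAVID or IPA, and the input p-values are hypothetical toy numbers; only the 0.02 cut-off comes from the note.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, NOT taken from the table.
raw = [0.001, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)

# Keep only the terms passing the cut-off stated in the table note.
kept = [i for i, p in enumerate(adjusted) if p < 0.02]
print(adjusted, kept)
```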