Anna V Lioznova, Abdullah M Khamis, Artem V Artemov, Elizaveta Besedina, Vasily Ramensky, Vladimir B Bajic, Ivan V Kulakovskiy, Yulia A Medvedeva.
Abstract
BACKGROUND: DNA methylation is involved in the regulation of gene expression. Although bisulfite sequencing-based methods profile DNA methylation at single-CpG resolution, methylation levels are usually averaged over genomic regions in downstream bioinformatic analysis.
Keywords: CAGE; Chromatin states; CpG traffic lights; DNA methylation; ETS; Enhancers; IRF; NRF1; STAT; Transcription regulation
Year: 2019 PMID: 30709331 PMCID: PMC6359853 DOI: 10.1186/s12864-018-5387-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Schematic representation of CpG traffic light (CpG TL) detection. Left panel. Suppose we analyze a particular genomic region (chr1:123..11654) which, for simplicity, contains one gene. For each CpG in this region and for the gene, we have methylation and expression vectors, respectively, across 6 cell lines. CpG dinucleotides are represented by dark blue lollipops (filled: methylated CpG, empty: unmethylated CpG). The first three CpGs are located within the promoter region, while the last three are located in the gene body. Gene expression, or the lack of it, is represented by green arrows. Right panel. The yellow column shows the methylation of a random CpG (used as a background); the methylation vector of this CpG shows low correlation with the gene expression (the green box on the right, in RPKM). The correlation between the average promoter or gene body methylation (shown in the light blue and light purple columns, respectively) and the corresponding gene expression is also low. However, for the CpG TL (shown in the red box), methylation correlates significantly with the gene expression.
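The comparison in Fig. 1 can be illustrated with a minimal sketch, assuming that the correlation is the Spearman correlation coefficient (the captions refer to SCC). All methylation and expression values below are hypothetical and chosen only to mimic the schematic: one CpG tracks expression across 6 cell lines, while the average over the region does not.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

# Hypothetical data for one gene and three CpGs in 6 cell lines.
expression = [12.0, 0.5, 9.0, 0.2, 15.0, 1.0]   # RPKM
cpg_tl = [0.10, 0.90, 0.20, 0.95, 0.05, 0.80]   # candidate CpG TL
cpg_a = [0.9, 0.1, 0.3, 0.4, 0.6, 0.7]          # other CpG in the region
cpg_b = [0.5, 0.2, 0.9, 0.1, 0.4, 0.6]          # other CpG in the region

# Per-cell-line average methylation over the whole region.
region_mean = [mean(col) for col in zip(cpg_tl, cpg_a, cpg_b)]

rho_tl = spearman(cpg_tl, expression)
rho_region = spearman(region_mean, expression)
print(f"single CpG: {rho_tl:.2f}, region average: {rho_region:.2f}")
```

In this toy data the single CpG is perfectly anti-correlated with expression, while averaging it with its neighbours dilutes the signal, which is the effect the figure describes.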
The number of genes with significant correlation between expression and methylation
| FDR-corrected p-value threshold | Average methylation of promoter regions (-1000..500) (1) | Average methylation of gene bodies (+500..TTS) (2) | Methylation of CpG TL (3) | Permutation test (4) |
|---|---|---|---|---|
| 0.001 | 263 | 186 | 1463 | 14.5 |
| 0.005 | 537 | 505 | 4905 | 15.4 |
| 0.01 | 764 | 762 | 7997 | 16.2 |
| 0.05 | 2038 | 2125 | 22957 | 21.8 |
| 0.1 | 3251 | 3401 | 34095 | 27.5 |
Note: columns (1)-(4) give the total number of genes that have a significant correlation between gene expression and methylation. For multiple testing correction, the number of genes was used in (1) and (2), while the number of all CpG-gene pairs was used in (3) and (4). (4) Permutation test (RPKM) results: the number of genes with a significant correlation between expression and methylation obtained by chance (averaged over 10 random permutations). TTS: transcription termination site.
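To make the role of the number of tests concrete, here is a minimal sketch of an FDR correction, assuming the Benjamini-Hochberg procedure (the table note does not name the procedure used for this table, so this is an assumption). The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for position in range(m, 0, -1):
        i = order[position - 1]
        running_min = min(running_min, pvalues[i] * m / position)
        adjusted[i] = running_min
    return adjusted

def count_significant(pvalues, alpha):
    """Number of tests whose adjusted p-value is at or below alpha."""
    return sum(q <= alpha for q in benjamini_hochberg(pvalues))

# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)

# The same raw p-values embedded in a larger family of tests
# (here padded with 96 non-significant tests) are penalized more.
padded = raw + [1.0] * 96
print(count_significant(raw, 0.05), count_significant(padded, 0.05))
```

Because the adjustment scales with the number of tests, correcting over all CpG-gene pairs, as in columns (3) and (4), is a stricter setting than correcting over genes, as in columns (1) and (2).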
Fig. 2 Evolutionary conservation of the CpG TL compared to the background CpG sites (BG). a Conservation in mammals and b in primates, c repeats determined by RepeatMasker, d Eigen non-coding functionality score. Whiskers (a-c) represent the standard deviation across the 50 random background samples. Fisher’s exact test, p-value <5E-4 (a-c); two-sample Kolmogorov-Smirnov test, p-value <5E-4 (d).
Fig. 3 CpG TL in regulatory regions. Over-representation of CpG TL in a open chromatin regions (DNaseI), b transcription start sites determined by CAGE, c enhancers determined by histone modifications, d enhancers determined by FANTOM5. There is no difference between CpG TL and CpG BG counts in e CpG islands, while f CpG TL are over-represented in CpG island shores. Panel g shows the TL / BG ratio in chromatin states determined by chromHMM, averaged across 127 cell types. The color in g reflects the absolute number of CpG TL located in a given chromatin state. Whiskers (a-f) represent the standard deviation across the 50 random background samples. Fisher’s exact test, p-value <5E-4.
Fig. 4 Functional categories of human enhancers enriched with CpG TL (negative SCC). Fisher’s exact test (scipy.stats.fisher_exact in Python) and Benjamini-Hochberg FDR correction for multiple testing (p.adjust with method='fdr' in R) were used to calculate the p-values.
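The over-representation tests named in the captions can be sketched as a one-sided Fisher's exact test on a 2x2 table of CpG TL and CpG BG counts inside and outside a feature. The version below uses only the standard library and is an illustration, not the authors' pipeline; the function name and all counts are hypothetical. With SciPy installed, `scipy.stats.fisher_exact(table, alternative='greater')` computes the same one-sided p-value.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` in the top-left cell.
    """
    row1 = a + b          # e.g. all CpG TL
    col1 = a + c          # e.g. all CpGs inside the feature
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Hypothetical counts: CpG TL vs background CpGs, inside vs outside enhancers.
tl_in, tl_out = 30, 70
bg_in, bg_out = 10, 90

p_value = fisher_exact_greater(tl_in, tl_out, bg_in, bg_out)
ratio = (tl_in / (tl_in + tl_out)) / (bg_in / (bg_in + bg_out))
print(f"TL / BG ratio: {ratio:.1f}, one-sided p-value: {p_value:.2e}")
```

When many categories are tested, as in Fig. 4, the resulting p-values would then be corrected for multiple testing.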
Enrichment of CpG TL in regulatory genes
| Gene type | # genes in the annotation | # genes with CpG TL | # genes expected | Fold enrichment | Over-representation | p-value |
|---|---|---|---|---|---|---|
| Epigenetic regulators | 719 | 279 | 98.56 | 2.83 | + | 1.4E-63 |
| Histones | 94 | 17 | 12.89 | 1.32 | + | 0.23 |
| Transcription factors | 1751 | 599 | 240.02 | 2.50 | + | 1.06E-108 |
| Transcription co-factors | 951 | 356 | 130.36 | 2.73 | + | 4.69E-76 |
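As a quick check of the table arithmetic, fold enrichment is the observed number of genes with CpG TL divided by the expected number. The sketch below recomputes it from the table values; the implied background fraction (expected divided by annotation size) is derived here for illustration and is not stated in the source.

```python
# (genes in annotation, genes with CpG TL, genes expected), copied from the table.
table = {
    "Epigenetic regulators": (719, 279, 98.56),
    "Histones": (94, 17, 12.89),
    "Transcription factors": (1751, 599, 240.02),
    "Transcription co-factors": (951, 356, 130.36),
}

fold = {}
background_fraction = {}
for gene_type, (n_annotated, observed, expected) in table.items():
    # Fold enrichment: observed over expected count.
    fold[gene_type] = round(observed / expected, 2)
    # Share of annotated genes expected to carry a CpG TL by chance.
    background_fraction[gene_type] = expected / n_annotated

for gene_type in table:
    print(f"{gene_type}: fold {fold[gene_type]:.2f}, "
          f"background fraction {background_fraction[gene_type]:.4f}")
```

The recomputed values match the fold enrichment column, and the same background fraction (about 0.137) underlies every row.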
Fig. 5 CpG TL within transcription factor binding sites. CpG TL to CpG BG ratio vs Fisher’s exact test p-value within a all predicted TFBS; b TFBS and 50 nt shores for CpG TL with a negative SCC and c the same for CpG TL with a positive SCC. Length-normalized distribution of CpG TL / CpG BG counts (negative SCC) within TFBS and 100 nt shores: d NRF1; f SPIB; h STAT1; j GABPA; m IRF4. Per-position distribution of CpG TL / CpG BG counts (negative SCC) within TFBS with logo: e NRF1; g SPIB; i STAT1; k GABPA; n IRF4. In vitro binding preferences of unmethylated and methylated oligos: l GABPA; o IRF4.