Jingting Xu, Hong Hu, Yang Dai.
Abstract
BACKGROUND: The identification of enhancers is a challenging task. Various types of epigenetic information, including histone modifications, have been used to build enhancer prediction models based on a diverse panel of machine learning schemes. However, DNA methylation profiles generated by whole-genome bisulfite sequencing (WGBS) have not been fully explored for their potential in enhancer prediction, even though low-methylated regions (LMRs) have been suggested to be distal active regulatory regions.
Year: 2016 PMID: 27662487 PMCID: PMC5035071 DOI: 10.1371/journal.pone.0163491
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The proposed framework of LMethyR-SVM.
Fig 2. The distributions of the LMRs and predicted enhancer windows.
(a) Distribution of the distances from the LMRs to their nearest TSSs; red for H1 and green for IMR90. (b) Distribution of the distances from all enhancer windows to their nearest TSSs; blue for H1 and brown for IMR90. Distance was measured from the center point of each sequence to its nearest TSS.
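The center-to-nearest-TSS distance used in Fig 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes half-open `[start, end)` coordinates on a single chromosome, a non-empty list of TSS positions sorted in ascending order, and unsigned distances. The function names are hypothetical.

```python
from bisect import bisect_left


def center(start, end):
    """Center point of a half-open interval [start, end)."""
    return (start + end) / 2


def distance_to_nearest_tss(start, end, tss_sorted):
    """Unsigned distance from the interval's center to the closest TSS.

    Assumes all coordinates are on one chromosome and that tss_sorted
    is non-empty and sorted ascending.
    """
    c = center(start, end)
    i = bisect_left(tss_sorted, c)  # first TSS at or after the center
    candidates = []
    if i < len(tss_sorted):
        candidates.append(abs(tss_sorted[i] - c))      # nearest downstream TSS
    if i > 0:
        candidates.append(abs(tss_sorted[i - 1] - c))  # nearest upstream TSS
    return min(candidates)
```

Binary search keeps each lookup logarithmic in the number of TSSs, so the whole set of LMRs or windows can be processed against a genome-wide TSS list without pairwise comparison.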
Summary of the enhancers predicted by LMethyR-SVM.
| Cell type | Number of enhancer windows | Number of enhancers | Estimated genomic coverage | Median enhancer length | Maximum enhancer length |
|---|---|---|---|---|---|
| H1 | 98,045 | 34,437 | 3.67% | 2.5 kb | 66 kb |
| IMR90 | 77,762 | 35,203 | 3.05% | 2.5 kb | 113 kb |
Enhancers are contiguous regions obtained by merging all overlapping enhancer windows; a merged region is split in two wherever it contains a non-enhancer window.
Genomic coverage was estimated from chromosome 1.
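A minimal sketch of the merging rule described in the table notes, under one reading of it: windows are visited in genomic order, overlapping or abutting enhancer windows are joined, and a non-enhancer window closes the current run. The `Window` type, the helper names and the half-open coordinates are assumptions for illustration. Coverage is taken here as the number of bases in the union of enhancer regions divided by a chromosome length supplied by the caller.

```python
from typing import NamedTuple


class Window(NamedTuple):
    start: int          # 0-based, inclusive
    end: int            # exclusive
    is_enhancer: bool   # classifier label for this window


def merge_enhancer_windows(windows):
    """Collapse runs of enhancer windows into enhancer regions.

    Windows are visited in genomic order. An enhancer window that overlaps
    or abuts the current run extends it; a non-enhancer window closes the
    run, so a region is split wherever a non-enhancer window occurs.
    """
    regions = []
    current = None  # [start, end] of the run being built
    for w in sorted(windows):
        if not w.is_enhancer:
            if current is not None:
                regions.append(tuple(current))
                current = None
            continue
        if current is not None and w.start <= current[1]:
            current[1] = max(current[1], w.end)
        else:
            if current is not None:
                regions.append(tuple(current))
            current = [w.start, w.end]
    if current is not None:
        regions.append(tuple(current))
    return regions


def covered_fraction(regions, chrom_length):
    """Fraction of a chromosome covered by the union of half-open regions."""
    covered, last_end = 0, 0
    for start, end in sorted(regions):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / chrom_length
```

With 1-kb windows at a 500-bp step labelled enhancer, enhancer, non-enhancer, enhancer, enhancer, this yields two abutting regions, [0, 1500) and [1500, 3000), rather than one.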
Fig 3. Results of comparison with other enhancer prediction models.
(a) H1 and (b) IMR90. “Validation” rates were computed as the percentages of predicted enhancers that overlap DHSs, p300 binding sites, or binding sites of enhancer-associated transcription factors (NANOG, CEBPB and TEAD4 for H1; CEBPB for IMR90); “misclassification” rates were computed as the percentages that overlap UCSC-annotated TSSs. Validated enhancers are further divided into mutually exclusive categories: “p300+/-DHS”, “DHS only”, “TF+DHS”, “TF only”, “TF+p300” and “p300+DHS+TF”. For LMethyR-SVM, the highest-scored enhancer windows were used. The total numbers of enhancers predicted in H1 by each method are 17,828 (ChromHMM strong), 217,350 (ChromHMM weak), 54,121 (RFECS), 37,263 (EnhancerFinder), 34,437 (LMethyR-SVM) and 34,437 (Random). The total numbers of enhancers predicted in IMR90 are 82,392 (RFECS), 35,203 (LMethyR-SVM) and 35,203 (Random).
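One way to assign the mutually exclusive validation categories of Fig 3 and compute both rates is sketched below. It assumes the per-enhancer overlap flags (p300, DHS, TF, TSS) have already been computed; how an overlap is defined (for example, at least 1 bp) is not specified here, and the function names are hypothetical.

```python
from collections import Counter


def validation_category(p300, dhs, tf):
    """Map overlap flags to one mutually exclusive category (None = not validated)."""
    if p300 and tf:
        return "p300+DHS+TF" if dhs else "TF+p300"
    if p300:
        return "p300+/-DHS"  # p300 with or without DHS, no TF
    if tf:
        return "TF+DHS" if dhs else "TF only"
    if dhs:
        return "DHS only"
    return None


def summarize(flags):
    """flags: non-empty list of (p300, dhs, tf, overlaps_tss) tuples, one per enhancer."""
    n = len(flags)
    cats = Counter(validation_category(p, d, t) for p, d, t, _ in flags)
    validated = n - cats.pop(None, 0)  # drop the "not validated" bucket
    misclassified = sum(1 for *_, tss in flags if tss)
    return {
        "validation_rate": 100 * validated / n,             # percent
        "misclassification_rate": 100 * misclassified / n,  # percent
        "categories": dict(cats),
    }
```

Each of the eight possible flag combinations maps to exactly one outcome, so the category counts always sum to the number of validated enhancers.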
Summary of the overlap between the predicted enhancers and the FANTOM5 enhancers.
| Cell type | Method | Unique (%FANTOM) | Total (%FANTOM) |
|---|---|---|---|
| H1 | ChromHMM | 120,539 (4.12%) | 146,172 (4.67%) |
| H1 | LMethyR-SVM | 45,629 (10.01%) | 75,947 (11.79%) |
| IMR90 | RFECS | 55,665 (12.35%) | 64,765 (13.78%) |
| IMR90 | LMethyR-SVM | 32,775 (8.63%) | 51,146 (14.24%) |
Unique: number of enhancers predicted only by the given method and validated by TPMs.
Total: total number of enhancers predicted by the given method and validated by TPMs.
%FANTOM (in parentheses): percentage of these enhancers that overlap FANTOM5 enhancers.
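A sketch of how the Unique, Total and %FANTOM values could be computed for one method. It assumes the enhancers passed in have already been validated, that "uniquely predicted" means overlapping no enhancer from the other method in the same cell type, and that all intervals are half-open on a single chromosome. All names are hypothetical.

```python
from bisect import bisect_left


def build_index(intervals):
    """Merge half-open intervals into disjoint, sorted blocks for fast lookup."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [s for s, _ in merged], [e for _, e in merged]


def overlaps_any(s, e, index):
    """True if [s, e) shares at least one base with any indexed block."""
    starts, ends = index
    i = bisect_left(starts, e)  # blocks starts[:i] begin before e
    # blocks are disjoint and sorted, so the last candidate has the largest end
    return i > 0 and ends[i - 1] > s


def fantom_summary(validated, other_method, fantom):
    """Return (count, %FANTOM) for the unique and total validated enhancers."""
    other_idx = build_index(other_method)
    fantom_idx = build_index(fantom)
    unique = [iv for iv in validated if not overlaps_any(*iv, other_idx)]

    def pct(group):
        hits = sum(overlaps_any(*iv, fantom_idx) for iv in group)
        return 100 * hits / len(group) if group else 0.0

    return {"unique": (len(unique), pct(unique)),
            "total": (len(validated), pct(validated))}
```

Merging the reference intervals first means each query needs only one binary search, even when the other method's predictions overlap one another.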
Proportions of overlap between the enhancer windows predicted by LMethyR-SVM in H1 and the ChromHMM-annotated enhancers in nine cell types.
| Cell type | # enhancers predicted by ChromHMM | # overlaps between LMethyR-SVM and ChromHMM enhancers | Proportion of overlap |
|---|---|---|---|
| H1 | 235,178 | 38,397 | 0.392 |
| GM12878 | 236,344 | 25,934 | 0.265 |
| HepG2 | 201,190 | 29,365 | 0.300 |
| HMEC | 283,718 | 36,805 | 0.376 |
| HSMM | 267,423 | 30,743 | 0.314 |
| HUVEC | 228,423 | 30,347 | 0.310 |
| K562 | 242,306 | 32,598 | 0.332 |
| NHEK | 272,728 | 28,908 | 0.295 |
| NHLF | 235,293 | 27,493 | 0.280 |
# overlaps: number of enhancer windows predicted by LMethyR-SVM in H1 that overlap enhancers predicted by ChromHMM in the given cell type.
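The listed proportions appear to use the 98,045 enhancer windows predicted by LMethyR-SVM in H1 (from the summary table of LMethyR-SVM predictions) as the denominator, rather than the ChromHMM counts. This is an inference from the numbers, not a statement from the source. The check below uses only values copied from the tables and reproduces every listed proportion to within 0.001 (for HMEC the quotient is 0.3754 against the listed 0.376).

```python
# Proportion of overlap = overlapping LMethyR-SVM windows / all LMethyR-SVM windows in H1.
# The denominator is inferred from the tables, not stated in the source.
H1_WINDOWS = 98_045

# (# overlaps, listed proportion) per cell type, copied from the table
TABLE = {
    "H1": (38_397, 0.392),
    "GM12878": (25_934, 0.265),
    "HepG2": (29_365, 0.300),
    "HMEC": (36_805, 0.376),
    "HSMM": (30_743, 0.314),
    "HUVEC": (30_347, 0.310),
    "K562": (32_598, 0.332),
    "NHEK": (28_908, 0.295),
    "NHLF": (27_493, 0.280),
}

for cell, (overlaps, listed) in TABLE.items():
    computed = overlaps / H1_WINDOWS
    print(f"{cell:8s} computed={computed:.4f} listed={listed:.3f}")
```

Because the denominator is the same for every row, the proportions are directly comparable across cell types even though the ChromHMM enhancer counts differ.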
Fig 4. Comparison of the conservation levels for the predicted enhancers.
Proportions of the enhancers predicted by each method that overlap conserved segments in the UCSC PhastCons46Ways conservation annotation at the vertebrate level. Each enhancer is represented by its midpoint (1 bp); (a) H1 and (b) IMR90.
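The midpoint-based conservation overlap in Fig 4 can be sketched as below. It assumes half-open intervals on one chromosome and conserved segments that do not overlap one another; the function names are hypothetical, and loading the UCSC annotation is out of scope.

```python
from bisect import bisect_right


def midpoint(start, end):
    """1-bp midpoint base of a half-open interval [start, end)."""
    return (start + end) // 2


def fraction_conserved(enhancers, conserved):
    """Fraction of enhancers whose midpoint base lies in a conserved segment.

    Assumes conserved segments are half-open and mutually non-overlapping.
    """
    if not enhancers:
        return 0.0
    conserved = sorted(conserved)
    starts = [s for s, _ in conserved]
    hits = 0
    for s, e in enhancers:
        m = midpoint(s, e)
        i = bisect_right(starts, m) - 1  # last segment starting at or before m
        if i >= 0 and m < conserved[i][1]:
            hits += 1
    return hits / len(enhancers)
```

Reducing each enhancer to a single base keeps long and short predictions on an equal footing: a long enhancer cannot score as conserved merely because it spans more of the genome.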