Kyoung-Jae Won, Iouri Chepelev, Bing Ren, Wei Wang.
Abstract
BACKGROUND: Recent genome-scale surveys of epigenetic states in mammalian genomes have shown that promoters and enhancers are correlated with distinct chromatin signatures, providing a pragmatic way to map these regulatory elements systematically across the genome. With the rapid accumulation of chromatin modification profiles for various organisms and cell types, this chromatin-based approach promises to uncover many new regulatory elements, but computational methods that effectively extract information from these datasets are still limited.
Year: 2008 PMID: 19094206 PMCID: PMC2657164 DOI: 10.1186/1471-2105-9-547
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Histone modification patterns of promoters and enhancers in untreated HeLa cells. This figure is regenerated from Heintzman et al. [15]. Signals of all six histone marks are drawn centered on TSSs and p300 binding peaks. The average signals of the histone marks at TSSs and enhancers are drawn in black.
Figure 2. Examples of histone modification patterns in promoters and enhancers. (A) Promoter prediction using chromatin signature. The TSS near chr1:148185131 shows a typical promoter histone modification pattern, whereas H3K4me3 has a relatively weak signal at the TSS near chr1:148158254. The predictions made by the profile-based method of Heintzman et al. are labeled in green and the predictions made by the HMM developed in this study are in red. (B) Enhancer prediction using chromatin signature. A p300 binding site is shown at chr6:132486009 and overlaps with a DHS site, which is strong evidence for an enhancer. (C) Enhancer prediction using chromatin signature. A DHS site near chr8:119170000 overlaps with a weak H3K4me3 signal but is not identified as an enhancer by the profile-based method.
The results of 100 HMM-SA runs.
| Combination | Number of times used | Prediction rate (promoter/enhancer) |
| H4ac, H3ac, H3K4me1, H3K4me2, H3K4me3, H3 | 43 | 98.8%/94.5% |
| H3K4me1, H3K4me2, H3K4me3, H3 | 8 | 99.1%/93.2% |
| H4ac, H3ac, H3K4me1, H3K4me2, H3 | 6 | 99.1%/94.1% |
| H3K4me1, H3K4me2 | 6 | 99.7%/92.8% |
| H4ac, H3K4me1, H3K4me2, H3K4me3 | 5 | 100%/93.5% |
| H3K4me1, H3K4me2, H3K4me3, H3 | 5 | 100%/93.0% |
| H3ac, H3K4me1, H3K4me2, H3K4me3, H3 | 5 | 99.2%/94.6% |
| H3K4me1, H3K4me2, H3K4me3 | 5 | 99.6%/94.6% |
| H4ac, H3K4me1, H3K4me2, H3 | 4 | 100.0%/93.2% |
| H4ac, H3ac, H3K4me1, H3K4me2, H3K4me3 | 3 | 98.1%/94.6% |
| H4ac, H3K4me1, H3K4me2 | 3 | 100%/94.6% |
| H4ac, H3ac, H3K4me1, H3K4me3, H3 | 2 | 97.2%/93.2% |
| H3K4me1, H3K4me2, H3 | 2 | 100.0%/93.2% |
| H4ac, H3K4me3 | 1 | 96.2%/91.9% |
| H3ac, H3K4me1, H3K4me2 | 1 | 98.1%/94.6% |
| H3ac, H3K4me1, H3K4me2, H3 | 1 | 100.0%/94.6% |
| Window Size | Number of times used | Prediction rate (promoter/enhancer) |
| 1 K | 8 | 99.3%/93.6% |
| 2 K | 75 | 99.0%/94.3% |
| 4 K | 7 | 99.7%/93.4% |
| 8 K | 7 | 99.5%/93.1% |
| 10 K | 2 | 100%/91.9% |
| 12 K | 1 | 98.1%/89.2% |
Occurrence of each histone modification in the most informative combinations found by the 100 HMM-SA runs.
| H4ac | H3ac | H3K4me1 | H3K4me2 | H3K4me3 | H3 |
| 75 | 61 | 99 | 97 | 77 | 76 |
Comparison of cross-validation results for predicting promoters and enhancers.
| Combination | Promoter PPV^a (standard deviation) | Enhancer PPV (standard deviation) |
| HMM method using 6 histone signatures^b | 97.87% (1.06) | 93.52% (1.83) |
| HMM method using 2 histone signatures^c | 95.46% (2.82) | 94.06% (0.89) |
| Heintzman | 96% | 78% |
| Heintzman | 95% | 85% |
a Positive predictive value (PPV) = true predictions/(true predictions + wrong predictions); standard deviations are given in parentheses.
b 6 histone signatures: H4ac, H3ac, H3K4me1, H3K4me2, H3K4me3, H3
c 2 histone signatures: H3K4me1, H3K4me3
Figure 3. True positives (TPs) versus the total number of promoter predictions (a) in the untreated and (b) in the treated HeLa cells. The number of TPs was calculated at different cutoff values of the log-odds (see Methods). Ideal predictors are shown with a black line.
Comparison of PPV = TP/(TP+FP) in promoter predictions using the annotated TSS sites.
| Untreated | ||||
| | Total Prediction (TP+FP) | TP | PPV | p-value |
| Heintzman | 198 | 181 | 91.41% | < 1.0 × 10^-16 |
| HMM ( | 198 | 189 | 95.45% | < 1.0 × 10^-16 |
| HMM ( | 256 | 234 | 91.41% | < 1.0 × 10^-16 |
| HMM ( | 337 | 264 | 78.34% | < 1.0 × 10^-16 |
| Treated | ||||
| | Total Prediction (TP+FP) | TP | PPV | p-value |
| Heintzman | 207 | 183 | 88.41% | < 1.0 × 10^-16 |
| HMM ( | 207 | 196 | 94.69% | < 1.0 × 10^-16 |
| HMM ( | 279 | 247 | 88.50% | 2.8 × 10^-3 |
| HMM ( | 362 | 278 | 76.80% | < 1.0 × 10^-16 |
We calculated p-values by generating random predictions on the ENCODE regions.
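One way such a random-prediction p-value could be estimated is sketched below. This is only an illustration under stated assumptions, not the procedure used in the study: positions are drawn uniformly from a single region, a prediction counts as supported when it lies within a 2.5 kb window of an annotated TSS, and the function names `count_supported` and `empirical_p_value` are hypothetical.

```python
import bisect
import random


def count_supported(predictions, tss_sorted, window):
    """Count predictions that lie within `window` bp of at least one TSS."""
    supported = 0
    for pos in predictions:
        # Index of the first TSS at or to the right of (pos - window).
        i = bisect.bisect_left(tss_sorted, pos - window)
        if i < len(tss_sorted) and tss_sorted[i] <= pos + window:
            supported += 1
    return supported


def empirical_p_value(observed_tp, n_predictions, tss_positions,
                      region_length, window=2500, n_sim=1000, seed=0):
    """Add-one-corrected share of random prediction sets that reach
    at least `observed_tp` supported predictions (assumed null model:
    uniform random positions in one region of `region_length` bp)."""
    rng = random.Random(seed)
    tss_sorted = sorted(tss_positions)
    at_least = 0
    for _ in range(n_sim):
        random_preds = [rng.randrange(region_length)
                        for _ in range(n_predictions)]
        if count_supported(random_preds, tss_sorted, window) >= observed_tp:
            at_least += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_sim + 1)
```

An estimate of this kind cannot fall below 1/(n_sim + 1), so resolving very small p-values would require either many more simulations or an analytical null distribution.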
Figure 4. Promoter predictions supported by CAGE tags. We compared the number of predicted promoters supported by CAGE tags while varying the minimum number of CAGE tags required within 2.5 kb of the predicted TSSs. The results are compared for the case where the HMM method made the same number of predictions as the profile-based method (untreated cells: 198 predicted sites, treated cells: 208 predicted sites).
Comparison of active promoter predictions.
| Untreated cells^a | ||||
| | Active promoters: Total Prediction | Active promoters: Expression Supported Prediction | Active promoters: PPV | Inactive promoters |
| Heintzman | 197 | 127 | 64.47% | 31 |
| | 229 | 127 | 55.46% | 32 |
| HMM ( | 197 | 128 | 64.97% | 25 |
| HMM ( | 229 | 135 | 58.95% | 31 |
| HMM ( | 309 | 143 | 46.28% | 40 |
| Treated cells^a | ||||
| | Active promoters: Total Prediction | Active promoters: Expression Supported Prediction | Active promoters: PPV | Inactive promoters |
| Heintzman | 204 | 128 | 62.75% | 23 |
| | 213 | 128 | 60.09% | 23 |
| HMM ( | 204 | 128 | 62.75% | 19 |
| HMM ( | 247 | 139 | 56.27% | 22 |
| HMM ( | 328 | 145 | 44.21% | 30 |
a The total numbers of predictions in Table 5 differ slightly from those in Table 4 because, when multiple predicted sites were supported by the same TSS or by any enhancer evidence, we merged these predictions (see Methods).
b The number of correctly predicted active promoters did not change when a lower cut-off was used in the profile-based method.
Figure 5. ROC curves of the HMM and profile-based methods in the untreated and treated cells.
Comparison of enhancer predictions in the untreated HeLa cells.
| | Heintzman | HMM method (389 total predictions) |
| distal p300 (n = 94) | 77 (sensitivity = 81.91%) | 82 (sensitivity = 87.23%) |
| distal DHS (n = 587) | 165 (sensitivity = 28.11%) | 179 (sensitivity = 30.49%) |
| distal TRAP220 (n = 77) | 43 (sensitivity = 55.84%) | 47 (sensitivity = 61.04%) |
| Any of distal (DHS, p300, TRAP220) | 206 (PPV = 52.96%) | 213 (PPV = 54.76%) |
Sensitivity = TP/(TP+FN) is reported for the individual evidence types and PPV = TP/(TP+FP) for the combined row.
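As a worked check of these two formulas, the short sketch below recomputes a few of the percentages from the counts in the table above. The function names are illustrative, and the PPV denominator of 389 is an assumption: it is the prediction total given in the table header, and it reproduces the reported PPV for both methods.

```python
def sensitivity(tp, n_known_sites):
    """TP / (TP + FN): share of known sites recovered by the predictions."""
    return tp / n_known_sites


def ppv(tp, n_predictions):
    """TP / (TP + FP): share of predictions supported by evidence."""
    return tp / n_predictions


# Counts taken from the enhancer comparison table (untreated HeLa cells).
print(f"{sensitivity(77, 94):.2%}")   # profile-based, distal p300 -> 81.91%
print(f"{sensitivity(82, 94):.2%}")   # HMM, distal p300 -> 87.23%
print(f"{ppv(206, 389):.2%}")         # profile-based, any distal evidence -> 52.96%
print(f"{ppv(213, 389):.2%}")         # HMM, any distal evidence -> 54.76%
```

The two measures use different denominators: sensitivity is normalised by the number of known sites of one evidence type, whereas PPV is normalised by the number of predictions made.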
Figure 6. Comparison of (a) promoter and (b) enhancer prediction. The prediction results using 10 histone marks are compared with those using 6 histone marks.
Predicted active and inactive promoters in the mouse genome.
| Active Promoter^a | |||||
| Cell lines | Active Gene | Total Prediction | Refseq Supported PPV | Predicted promoters not present in the expression measurement | Expression Supported PPV |
| ES | 7887 | 13853 | 81.4% | 7191 | 88.6% |
| MEF | 8092 | 11913 | 88.1% | 5480 | 92.3% |
| NPC | 7413 | 12700 | 84.1% | 6259 | 89.0% |
| Inactive Promoter^b | |||||
| Cell lines | Inactive Gene | Total Prediction | Refseq Supported PPV | Predicted promoters not present in the expression measurement | Expression Supported PPV |
| ES | 4753 | 2862 | 77.0% | 1806 | 79.2% |
| MEF | 4248 | 4301 | 66.1% | 3061 | 74.6% |
| NPC | 5259 | 422 | 73.2% | 267 | 94.8% |
a Active promoters supported by gene expression and Refseq: TP is the number of active genes predicted as active and FP is the number of inactive genes predicted as active. b Inactive promoters supported by gene expression: TP is the number of inactive genes predicted as inactive and FP is the number of active genes predicted as inactive. In both cases, we only considered predictions located within 2.5 kb of annotated genes, so the total number of predictions is usually larger than the sum of TPs and FPs. Refseq-supported PPV is the percentage of the total active/inactive promoter predictions that are supported by Refseq.
Figure 7. The HMM classifier. (a) A left-right HMM with Q states. Each state has a transition to itself and outgoing transitions toward higher-numbered states. In a left-right model, a state is never revisited once it has been left. (b) Three HMMs are trained separately for promoters, enhancers and background. Log-odds are calculated to classify a genomic region (see Methods).
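The left-right topology and the log-odds comparison described in this caption can be illustrated with a small toy sketch. It rests on several assumptions that are not taken from the paper: signals are discretised into three symbols (0 = no signal, 1 = H3K4me3-dominant, 2 = H3K4me1-dominant), every chain starts in state 0, all parameters are hand-set rather than trained, and the log-odds is taken as the log-likelihood under a model minus the log-likelihood under the background model. All function and variable names are hypothetical.

```python
import math


def left_right_transitions(q, stay=0.6):
    """Upper-triangular transition matrix: each state keeps `stay` on itself
    and spreads the rest evenly over the higher-indexed states."""
    rows = []
    for i in range(q):
        row = [0.0] * q
        n_higher = q - 1 - i
        if n_higher == 0:
            row[i] = 1.0  # the last state can only loop on itself
        else:
            row[i] = stay
            for j in range(i + 1, q):
                row[j] = (1.0 - stay) / n_higher
        rows.append(row)
    return rows


def log_likelihood(obs, trans, emit):
    """Scaled forward algorithm; the chain is assumed to start in state 0."""
    q = len(trans)
    alpha = [0.0] * q
    alpha[0] = emit[0][obs[0]]
    loglik = 0.0
    for t, symbol in enumerate(obs):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(q))
                     * emit[j][symbol] for j in range(q)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def classify(obs, models, background):
    """Return the best-scoring label and the log-odds against background."""
    bg = log_likelihood(obs, *background)
    scores = {name: log_likelihood(obs, *m) - bg for name, m in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else "background"), scores


# Toy, hand-set parameters (assumptions for illustration only).
flank = [0.8, 0.1, 0.1]
MODELS = {
    "promoter": (left_right_transitions(3), [flank, [0.1, 0.8, 0.1], flank]),
    "enhancer": (left_right_transitions(3), [flank, [0.1, 0.1, 0.8], flank]),
}
BACKGROUND = (left_right_transitions(1), [[0.9, 0.05, 0.05]])

label, scores = classify([0, 0, 1, 1, 1, 0, 0], MODELS, BACKGROUND)
print(label, scores)
```

With these toy parameters, a window with an H3K4me3-like peak in the middle scores highest under the promoter model, while a window with no signal falls back to the background label because neither log-odds is positive.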