Abstract
MOTIVATION: Gene regulation commonly involves interactions among DNA, proteins and biochemical conditions. With chromatin immunoprecipitation (ChIP) technologies, protein-DNA interactions are routinely detected at the genome scale. Developing computational methods that detect weak protein-binding signals while maintaining high specificity remains challenging. An attractive approach is to incorporate biologically relevant data, such as protein co-occupancy, to improve the power of protein-binding detection. We refer to such additional data related to target protein binding as supporting tracks.
Year: 2010 PMID: 20823314 PMCID: PMC2935431 DOI: 10.1093/bioinformatics/btq379
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flow chart of the proposed method.
Fig. 2. Comparison of FDRs in 10 simulated datasets. Our method, with and without supporting tracks (prior), is controlled at the 10% FDR level. An FDR level cannot be specified for MPeak.
Fig. 3. (A) FDR and (B) detection power of our method before (black) and after (colored) using supporting tracks at different levels of concordance. Box plots of 10 datasets at each concordance level are shown.
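The two quantities compared in Figs 2 and 3 can be computed directly in a simulation, where the true binding sites are known. The following is a minimal sketch using the conventional definitions of empirical FDR and detection power. It assumes, for simplicity, that peaks are represented as site identifiers rather than genomic intervals matched by overlap; the function name `empirical_fdr_and_power` is hypothetical and not taken from the paper.

```python
def empirical_fdr_and_power(detected, truth):
    """Return (FDR, power) for detected sites against known true sites.

    Assumption: sites are hashable identifiers; real peak callers would
    match genomic intervals by overlap instead.
    """
    detected, truth = set(detected), set(truth)
    true_pos = len(detected & truth)
    false_pos = len(detected - truth)
    # FDR: fraction of detections that are false (0 when nothing is detected).
    fdr = false_pos / len(detected) if detected else 0.0
    # Power: fraction of true sites that were detected.
    power = true_pos / len(truth) if truth else 0.0
    return fdr, power


# Toy example: 5 detections, 4 of them among 8 true sites.
fdr, power = empirical_fdr_and_power(
    detected=[1, 2, 3, 4, 10],
    truth=[1, 2, 3, 4, 5, 6, 7, 8],
)
```

In this toy example one of the five detections is false and four of the eight true sites are recovered.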
Fig. 4. Comparison of peak-calling performance using six different discretization methods on supporting tracks: Equal-Width, Equal-Freq, Clustering (k = 2–8), Entropy, Round and SmoothRound. The traditional peak-calling method without using supporting tracks (no prior) is shown as a comparison. (A) The total number of detected peaks. (B) The total number of detected true peaks with medium intensity (1.5–2.0). (C) The total number of detected peaks overlapping with previously identified ChIP-seq peaks.
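To illustrate what discretizing a supporting track means, here is a minimal sketch of the two simplest schemes named in the caption, equal-width and equal-frequency binning, in their textbook form. The paper's exact implementation is not given in this record, so the function names, the tie handling and the toy signal are assumptions; the Clustering, Entropy, Round and SmoothRound methods are not covered.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would fall just past the last bin, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_freq_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal counts.

    Assumption: ties are split by input order, since sorted() is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical track signal with a few large values.
signal = [0.1, 0.2, 0.3, 0.4, 5.0, 9.0]
width_bins = equal_width_bins(signal, 3)  # most values crowd into bin 0
freq_bins = equal_freq_bins(signal, 3)    # two values per bin
```

With skewed signals such as this one, equal-width binning places most positions in the lowest bin, while equal-frequency binning spreads them evenly. This is one reason the choice of discretization can affect peak calling.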
Predicted GATA1-binding sites enrichment in true binding regions, RefSeq Gene and predicted peaks by Cheng et al. (2009)
| Dataset | Number of peaks | Number of high VPs | Number of medium VPs | Number of low VPs | Total VPs | Number of FP | ChIP-chip peaks | ChIP-Seq peaks | Union of peaks | RefSeq genes |
|---|---|---|---|---|---|---|---|---|---|---|
| | | 53 | 24 | 22 | 99 | 83 | 311 | 780 | 890 | 1176 |
| ChIP-chip peaks | ||||||||||
| PASS—no prior | 317 | 51 | 14 | 0 | 65 | 4 | 251 | 221 | 292 | 198/149 |
| PASS2—additional | 66 | 0 | 4 | 0 | 4 | 1 | 19 | 33 | 44 | 43/38 |
| MPeaks | 147 | 45 | 8 | 0 | 53 | 2 | 142 | 112 | 145 | 91/74 |
| TMAL (L1) | 139 | 40 | 9 | 1 | 50 | 1 | 134 | 97 | 137 | 84/41 |
| ChIP-seq peaks | ||||||||||
| PASS—no prior | 554 | 45 | 16 | 13 | 74 | 4 | 177 | 463 | 467 | 325/186 |
| PASS2—additional | 63 | 0 | 0 | 0 | 0 | 0 | 5 | 35 | 35 | 36/33 |
a Computationally identified peaks by Cheng et al. (2009).
VPs, validated peaks by q-PCR (Cheng et al., 2008; Zhang et al., 2009); RefSeq Genes, RefSeq genes from the UCSC browser in the 66 Mb region on chromosome 7 in the mouse genome (mm8). Overlapping entries are merged. The overlapping intervals between RefSeq genes and the detected peaks have P-value < 0.05 from 100 permutations.
Estimated effects and P-values of the supporting tracks
| Term | ChIP-chip β | ChIP-chip P-value | ChIP-seq β | ChIP-seq P-value |
|---|---|---|---|---|
| Intercept | −1.25e+01 | 0 | −1.14e+01 | 0 |
| Open chromatin | 2.42e−02 | 6.73e−01 | 6.70e−01 | 5.26e−48 |
| H3K27me3 | −2.63e−01 | 9.78e−06 | −1.51e+00 | 1.25e−12 |
| H3K4me3 | 1.79e+00 | 1.36e−106 | 1.18e+00 | 9.45e−34 |
| TAL1 | 7.99e−01 | 2.46e−67 | 1.51e+00 | 3.26e−246 |
β is the regression coefficient in the logistic regression model.
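To help read the coefficients, the sketch below converts the ChIP-seq column of the table above into odds ratios and evaluates the standard logistic function. It assumes each supporting track enters the model as a numeric level multiplied by its β; the actual encoding of the tracks is not stated in this record, and the function names `odds_ratio` and `prior_probability` are hypothetical.

```python
import math

# ChIP-seq coefficients copied from the table above.
BETA = {
    "Intercept": -1.14e+01,
    "Open chromatin": 6.70e-01,
    "H3K27me3": -1.51e+00,
    "H3K4me3": 1.18e+00,
    "TAL1": 1.51e+00,
}


def odds_ratio(term):
    """Multiplicative change in the odds of binding per unit increase of a term."""
    return math.exp(BETA[term])


def prior_probability(track_levels):
    """Logistic-model probability for a dict of track name -> level.

    Assumption: levels are numeric covariates; tracks not listed count as 0.
    """
    log_odds = BETA["Intercept"] + sum(
        BETA[name] * level for name, level in track_levels.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


baseline = prior_probability({})
with_tal1 = prior_probability({"TAL1": 1})
with_h3k27me3 = prior_probability({"H3K27me3": 1})
```

Under this reading, a positive β (TAL1, H3K4me3, open chromatin) raises the odds of binding and a negative β (H3K27me3) lowers them, which matches the enrichment and depletion pattern described for the Tjp1 example in Fig. 5.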
Fig. 5. (A) Venn diagram of the ChIP-chip and ChIP-seq peaks identified with and without the supporting tracks. (B) An example of a novel GATA1-occupied segment within Tjp1 identified by our method. It was missed by previous HD2 ChIP-chip and ChIP-seq analysis (Cheng et al., 2009). This region also shows depleted H3K27me3 and enriched H3K4me3 signals. Horizontal black lines indicate signal means.