Abstract
BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) is a genome-wide analysis technique that can be used to detect various epigenetic phenomena, such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns.
Year: 2012 PMID: 23029045 PMCID: PMC3461018 DOI: 10.1371/journal.pone.0045486
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. WaveSeq utilizes the continuous wavelet power spectrum to detect peaks in ChIP-Seq data.
(a) A scaled representation of the Morlet wavelet. (b & c) H3K4me3 data and a contour plot of the associated wavelet power spectrum show hot spots that correlate with ChIP enrichments. The ChIP-Seq data cover the 15,756,800–15,758,200 bp region of mouse chromosome 1 from the MEF H3K4me3 data set. (d) A schematic of the WaveSeq analysis pipeline. The workflow consists of two major modules: (i) the Monte Carlo background estimation step and (ii) significance estimation from a randomized algorithm using the peak length distribution (one-sample experiment) or an exact binomial test (two-sample experiment).
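As an illustration of the quantity plotted in panel (c), the following is a minimal sketch of a continuous wavelet power spectrum computed with a complex Morlet wavelet. It is not the WaveSeq implementation: the function names `morlet` and `wavelet_power`, the centre frequency `omega0 = 6`, the truncation of the wavelet at four scale units, and the toy coverage vector are all assumptions made for this example.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet wavelet at dimensionless time t (omega0 is an assumed default)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def wavelet_power(signal, scales, omega0=6.0):
    """Return power[s][b] = |W(s, b)|^2, computed by direct summation.

    W(s, b) = sum_i signal[i] * conj(psi((i - b) / s)) / sqrt(s)
    """
    n = len(signal)
    power = []
    for s in scales:
        half = int(math.ceil(4 * s))  # assumed cut-off: wavelet is negligible beyond 4 scale units
        kernel = [
            morlet(k / s, omega0).conjugate() / math.sqrt(s)
            for k in range(-half, half + 1)
        ]
        row = []
        for b in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                i = b + j - half  # signal index covered by this kernel tap
                if 0 <= i < n:    # positions outside the signal are treated as zero
                    acc += signal[i] * w
            row.append(abs(acc) ** 2)
        power.append(row)
    return power


# Toy read-count profile (hypothetical): flat background with one enriched block.
coverage = [0] * 100
for pos in range(45, 55):
    coverage[pos] = 10

power = wavelet_power(coverage, scales=[2, 4, 8])
```

In this toy profile the power is zero far from the enriched block and positive around it, which is the "hot spot" behaviour the contour plot shows. Direct summation is used here only for readability; an FFT-based transform would be the usual choice for genome-scale data.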
Figure 2. WaveSeq has high sensitivity and precision for punctate data sets.
(a & b) Plots of peak ranks against the fraction of validated sites detected by WaveSeq, FindPeaks, MACS and SiSSRs for the (a) GABP and (b) NRSF data sets. WaveSeq has the highest sensitivity for the GABP data set, closely followed by MACS, while all methods performed comparably for the NRSF data. (c) A plot of the fraction of true positives (precision) against the fraction of recovered peaks (recall) for the synthetic spike-in data set shows that MACS has the best combination of the two, closely followed by WaveSeq. FindPeaks calls a large number of false positives, while SiSSRs fails to detect any peaks. (d & e) Sensitivity plots for the (d) GABP and (e) NRSF data sets show that WaveSeq has high sensitivity for these data sets even in the absence of control. FindPeaks performs much better on these data sets without control and has almost identical sensitivity to WaveSeq. SiSSRs has mixed results, with low sensitivity for GABP and high sensitivity for NRSF, while the reverse is true of MACS.
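The rank-based curves in this figure can be sketched as follows. This is a minimal illustration, assuming peaks and reference sites are half-open `(start, end)` intervals on a single chromosome and that a peak counts as a true positive when it overlaps at least one reference site; the function names and the toy intervals are hypothetical, and the overlap criterion used in the paper may differ.

```python
def overlaps(peak, site):
    """True if two half-open (start, end) intervals share at least one base."""
    return peak[0] < site[1] and site[0] < peak[1]


def precision_recall_curve(ranked_peaks, true_sites):
    """Walk down the ranked peak list and report (rank, precision, recall).

    precision = fraction of the top-`rank` peaks that overlap a reference site
    recall    = fraction of reference sites recovered by the top-`rank` peaks
    """
    found = set()        # indices of reference sites recovered so far
    true_positives = 0   # number of peaks so far that hit a reference site
    curve = []
    for rank, peak in enumerate(ranked_peaks, start=1):
        hits = [i for i, site in enumerate(true_sites) if overlaps(peak, site)]
        if hits:
            true_positives += 1
            found.update(hits)
        curve.append((rank, true_positives / rank, len(found) / len(true_sites)))
    return curve


# Hypothetical example: four peaks ordered by significance, three reference sites.
ranked_peaks = [(100, 200), (500, 600), (900, 950), (1500, 1600)]
true_sites = [(150, 160), (520, 530), (2000, 2010)]
curve = precision_recall_curve(ranked_peaks, true_sites)
```

Plotting recall against rank gives a sensitivity curve of the kind shown in panels (a), (b), (d) and (e), while plotting precision against recall gives the view in panel (c).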
Figure 3. WaveSeq improves detection of histone modification peaks.
(a, b & c) Plots of peak ranks against the fraction of putative ‘true positive’ sites detected by WaveSeq, SICER, RSEG and MACS for the (a) H3K4me3, (b) H3K36me3 and (c) H3K27me3 data sets. All methods apart from RSEG perform comparably on the punctate H3K4me3 data. However, WaveSeq outperforms the other methods on the broader peaks of H3K36me3 and H3K27me3. SICER comes in second, while MACS has low sensitivity for diffuse data. RSEG has good sensitivity for the strongest peaks but suffers from low recall, failing to detect any peaks in chromosomes 10–19. (d) A plot of the fraction of true positives (precision) among the top 10,000 peaks detected by the above four methods in the MEF histone modification data sets shows that WaveSeq has the best performance, closely followed by SICER. MACS performs well only on the H3K4me3 data, while RSEG has low precision for all three data sets.
Functional annotation of genes having H3K4me3 DMRs.
| Gene Ontology Term | Count | p-value | FDR (%) |
|---|---|---|---|
| GO:0002520: Immune system development | 15 | 1.91×10⁻⁸ | 3.02×10⁻⁵ |
| GO:0030097: Hemopoiesis | 14 | 2.16×10⁻⁸ | 3.41×10⁻⁵ |
| GO:0048534: Hemopoietic or lymphoid organ development | 14 | 8.76×10⁻⁸ | 1.38×10⁻⁴ |
| GO:0045580: Regulation of T cell differentiation | 7 | 8.60×10⁻⁷ | 0.001359 |
| GO:0002521: Leukocyte differentiation | 10 | 1.11×10⁻⁶ | 0.001747 |
| GO:0045582: Positive regulation of T cell differentiation | 6 | 1.23×10⁻⁶ | 0.001951 |
| GO:0045321: Leukocyte activation | 11 | 1.70×10⁻⁶ | 0.002693 |
| GO:0045619: Regulation of lymphocyte differentiation | 7 | 2.39×10⁻⁶ | 0.003781 |
| GO:0002684: Positive regulation of immune system process | 10 | 2.73×10⁻⁶ | 0.004309 |
| GO:0045621: Positive regulation of lymphocyte differentiation | 6 | 3.33×10⁻⁶ | 0.005262 |
| GO:0046649: Lymphocyte activation | 10 | 4.70×10⁻⁶ | 0.007428 |
| GO:0050870: Positive regulation of T cell activation | 8 | 5.53×10⁻⁶ | 0.008734 |
| GO:0001775: Cell activation | 11 | 6.17×10⁻⁶ | 0.009752 |
| GO:0051251: Positive regulation of lymphocyte activation | 8 | 8.08×10⁻⁶ | 0.012774 |
| GO:0002696: Positive regulation of leukocyte activation | 8 | 1.16×10⁻⁵ | 0.018257 |
| GO:0050867: Positive regulation of cell activation | 8 | 1.62×10⁻⁵ | 0.025558 |
| GO:0050863: Regulation of T cell activation | 8 | 1.62×10⁻⁵ | 0.025558 |
| GO:0030098: Lymphocyte differentiation | 8 | 1.90×10⁻⁵ | 0.030027 |
| GO:0051249: Regulation of lymphocyte activation | 8 | 2.59×10⁻⁵ | 0.040908 |
| GO:0030217: T cell differentiation | 7 | 2.76×10⁻⁵ | 0.04356 |
| GO:0002694: Regulation of leukocyte activation | 8 | 4.00×10⁻⁵ | 0.063158 |
| GO:0045058: T cell selection | 5 | 6.65×10⁻⁵ | 0.105094 |
| GO:0050865: Regulation of cell activation | 8 | 6.80×10⁻⁵ | 0.107401 |
| GO:0002252: Immune effector process | 6 | 1.38×10⁻⁴ | 0.218176 |
| GO:0033077: T cell differentiation in the thymus | 5 | 2.30×10⁻⁴ | 0.362727 |
| GO:0042110: T cell activation | 7 | 2.43×10⁻⁴ | 0.38295 |
| GO:0042981: Regulation of apoptosis | 14 | 2.47×10⁻⁴ | 0.389793 |
| GO:0043067: Regulation of programmed cell death | 14 | 2.98×10⁻⁴ | 0.469488 |
| GO:0010941: Regulation of cell death | 14 | 3.12×10⁻⁴ | 0.491456 |
| GO:0033554: Cellular response to stress | 12 | 4.00×10⁻⁴ | 0.629557 |
| GO:0045061: Thymic T cell selection | 4 | 4.06×10⁻⁴ | 0.639966 |
The top functional categories (FDR <1%) from DAVID that are enriched among genes having H3K4me3 DMRs include a large number of immune-related functions. Count refers to the number of genes in the gene list annotated with the given GO ID. P-values were obtained from a modified Fisher exact test performed by DAVID, which tests the enrichment of the corresponding functional category in the given gene list against the population (chicken genome). FDR correction was performed using the Benjamini-Hochberg procedure [7].
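The two statistical steps named in this note can be sketched as below. This is an illustrative outline only: `hypergeom_upper_tail` is a plain one-sided Fisher exact (hypergeometric) test, not DAVID's modified version, and `benjamini_hochberg` is a textbook implementation of the step-up adjustment. The function names and the example numbers are hypothetical and are not taken from the table.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a gene list of size n containing k annotated genes,
    drawn from a population of N genes of which K carry the annotation."""
    total = comb(N, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 2 of 2 listed genes are annotated, 3 of 10 in the population.
p_enrich = hypergeom_upper_tail(k=2, n=2, K=3, N=10)

# Hypothetical raw p-values for four categories.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Multiplying an adjusted value by 100 gives a percentage comparable in form to the FDR (%) column, although DAVID's reported values may be computed differently.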
Figure 4. Differentially marked regions detected by WaveSeq suggest increased B cell activation in susceptible chickens.
Several genes involved in B cell activation, such as LYN (a), SYK (b) and RAC2 (c), show increased levels of H3K4me3 in infected birds from the S group, as shown by the arrowheads. In contrast, there are no significant changes in the R group. This suggests the presence of increased numbers of activated B cells in susceptible birds, which may lead to increased viral loads in later stages of MD. *** = p<0.001; * = p<0.05. S.inf = infected S group, S.ctl = control S group, R.inf = infected R group, R.ctl = control R group.