Maëlle Daunesse, Rachel Legendre, Hugo Varet, Adrien Pain, Claudia Chica.
Abstract
We present ePeak, a Snakemake-based pipeline for the identification and quantification of reproducible peaks from raw data generated by ChIP-seq, CUT&RUN and CUT&Tag epigenomic profiling techniques. It also includes a statistical module to perform tailored differential marking and binding analysis with state-of-the-art methods. ePeak streamlines critical steps such as the quality assessment of the immunoprecipitation, spike-in calibration and the selection of reproducible peaks between replicates for both narrow and broad peaks. It generates complete reports for data quality control assessment and optimal interpretation of the results. We advocate for a differential analysis that accounts for the biological dynamics of each chromatin factor. Thus, ePeak provides linear and nonlinear methods for normalisation as well as conservative and stringent models for variance estimation and significance testing of the observed marking/binding differences. Using a published ChIP-seq dataset, we show that distinct populations of differentially marked/bound peaks can be identified. We study their dynamics in terms of read coverage and summit position, as well as the expression of the neighbouring genes. We propose that ePeak can be used to measure the richness of the epigenomic landscape underlying a biological process by identifying diverse regulatory regimes.
Year: 2022 PMID: 35664802 PMCID: PMC9154330 DOI: 10.1093/nargab/lqac041
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. The ePeak workflow. (A) Five ePeak modules executing specific and interdependent tasks. Stop signs indicate where the analysis ends, depending on the data provided by the user. Stop 1 if no replicates are available. Stop 2 for datasets with replicates for only one condition. Stop 3 when replicates are available for two or more conditions. (B) ePeak Snakemake rule graph illustrating the input/output dependencies between steps. Border colour indicates the module membership of each rule (SPR: self pseudo-replicate, PPR: pooled pseudo-replicate).
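The stop rule described in panel (A) can be illustrated with a minimal Python sketch. This is not ePeak code: the function name `epeak_stop_point`, the input layout, and the reading that a condition counts as replicated when it has at least two samples are all assumptions made for illustration.

```python
def epeak_stop_point(replicates_per_condition):
    """Return the stop point (1, 2 or 3) suggested by the Figure 1 caption.

    `replicates_per_condition` maps a condition name to its number of
    replicates. Hypothetical helper, not part of ePeak.
    """
    # Assumption: a condition is "replicated" when it has two or more samples.
    replicated = [cond for cond, n in replicates_per_condition.items() if n >= 2]

    if len(replicated) == 0:
        return 1  # no replicates available
    if len(replicated) == 1:
        return 2  # replicates for only one condition
    return 3      # replicates for two or more conditions


if __name__ == "__main__":
    print(epeak_stop_point({"shControl": 2, "shUbc9": 2}))
```

Under these assumptions, a design with two replicated conditions would proceed through all modules, including the differential analysis.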
Figure 2. ChIP-seq position variability. Summit instability is defined as the distance between the summit of the peak called in each replicate and the corresponding reproducible peak. Tracks show the IP coverage, peak and summit position for the two replicates separately (top) and pooled (bottom) in two genomic regions. Vertical blue lines indicate the position of the reproducible peak summit and horizontal black lines the distance to the corresponding summit in each replicate.
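As a rough illustration of the summit instability definition in this caption, the sketch below computes the distance between a reproducible peak summit and the summit of a matching replicate peak. The `Peak` type, the function name, the half-open interval convention, and the rule for matching replicate peaks (overlap on the same chromosome, nearest summit when several overlap) are assumptions; the caption does not say how ePeak pairs the peaks.

```python
from typing import List, NamedTuple, Optional


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    summit: int  # absolute genomic coordinate of the summit


def summit_instability(reproducible: Peak, replicate_peaks: List[Peak]) -> Optional[int]:
    """Distance in bp between the reproducible summit and a replicate summit.

    Returns None when no replicate peak overlaps the reproducible peak.
    """
    # Assumption: a replicate peak "corresponds" to the reproducible peak
    # when the two intervals overlap on the same chromosome.
    overlapping = [
        p for p in replicate_peaks
        if p.chrom == reproducible.chrom
        and p.start < reproducible.end
        and p.end > reproducible.start
    ]
    if not overlapping:
        return None
    # Assumption: if several replicate peaks overlap, use the nearest summit.
    return min(abs(p.summit - reproducible.summit) for p in overlapping)


if __name__ == "__main__":
    repro = Peak("chr1", 1000, 2000, 1500)
    rep1 = [Peak("chr1", 900, 1900, 1420), Peak("chr2", 1000, 2000, 1500)]
    print(summit_instability(repro, rep1))
```

Calling the function once per replicate gives one instability value per replicate for each reproducible peak, which is the quantity drawn as horizontal black lines in the tracks.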
Figure 3. Comparison of the statistical settings for differential analysis. (A) Kernel density estimation of read coverage and summit instability for all reproducible peaks across replicates of the two biological conditions under study. (B) Read coverage and summit position variability estimation using the Fquantro statistic (16). (C) Proportion of total differentially marked/bound peaks obtained using each statistical setting. DESeq2 = DESeq2 with geometric mean normalisation; NL-L = limma with nonlinear and with linear normalisation; NL = limma with nonlinear normalisation only; L = limma with linear normalisation only. (D, E) Quantitative characterisation of differentially marked/bound peak populations obtained using each statistical setting. (D) Distribution of ChIP-seq read count dispersion as estimated by limma. (E) Distribution of absolute changes in ChIP-seq read counts between shUbc9 and shControl. Colours correspond to panel (C). (F) Expression dynamics of genes neighbouring differentially marked/bound peak populations. Distribution of absolute changes in RNA-seq read counts between shUbc9 and shControl. Colours correspond to panel (C).
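The "geometric mean normalisation" attributed to DESeq2 in panel (C) is commonly understood as median-of-ratios size factor estimation: each sample's counts are divided by the per-peak geometric mean across samples, and the median of those ratios becomes the sample's size factor. The pure-Python sketch below illustrates that idea on a small peak-by-sample count matrix. It is a simplified illustration, not ePeak or DESeq2 code; the function names and the choice to skip any peak with a zero count are assumptions.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample.

    `counts` is a list of rows (one per peak); each row holds the read
    counts of that peak in every sample.
    """
    n_samples = len(counts[0])
    # Assumption: drop peaks with a zero count in any sample so logs are finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no peak has non-zero counts in every sample")

    # Log of the per-peak geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    size_factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        size_factors.append(math.exp(median(log_ratios)))
    return size_factors


def normalise(counts, size_factors):
    """Divide each count by the size factor of its sample."""
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]


if __name__ == "__main__":
    counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
    factors = median_of_ratios_size_factors(counts)
    print(factors)
    print(normalise(counts, factors))
```

In this toy matrix the second sample is sequenced twice as deeply as the first, so its size factor is twice as large and the normalised counts of the first three peaks agree between samples. This is a linear scaling; the nonlinear normalisation options named in the caption are not covered by this sketch.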