Mohamed-Ashick M Saleem, Marco-Antonio Mendoza-Parra, Pierre-Etienne Cholley, Matthias Blum, Hinrich Gronemeyer.
Abstract
BACKGROUND: Exponentially increasing numbers of NGS-based epigenomic datasets in public repositories like GEO constitute an enormous source of information that is invaluable for integrative and comparative studies of gene regulatory mechanisms. One of today's challenges for such studies is to identify functionally informative local and global patterns of chromatin states in order to describe the regulatory impact of the epigenome in normal cell physiology and in cases of pathological aberration. Critically, the preferred method, Chromatin ImmunoPrecipitation-Sequencing (ChIP-seq), is inherently prone to significant variability between assays, which poses a significant challenge for comparative studies. One such challenge concerns data normalization to adjust for variation in sequencing depth.
Keywords: ChIP-seq; Epigenome; Multi-sample analysis; Quantile-based data normalization
Year: 2017 PMID: 28499349 PMCID: PMC5429578 DOI: 10.1186/s12859-017-1655-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Effects of data normalization. a Pie charts illustrating the changes in the number of common and replicate-specific promoter-associated H3K4me3 peaks for HepG2 cell line datasets (GSM646364; GSM646365) before and after normalization (Blue: Rep1-specific peaks, Red: Rep2-specific peaks and Green: peaks common between replicates). While peak overlap rates are mostly conserved, some changes are observed post normalization in less enriched peaks, thus influencing peak calling thresholds. b An illustrative MA transformation plot shows the overall transition of RCI differences between replicates before and after normalization. The LOESS fit line (blue) shows the overall correction after normalization. c Average RCI plots over annotated promoters (TSS with flanking regions of 1.5 kb) show that a significant amplitude difference exists for peaks that are common between replicates (Blue: Rep1 and Red: Rep2). After normalization, however, such amplitude differences are corrected and replicate-specific enrichments become more distinct
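The MA transformation mentioned for panel b of Fig. 1 can be illustrated with a minimal sketch. It uses the standard definition (M is the log2 ratio of two replicates, A is their mean log2 intensity); the function name `ma_transform` and the `pseudocount` parameter are assumptions made for illustration and are not taken from the article, which does not state how its plot was computed.

```python
import math


def ma_transform(rep1, rep2, pseudocount=1.0):
    """Return (M, A) value lists for paired intensities from two replicates.

    M = log2(rep1) - log2(rep2)          (difference between replicates)
    A = 0.5 * (log2(rep1) + log2(rep2))  (average intensity)

    The pseudocount is an assumption added here to avoid log2(0) for
    empty bins; it is not described in the source.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same number of bins")

    m_values, a_values = [], []
    for x, y in zip(rep1, rep2):
        log_x = math.log2(x + pseudocount)
        log_y = math.log2(y + pseudocount)
        m_values.append(log_x - log_y)
        a_values.append(0.5 * (log_x + log_y))
    return m_values, a_values
```

In such a plot, well-normalized replicates scatter around M = 0 across the range of A, so a fitted trend line that departs from zero suggests an intensity-dependent bias between the two replicates.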
Fig. 2 Chromatin state analysis using ChromHMM. a Illustration of peak consistency between raw and normalized data for nine histone marks that were used for chromatin state analysis for nine different cell lines, as indicated. The X- and Y-axes of the plot show the percentage of peaks overlapping between normalized and raw data, respectively. The lowest overlap rate was observed for the H3K27ac profile of H1 cells, where all the peaks from raw data (100%) were retained post normalization but only 25% of peaks from normalized data overlapped with raw data peaks, showing that additional peaks were identified post normalization. As for the H3K27ac profile of H1 cells, poor overlap between peaks predicted from raw and normalized profiles was generally due to poor quality and/or low coverage. b Emission parameters of ChromHMM describing chromatin state differences between raw and normalized peaks. Though the predicted chromatin states were conserved, three significant differences in enrichment levels are highlighted as red-framed boxes. c An example region illustrating the change after normalization of chromatin state 14 in Fig. 2b, where H3K27ac peaks become prominent after normalization. d Stacked bar chart indicating the percentage of chromatin state annotations per bin that changed upon normalization. While the GM12878, NHEK and NHLF datasets show few changes after normalization, the other datasets show more than 5% changed bin annotations. e Illustration of the change in chromatin state annotation for the MYO7A locus using the same dataset processed with ChromHMM; note that the MYO7A promoter was annotated ‘active’ from the raw data and changed to ‘poised’ post normalization, which correlates perfectly with the absence of gene expression [ENCODE data: ENCSR962TBJ]
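The asymmetric overlap reported in panel a of Fig. 2 (100% of raw peaks retained, but only 25% of normalized peaks matching a raw peak) follows from the overlap percentage depending on which peak set is used as the query. The sketch below shows this with toy data. The helper names, the half-open `(chrom, start, end)` interval convention, and the coordinates are all assumptions made for illustration; they are not the article's data or its overlap criterion.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(query, reference):
    """Percentage of query peaks that overlap at least one reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return 100.0 * hits / len(query)


# Toy peak sets (hypothetical coordinates): normalization keeps the single
# raw peak and reveals three additional ones.
raw_peaks = [("chr1", 100, 200)]
normalized_peaks = [
    ("chr1", 150, 250),
    ("chr1", 400, 500),
    ("chr1", 700, 800),
    ("chr2", 100, 200),
]

raw_retained = percent_overlapping(raw_peaks, normalized_peaks)
normalized_matched = percent_overlapping(normalized_peaks, raw_peaks)
```

With these toy sets, every raw peak is recovered while only one of the four normalized peaks has a raw counterpart, which is the pattern the caption describes for the H3K27ac profile of H1 cells.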
Fig. 3 Signal intensity profiles of H3K4me3, H3K27me3 and RNA PolII enrichments at the Hoxa cluster; shown is the temporal signal evolution from consecutive ChIP-seq experiments during retinoic acid-induced differentiation of F9 cells. Most of the genes in the Hoxa cluster have been shown to follow a collinear gene activation pattern during differentiation, with a gradual increase of active marks and decrease of repressive marks. However, this pattern was not apparent from the raw RCI profiles. Data normalization resulted in the expected gradual spatio-temporal decrease of the H3K27me3 profile and a concomitant increase of the H3K4me3 and RNA PolII intensity profiles. The bottom panels show ChIP-qPCR analyses for H3K27me3, thus validating the normalization. Specifically, the Hoxa1, Hoxa3 and Hoxa4 genes, but not the Hoxa10 gene, follow a collinear gene activation pattern, as observed in both the normalized data and the qPCR results