Guillaume Devailly, Anna Mantsoki, Tom Michoel, Anagha Joshi.
Abstract
Genome-wide data is accumulating in the public domain at an unprecedented rate. Re-mining this data shows great potential to generate novel hypotheses. However, this approach depends on the quality (technical and biological) of the underlying data. Here we performed a systematic analysis of chromatin immunoprecipitation (ChIP) sequencing data of transcription and epigenetic factors from the encyclopaedia of DNA elements (ENCODE) resource to demonstrate that about one third of conditions with replicates show low concordance between replicate peak lists. This serves as a case study to demonstrate a caveat concerning genome-wide analyses and highlights a need to validate the quality of each sample before performing further associative analyses.
Keywords: Chromatin immunoprecipitation sequencing; Data integration; Encyclopaedia of DNA element; Transcription factor
Year: 2015 PMID: 26619763 PMCID: PMC4686001 DOI: 10.1016/j.febslet.2015.11.027
Source DB: PubMed Journal: FEBS Lett ISSN: 0014-5793 Impact factor: 4.124
Fig. 1. Assessing variability across TF ChIP-seq replicate experiments. A, B and C: Examples of the classification based on peak overlap. For each panel, numbers on the y-axis represent the experiment ID (full names available in Figs. S5–61A). Numbers on the x-axis indicate the number of peaks. Black: detected peaks. Grey: no peaks were called in that region. Coverage plot: Mean coverage at detected and undetected peaks. Black line: mean coverage at common peaks. Dashed line: mean coverage at detected peaks that were not in common with other experiments. Grey line: mean coverage at undetected peaks that were called only in other experiments. FPKM: fragments per kilobase per million. Number of peaks barplot: Number of peaks called in each experiment of BHLHE40 ChIP-seq in untreated HepG2. Black: peaks called in every experiment. Grey: peaks not called in every experiment. Number of reads barplot: Millions of uniquely aligned reads for each experiment of BHLHE40 ChIP-seq in untreated HepG2. Common peak height boxplot: Boxplot of FPKM at common peaks for every experiment of BHLHE40 ChIP-seq in untreated HepG2. Peak distribution plot: Distance from the closest TSS was computed for every peak from both experiments of HDAC2 ChIP-seq in untreated K562. Common peaks frequently overlapped a TSS. Peaks from experiment number 1 were generally closer to a TSS than peaks from experiment number 2. Motif logos: Motif logos of the top de novo motif discovery results from the peak lists of experiment number 1 (left) and experiment number 2 (right).
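The notion of "common peaks" between replicate peak lists can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes peaks are given as BED-style half-open intervals `(chrom, start, end)`, treats any shared base as an overlap, and uses hypothetical function names (`count_common`, `overlap_fraction`) and made-up example coordinates. The criteria the paper used to classify replicate concordance are not stated in this record, so none are implied here.

```python
from collections import defaultdict


def group_by_chrom(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def count_common(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Intervals are half-open (BED-style): [start, end).
    """
    a_by_chrom = group_by_chrom(peaks_a)
    b_by_chrom = group_by_chrom(peaks_b)
    common = 0
    for chrom, a_intervals in a_by_chrom.items():
        b_intervals = b_by_chrom.get(chrom, [])
        j = 0
        for start, end in a_intervals:
            # Skip peaks in B that end before this peak in A starts;
            # they cannot overlap any later peak in A either.
            while j < len(b_intervals) and b_intervals[j][1] <= start:
                j += 1
            # The first remaining B peak overlaps if it starts before A ends.
            if j < len(b_intervals) and b_intervals[j][0] < end:
                common += 1
    return common


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that are also found in peaks_b."""
    if not peaks_a:
        return 0.0
    return count_common(peaks_a, peaks_b) / len(peaks_a)


# Made-up coordinates for two hypothetical replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 400, 500)]

print(overlap_fraction(rep1, rep2))  # fraction of rep1 peaks shared with rep2
print(overlap_fraction(rep2, rep1))  # fraction of rep2 peaks shared with rep1
```

The fraction is asymmetric by design: a replicate with many more called peaks can share all of the smaller list while still having most of its own peaks unmatched, which is why both directions are reported.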