Qian Qin, Shenglin Mei, Qiu Wu, Hanfei Sun, Lewyn Li, Len Taing, Sujun Chen, Fugen Li, Tao Liu, Chongzhi Zang, Han Xu, Yiwen Chen, Clifford A Meyer, Yong Zhang, Myles Brown, Henry W Long, X Shirley Liu.
Abstract
BACKGROUND: Transcription factor binding, histone modification, and chromatin accessibility studies are important approaches to understanding the biology of gene regulation. ChIP-seq and DNase-seq have become the standard techniques for studying protein-DNA interactions and chromatin accessibility, respectively, and comprehensive quality control (QC) and analysis tools are critical to extracting the most value from these assay types. Although many analysis and QC tools have been reported, few combine ChIP-seq and DNase-seq data analysis and quality control in a unified framework with a comprehensive and unbiased reference of data quality metrics.
Keywords: Analysis pipeline; ChIP-seq; DNase-seq; Quality atlas
Year: 2016; PMID: 27716038; PMCID: PMC5048594; DOI: 10.1186/s12859-016-1274-4
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Fig. 1 Workflow of ChiLin and quality report. (a) Arrows show the dependency relationships and order of the steps. Data processing steps are indicated in black and quality control steps in white. (b) Batch-mode processing of datasets, which lets users customize an internal QC database; a post-processing script for compiling the QC database is provided at https://github.com/cfce/chilin/blob/master/demo/compile_database.sh
Fig. 2 ChiLin quality metrics exploration. (a) Pairwise correlation of overall ChIP sample quality metrics across the three layers of ChiLin analysis. (b) Comparison of ENCODE FRiP and ChiLin FRiP scores. (c) 30 samples spanning three levels of ChIP-layer quality, used to explore the relationship between sequencing depth and FRiP. (d) FRiP score distribution against sequencing depth for the samples in c; the background is peak calling with down-sampled reads, and line and point colours indicate the quality levels as in c. (e) FRiP with the background taken from peak calling with all reads.
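The FRiP score in the Fig. 2 caption is the fraction of reads that fall in called peaks. The following is a minimal sketch of that ratio, not ChiLin's own implementation: the function names, the input layout (reads as `(chrom, position)` pairs, peaks as half-open `(chrom, start, end)` intervals), and the choice to count a read by a single position are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_peak_index(peaks):
    """Group half-open (chrom, start, end) peaks by chromosome and merge overlaps.

    Hypothetical helper: returns {chrom: (sorted_starts, matching_ends)}.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching interval: extend the current one.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def frip(read_positions, peaks):
    """Fraction of reads whose position lies inside any peak (assumed definition)."""
    index = build_peak_index(peaks)
    total = 0
    in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Rightmost merged peak that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

Because the ratio depends on which peak set is used, computing it against peaks called from down-sampled reads or from all reads (panels d and e) only changes the `peaks` argument in this sketch.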
Fig. 3 Reads-layer quality metrics across eight categories. (a) Median sequence quality score from FASTQ files, uniquely mapped ratio (reads with BWA mapping quality above 1), and PCR bottleneck coefficient calculated by sampling four million reads from BAM files. (b) Reference sequencing depth suggestions for the eight categories.
Fig. 4 ChIP-layer quality metrics across eight categories. (a) Density distributions, for the eight categories, of the number of MACS2 peaks called at a q-value threshold of 0.01: all peaks, peaks with fold change >= 20, and peaks with fold change >= 10. (b) Overall FRiP score distributions for ChIP samples (red) and input control samples (cyan) across assay types. (c) Scatterplot of wiggle correlation against peak overlap ratio for replicate samples. (d) Empirical cumulative distributions of the wiggle correlation and peak overlap ratio, used to assess replicate consistency.
Fig. 5 Annotation-layer metrics across eight categories. Overall distribution, by category, of the percentage of peak summits overlapping exons, introns, promoters, and intergenic regions, and overall distribution of the ratio of the top 5000 peaks overlapping the union DHS set from ENCODE.
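The PCR bottleneck coefficient named in the reads-layer caption (Fig. 3) is commonly defined, following ENCODE, as the number of genomic locations covered by exactly one uniquely mapped read divided by the number of distinct locations covered by at least one. The sketch below assumes that definition and a hypothetical input of `(chrom, position, strand)` tuples; it does not reproduce ChiLin's code, and the four-million-read subsampling mentioned in the caption is left to the caller.

```python
from collections import Counter


def pcr_bottleneck_coefficient(read_locations):
    """PBC = (locations with exactly one read) / (distinct locations with any read).

    `read_locations` is an iterable of hashable location keys, for example
    (chrom, position, strand) tuples; this layout is an assumption.
    """
    counts = Counter(read_locations)
    distinct = len(counts)
    if distinct == 0:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / distinct
```

A value near 1 means most locations are hit by a single read, while lower values suggest many duplicated reads and a less complex library.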