Michael T McCarthy, Christopher A O'Callaghan.
Abstract
Hypersensitivity to DNaseI digestion is a hallmark of open chromatin, and DNaseI-seq allows the genome-wide identification of regions of open chromatin. Interpreting these data is challenging, largely because of inherent variation in signal-to-noise ratio between datasets. We have developed PeaKDEck, a peak calling program that distinguishes signal from noise by randomly sampling read densities and using kernel density estimation to generate a dataset-specific probability distribution of random background signal. PeaKDEck uses this probability distribution to select an appropriate read density threshold for peak calling in each dataset. We benchmark PeaKDEck using published ENCODE DNaseI-seq data and other peak calling programs, and demonstrate superior performance in low signal-to-noise ratio datasets.
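The thresholding idea described in the abstract can be illustrated with a minimal sketch. This is not PeaKDEck's implementation: the Gaussian kernel, the fixed `bandwidth`, the `p_value` cutoff, and the function names `kde_upper_tail` and `density_threshold` are all assumptions made for illustration. Given read densities sampled at random loci, the sketch builds a kernel density estimate of the background and finds the density above which background signal is expected with probability at most `p_value`.

```python
import math


def kde_upper_tail(samples, x, bandwidth):
    """P(density > x) under a Gaussian KDE built from the background samples."""
    scale = bandwidth * math.sqrt(2.0)
    # Each sample contributes the upper tail of one Gaussian kernel.
    return sum(0.5 * math.erfc((x - s) / scale) for s in samples) / len(samples)


def density_threshold(samples, p_value=0.001, bandwidth=1.0):
    """Density at which the KDE upper-tail probability drops to p_value.

    Hypothetical helper: p_value and bandwidth are illustrative defaults,
    not values taken from the PeaKDEck paper.
    """
    lo = min(samples) - 10.0 * bandwidth  # tail probability is ~1 here
    hi = max(samples) + 10.0 * bandwidth  # tail probability is ~0 here
    for _ in range(60):  # bisection on a monotonically decreasing function
        mid = (lo + hi) / 2.0
        if kde_upper_tail(samples, mid, bandwidth) > p_value:
            lo = mid
        else:
            hi = mid
    return hi


# Randomly sampled background read densities (made-up values for illustration).
background = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 0.0, 1.0]
threshold = density_threshold(background, p_value=0.001, bandwidth=1.0)
```

Loci whose read density exceeds `threshold` would then be candidate peaks. Because the threshold is derived from each dataset's own sampled background, a noisier dataset yields a higher cutoff.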
Year: 2014 PMID: 24407222 PMCID: PMC3998130 DOI: 10.1093/bioinformatics/btt774
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) PeaKDEck uses a sampling bin to measure signal at any given locus. PeaKDEck calculates the corrected read density by first counting the number of read start sites (green) within a central bin (e.g. five read start sites in a bin of size 300 bp). Next, the read density in a larger background bin is measured (e.g. 10 reads in a bin of size 3000 bp). From this background read density, the read density expected in a bin the size of the central bin is calculated (e.g. 10 reads in 3000 bp gives an expected read density of 1 read in 300 bp) and subtracted from the central bin read density to give the corrected read density (4 in this example). (B) We calculated the percentage of unique sites identified by four different peak callers in each of 10 sample datasets, and color-coded each dataset based on the SNR from blue (low SNR) to red (high SNR). For datasets with low SNR, PeaKDEck had the lowest percentage of unique peaks out of the four peak callers.
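The arithmetic in panel (A) can be written out as a short sketch. The function name `corrected_read_density` is hypothetical, and the sketch assumes that both bins are centred on the locus and that the background bin includes the central bin; the caption does not state either detail.

```python
def corrected_read_density(read_starts, locus, central_bin=300, background_bin=3000):
    """Central-bin read count minus the count expected from the background bin."""

    def count_in_bin(size):
        # Assumption: a bin of the given size is centred on the locus.
        half = size / 2
        return sum(1 for pos in read_starts if locus - half <= pos < locus + half)

    central = count_in_bin(central_bin)
    background = count_in_bin(background_bin)
    # Scale the background count down to the size of the central bin.
    expected = background * central_bin / background_bin
    return central - expected


# Reproduces the caption's example: 5 read starts in the 300 bp central bin
# and 10 in the 3000 bp background bin (positions are made up).
reads = [9900, 9950, 10000, 10050, 10100, 8600, 9000, 9500, 10800, 11400]
density = corrected_read_density(reads, locus=10000)
```

With these positions the expected central-bin count is 10 × 300 / 3000 = 1, so the corrected read density is 5 − 1 = 4, matching the caption.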