Qi Zhang1, Sündüz Keleş2. 1. Department of Biostatistics and Medical Informatics, 425 Henry Mall and Department of Statistics, 1300 University Avenue, Madison, WI 53706, USA. 2. Department of Biostatistics and Medical Informatics, 425 Henry Mall and Department of Statistics, 1300 University Avenue, Madison, WI 53706, USA.
Abstract
MOTIVATION: In chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and other short-read sequencing experiments, a considerable fraction of the short reads align to multiple locations on the reference genome (multi-reads). Inferring the origin of multi-reads is critical for accurately mapping reads to repetitive regions. Current state-of-the-art multi-read allocation algorithms rely on the read counts in the local neighborhood of the alignment locations and ignore the variation in the copy numbers of these regions. Copy-number variation (CNV) can directly affect the read densities and, therefore, bias allocation of multi-reads.

RESULTS: We propose cnvCSEM (CNV-guided ChIP-Seq by expectation-maximization algorithm), a flexible framework that incorporates CNV in multi-read allocation. cnvCSEM eliminates the CNV bias in multi-read allocation by initializing the read allocation algorithm with CNV-aware initial values. Our data-driven simulations illustrate that cnvCSEM leads to higher read coverage with satisfactory accuracy and lower loss in read-depth recovery (estimation). We evaluate the biological relevance of the cnvCSEM-allocated reads and the resultant peaks with the analysis of several ENCODE ChIP-seq datasets.

AVAILABILITY AND IMPLEMENTATION: Available at http://www.stat.wisc.edu/~qizhang/

CONTACT: qizhang@stat.wisc.edu or keles@stat.wisc.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
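The abstract does not spell out the allocation procedure, so the following is only a toy sketch of the general idea, not the cnvCSEM implementation. It assumes that each multi-read is split fractionally among its candidate bins in proportion to a per-bin weight, and that a CNV-aware starting weight can be obtained by dividing the local uni-read count by a relative copy number. The function names, the pseudocount, the bin labels and the copy-number values are all hypothetical. In this sketch the starting weights also act as fixed base counts during the iterations, which is a simplification made here for illustration.

```python
def initial_weights(uni_counts, copy_number=None, pseudocount=1.0):
    """Starting weight per bin: local uni-read count plus a pseudocount,
    optionally divided by the bin's relative copy number (assumed correction)."""
    weights = {}
    for bin_id, count in uni_counts.items():
        cn = 1.0 if copy_number is None else copy_number.get(bin_id, 1.0)
        weights[bin_id] = (count + pseudocount) / cn
    return weights


def allocate(multi_reads, weights, n_iter=5):
    """EM-style fractional allocation of multi-reads to their candidate bins."""
    base = dict(weights)      # fixed contribution from the starting weights
    current = dict(weights)   # weights updated at every iteration
    alloc = []
    for _ in range(n_iter):
        # E-step: split each multi-read in proportion to the current weights.
        alloc = []
        for candidates in multi_reads:
            total = sum(current[b] for b in candidates)
            alloc.append({b: current[b] / total for b in candidates})
        # M-step: base weight plus the expected multi-read mass in each bin.
        current = dict(base)
        for fractions in alloc:
            for b, f in fractions.items():
                current[b] += f
    return alloc


# Toy data: bin A is amplified (relative copy number 3) and therefore has
# three times the uni-read count of bin B. Four multi-reads map to both bins.
uni_counts = {"A": 30, "B": 10}
copy_number = {"A": 3.0, "B": 1.0}
multi_reads = [["A", "B"]] * 4

naive = allocate(multi_reads, initial_weights(uni_counts))
aware = allocate(multi_reads, initial_weights(uni_counts, copy_number))
print(naive[0]["A"], aware[0]["A"])
```

With these toy numbers, the count-only weights send most of each multi-read to the amplified bin A, whereas the copy-number-normalized weights split it roughly evenly between A and B. This is the kind of CNV bias the abstract describes, shown here under the stated assumptions only.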