Li Chen, Chi Wang, Zhaohui S Qin, Hao Wu.
Affiliations: Department of Mathematics and Computer Science, Atlanta, GA 30322, USA; Department of Biostatistics and Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA; Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA; and Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA.
Abstract
MOTIVATION: ChIP-seq is a powerful technology for measuring protein binding or histone modification strength at the whole-genome scale. Although a number of methods are available for analyzing a single ChIP-seq dataset (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets remain under-developed, in particular methods that account for data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs.

RESULTS: In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and to detect genomic regions showing differential protein binding or histone modification. We first detect peaks in all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them by hypothesis testing in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing methods.

AVAILABILITY AND IMPLEMENTATION: An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html.
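The first step described in RESULTS, taking the union of peaks from all datasets to form one set of candidate regions, can be illustrated with a minimal Python sketch. This is not the ChIPComp implementation (which is an R package); the function name `union_peaks`, the `(chromosome, start, end)` tuple layout, the half-open coordinate convention and the choice to merge touching intervals are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def union_peaks(peak_sets: List[List[Peak]]) -> List[Peak]:
    """Merge peaks from several datasets into non-overlapping candidate regions."""
    # Pool the intervals from every dataset, grouped by chromosome.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom.setdefault(chrom, []).append((start, end))

    merged: List[Peak] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current region.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy peak calls from two hypothetical datasets.
sample_a = [("chr1", 100, 200), ("chr1", 500, 600)]
sample_b = [("chr1", 150, 250), ("chr2", 10, 50)]
candidates = union_peaks([sample_a, sample_b])
print(candidates)
```

In this toy input, the overlapping chr1 peaks from the two datasets collapse into a single candidate region, while peaks found in only one dataset are kept unchanged. Read counts would then be tabulated over these shared regions before any modeling.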