An Zheng (1), Michael Lamkin (2), Yutong Qiu (1,3), Kevin Ren (4), Alon Goren (5), Melissa Gymrek (6,7)
1. Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
2. Department of Bioengineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
3. School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA.
4. Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA.
5. Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA. agoren@ucsd.edu.
6. Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA. mgymrek@ucsd.edu.
7. Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA. mgymrek@ucsd.edu.
Abstract
BACKGROUND: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq.

RESULTS: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips.

CONCLUSIONS: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.
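The abstract does not spell out which statistical models ChIPs uses, so the following is only a minimal, hypothetical Python sketch of the general idea of "simulating ChIP-seq data using statistical models of key experimental steps". It assumes three toy steps: fragmentation with gamma-distributed fragment lengths (standing in for size selection), immunoprecipitation modeled as a higher retention probability for fragments overlapping known bound regions, and single-end sequencing of one fragment end. The function names (`simulate_chip_reads`, `overlaps_peak`) and all parameter names and values are illustrative assumptions; this is not the ChIPs implementation, which is written in C++.

```python
import random
from typing import List, Tuple

# Hypothetical illustration only: the steps, parameter names, and default
# values below are assumptions, not the model implemented in ChIPs.


def overlaps_peak(start: int, end: int, peaks: List[Tuple[int, int]]) -> bool:
    """Return True if the half-open interval [start, end) overlaps any peak."""
    return any(start < p_end and p_start < end for p_start, p_end in peaks)


def simulate_chip_reads(
    genome_length: int,
    peaks: List[Tuple[int, int]],
    n_reads: int,
    read_len: int = 36,
    gamma_k: float = 15.0,
    gamma_theta: float = 15.0,
    p_bound: float = 0.9,
    p_background: float = 0.05,
    seed: int = 0,
) -> List[Tuple[int, str]]:
    """Return (read_start, strand) pairs sampled through toy experimental steps."""
    rng = random.Random(seed)  # seeded so simulations are reproducible
    reads: List[Tuple[int, str]] = []
    while len(reads) < n_reads:
        # Fragmentation + size selection: draw a fragment length from a gamma
        # distribution (mean gamma_k * gamma_theta), never shorter than a read.
        frag_len = max(read_len, int(rng.gammavariate(gamma_k, gamma_theta)))
        if frag_len > genome_length:
            continue
        # Place the fragment uniformly at random on the genome.
        start = rng.randrange(0, genome_length - frag_len + 1)
        end = start + frag_len
        # Immunoprecipitation: fragments overlapping a bound region are
        # retained with higher probability than background fragments.
        keep_prob = p_bound if overlaps_peak(start, end, peaks) else p_background
        if rng.random() >= keep_prob:
            continue
        # Sequencing: read one end of the retained fragment on a random strand.
        if rng.random() < 0.5:
            reads.append((start, "+"))
        else:
            reads.append((end - read_len, "-"))
    return reads
```

For example, `simulate_chip_reads(100_000, [(10_000, 10_500), (50_000, 50_500)], n_reads=2000)` should place a much larger share of reads in the two peaks than their 1% share of the toy genome. Raising `p_background` relative to `p_bound` weakens that enrichment, which is the kind of experimental parameter a simulator of this sort lets one vary when benchmarking peak callers.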