Georg Stricker (1,2), Alexander Engelhardt (1), Daniel Schulz (1), Matthias Schmid (3), Achim Tresch (4), Julien Gagneur (1,2).
1. Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany.
2. Department of Informatics, Technische Universität München, 85748 Garching, Germany.
3. Institut für Medizinische Biometrie, Informatik und Epidemiologie, University Hospital Bonn, 53105 Bonn, Germany.
4. Institute for Genetics, University of Cologne, 50647 Cologne, Germany.
Abstract
MOTIVATION: Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, between treatments, or combinations thereof. However, current methods for differential occupancy analysis of ChIP-Seq data rely on binning or sliding-window techniques, for which the choice of window and bin sizes is subjective.
RESULTS: Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating the ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling the type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.
AVAILABILITY AND IMPLEMENTATION: Software is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/GenoGAM.html
CONTACT: gagneur@in.tum.de.
SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
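As a reading aid for the phrase "products of smooth functions along chromosomes", the following is a minimal sketch of how such a model can be written, assuming a log link as is common for count-valued generalized additive models. The symbols (y_i, x_i, o_i, z_{ik}, f_k) are introduced here for illustration only and do not appear in the abstract; the count distribution is deliberately left unspecified.

```latex
% Illustrative notation (assumed, not taken from the abstract):
%   y_i     read count of observation i
%   x_i     genomic position of observation i
%   o_i     offset, e.g. a log sequencing-depth normalization
%   z_{ik}  design indicator: 1 if experimental factor k applies to observation i, else 0
%   f_k     smooth function of genomic position for factor k
% Requires the amsmath and amssymb packages.
\begin{align}
  \mathbb{E}[y_i] = \mu_i &= \exp(o_i) \prod_{k} \exp\bigl(f_k(x_i)\, z_{ik}\bigr), \\
  \log \mu_i &= o_i + \sum_{k} f_k(x_i)\, z_{ik}.
\end{align}
```

Under this sketch, the expected count is a product of smooth positional terms, and the model becomes additive on the log scale. If factor k marks a treatment (z_{ik} = 1 for treated samples and 0 for controls), then exp(f_k(x)) is the fold change in expected occupancy at position x, so a differential occupancy test at base level amounts to asking whether f_k(x) differs from zero.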