Repitools: an R package for the analysis of enrichment-based epigenomic data.

Aaron L Statham, Dario Strbenac, Marcel W Coolen, Clare Stirzaker, Susan J Clark, Mark D Robinson.

Abstract

SUMMARY: Epigenetics, the study of heritable somatic phenotypic changes not related to DNA sequence, has emerged as a critical component of the landscape of gene regulation. The epigenetic layers, such as DNA methylation, histone modifications and nuclear architecture, are now being extensively studied in many cell types and disease settings. Few software tools exist to summarize and interpret these datasets. We have created a toolbox of procedures to interrogate and visualize epigenomic data (both array- and sequencing-based) and make it available as a software package for the cross-platform R language. AVAILABILITY: The package is freely available under LGPL from the R-Forge web site (http://repitools.r-forge.r-project.org/). CONTACT: mrobinson@wehi.edu.au.

Year:  2010        PMID: 20457667      PMCID: PMC2887051          DOI: 10.1093/bioinformatics/btq247

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Epigenetics is the study of phenotypic changes unrelated to DNA sequence. Epigenomics is the large-scale study of epigenetics: various genome-wide assays have been introduced in the past few years, and many epigenome mapping projects are on the horizon (Jones et al., 2008; Nature editorial, 2010). DNA methylation is one of the best-studied epigenetic marks and can be assayed genome-wide using restriction enzyme, bisulphite or enrichment-based approaches (reviewed in Laird, 2010). Another significant class of epigenetic regulators is histone modifications, typically studied using chromatin immunoprecipitation (ChIP) combined with microarrays (ChIP-chip) or next-generation sequencing (ChIP-seq). Few general-purpose tools are available for the exploratory analysis and summarization of enrichment-based epigenomic data (see Table 3 of Laird, 2010). We present Repitools, a software package for the R environment that focuses on the analysis of enrichment-based epigenomic data. The following examples illustrate the diversity of tools within the package; many further examples can be found in the comprehensive user's guide. The routines have been tested on Affymetrix and Nimblegen tiling microarrays and on Illumina Genome Analyzer sequencing data; generic data types are used so that other platforms can be easily supported.

2 DATA SUMMARIZATION

Various visualization procedures are available within the package. For example, enrichmentPlot displays the genome-wide distribution of enrichment for sequencing-based experiments. cpgBoxplots and cpgDensityPlot display microarray and sequencing results, respectively, for quality assessment of DNA methylation enrichment experiments. Figure 1A illustrates the cpgDensityPlot of a successful methylated DNA enrichment experiment using MethylMiner™ (Invitrogen, Carlsbad, CA, USA). As expected, the CpG density distribution of the enriched DNA population is heavily skewed to the right (towards higher CpG density) compared with the input DNA control.
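To make the cpgDensityPlot idea concrete, the sketch below summarizes each sample's per-read CpG densities as a normalized distribution, giving the fraction of reads in each density bin. One curve in Figure 1A represents this kind of distribution. The sketch is written in Python rather than R and assumes the per-read densities have already been computed. The function names and toy data are hypothetical, and this is not Repitools code.

```python
import math
from collections import Counter


def density_distribution(densities, bin_width=1.0):
    """Fraction of reads in each CpG-density bin, keyed by the bin's lower edge."""
    counts = Counter(math.floor(d / bin_width) for d in densities)
    n = len(densities)
    return {k * bin_width: c / n for k, c in sorted(counts.items())}


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-read CpG densities (e.g. CpG counts near each mapped read)
input_reads = [0, 1, 1, 2, 2, 3, 1, 0, 2, 1]
enriched_reads = [3, 5, 6, 8, 4, 7, 9, 5, 6, 2]

input_dist = density_distribution(input_reads)
enriched_dist = density_distribution(enriched_reads)
print(input_dist)     # one curve per sample in a cpgDensityPlot-style figure
print(enriched_dist)

# A successful methylated-DNA enrichment is shifted towards CpG-rich reads
print(mean(enriched_reads) > mean(input_reads))
```

Normalizing by each sample's read count lets libraries of different sizes be overlaid on the same axes.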
Fig. 1.

Repitools visualization examples. (A) In cpgDensityPlot, each line is a single experiment's read distribution in terms of CpG density. (B) For binPlots, the middle panel displays a heatmap of summarized signal according to 50 expression level bins (rows), organized into 100 bp locations (columns) within promoters. The left panel gives the enrichment colour scale and the right panel displays the gene expression for each bin. (C) For significancePlots, the purple and red lines illustrate the median signal for the gene sets of interest. The blue line represents the median signal of all remaining genes in the genome, while the blue shading illustrates a 95% confidence interval (example data taken from Coolen et al., 2010).
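Panel B's heatmap is, in essence, a matrix of median signal indexed by expression bin and promoter position. Below is a minimal Python sketch of that summary. It assumes genes are ranked by expression and cut into roughly equal-sized bins. The name bin_plot_matrix and the toy data are hypothetical, and Repitools' own binning and display choices may differ.

```python
import statistics


def bin_plot_matrix(expression, signal, n_bins):
    """
    expression: {gene: expression level}
    signal: {gene: [signal at position 0, 1, ..., P-1]}, e.g. 100 bp steps
            across a promoter
    Returns (matrix, bin_expression): matrix[b][p] is the median signal at
    position p over the genes in expression bin b (lowest expression first),
    and bin_expression[b] is the median expression of that bin.
    """
    genes = sorted(expression, key=expression.get)  # rank genes by expression
    if not 0 < n_bins <= len(genes):
        raise ValueError("need between 1 and len(genes) bins")
    n_pos = len(signal[genes[0]])
    matrix, bin_expression = [], []
    for b in range(n_bins):
        # contiguous slices of the ranking give roughly equal-sized bins
        members = genes[b * len(genes) // n_bins:(b + 1) * len(genes) // n_bins]
        matrix.append([statistics.median(signal[g][p] for g in members)
                       for p in range(n_pos)])
        bin_expression.append(statistics.median(expression[g] for g in members))
    return matrix, bin_expression


# Hypothetical toy data: 4 genes, 3 promoter positions, 2 expression bins
expression = {"g1": 1.0, "g2": 2.0, "g3": 8.0, "g4": 9.0}
signal = {
    "g1": [0.1, 0.2, 0.1],
    "g2": [0.3, 0.2, 0.1],
    "g3": [1.0, 2.0, 1.5],
    "g4": [1.2, 2.4, 1.3],
}
matrix, bin_expression = bin_plot_matrix(expression, signal, n_bins=2)
print(matrix)           # rows: low- and high-expression bins
print(bin_expression)   # [1.5, 8.5]
```

Medians rather than means keep a few extreme promoters from dominating a bin, which matters when each bin pools hundreds of genes.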

We have provided many ways to visualize and summarize promoter-level microarray or genome-wide epigenomic data. For example, given an annotation table, the binPlots function summarizes the median signal around points of interest (e.g. transcription start sites). We routinely use binPlots as a quality-control step for new ChIP experiments where there is a known relationship between the interrogated chromatin mark and another metric, commonly gene expression. For example, Figure 1B clearly illustrates the positive association between gene expression levels (Affymetrix Gene 1.0 ST data) and the occurrence of H3K9 acetylation near the corresponding promoters (Affymetrix Promoter 1.0R data). The routine accepts tiling array or sequencing data as input and can group genes by alternative rankings. It can display the result as a plot with multiple lines, a heatmap or a 3D visualization. Another useful strategy for summarizing sets of genes of interest is significancePlots. As illustrated in Figure 1C, significancePlots shows the distinct changes in methylated DNA enrichment associated with genes whose expression is up- or down-regulated more than 2-fold between two samples. It also shows how the profiles differ between array and high-throughput sequencing readouts.
For comparison, a large number of random gene sets are sampled to form a null distribution of profiles, whose median and confidence interval are plotted. These plots show a clear enrichment of sequencing reads, and hence of DNA methylation, around many of the genes that are down-regulated in this comparison. Further data summaries are added regularly.
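The random-gene-set null distribution behind significancePlots can be sketched as follows. This Python version assumes each gene has a signal profile over a fixed set of positions. It samples random sets from the genes outside the set of interest and takes empirical 2.5% and 97.5% quantiles as the 95% band. The function names, the quantile rule and the toy data are illustrative assumptions rather than the package's implementation.

```python
import random
import statistics


def median_profile(profiles, genes):
    """Per-position median signal over the given genes."""
    n_pos = len(profiles[genes[0]])
    return [statistics.median(profiles[g][p] for g in genes) for p in range(n_pos)]


def null_band(profiles, background, set_size, n_random=1000, alpha=0.05, seed=0):
    """
    Sample n_random random gene sets of size set_size from `background`,
    compute each set's median profile, and return one (median, lower, upper)
    tuple per position, where lower/upper are empirical alpha/2 and
    1 - alpha/2 quantiles of the sampled median profiles.
    """
    rng = random.Random(seed)
    draws = [median_profile(profiles, rng.sample(background, set_size))
             for _ in range(n_random)]
    band = []
    for p in range(len(draws[0])):
        values = sorted(d[p] for d in draws)
        lower = values[int(alpha / 2 * (n_random - 1))]
        upper = values[int((1 - alpha / 2) * (n_random - 1))]
        band.append((statistics.median(values), lower, upper))
    return band


# Hypothetical data: 200 background genes and 10 genes of interest whose
# promoters carry a much higher signal at every one of 5 positions
rng = random.Random(1)
profiles = {f"bg{i}": [rng.random() for _ in range(5)] for i in range(200)}
profiles.update({f"down{i}": [5.0] * 5 for i in range(10)})
background = [g for g in profiles if g.startswith("bg")]
down = [f"down{i}" for i in range(10)]

observed = median_profile(profiles, down)
band = null_band(profiles, background, set_size=len(down), n_random=500)
print([obs > upper for obs, (_, _, upper) in zip(observed, band)])  # all True here
```

Positions where the observed median profile leaves the band are candidates for a genuine difference. Drawing more random sets makes the band edges more stable.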

3 STATISTICAL PROCEDURES

The visualization procedures detailed above aggregate signal over a large number of promoters or genomic regions. Often, it is of interest to focus on specific regions of the genome (e.g. transcription start sites or exons) and summarize the signal observed there. For example, an experimenter may be interested in promoter-level summaries of a particular epigenetic mark. The general-purpose blocksStats procedure focuses on data for the specified genomic regions of interest. For microarray data, this involves calculating a probe-level score and applying a statistical test to the group of probes within a specified distance of each region of interest. For sequencing data, we calculate statistics on aggregated read counts around the features of interest. Further details are available in the accompanying user's guide. We also provide procedures for untargeted analysis of epigenomic tiling array data. The regionStats function searches for persistent changes in signal in an untargeted fashion, similar in principle to model-based analysis of tiling arrays (MAT; Johnson et al., 2006), and therefore does not rely on annotation. Analogous procedures for sequencing data are in development.
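To illustrate the two strategies, the Python sketch below pairs a targeted, blocksStats-style summary with an untargeted, regionStats-style search. Both parts rest on assumptions made for illustration. The targeted part uses a one-sample t statistic on the probe scores within max_dist of each feature. The untargeted part uses a plain sliding-window mean, whereas MAT uses a trimmed mean of model-based probe scores. The function names, parameters and toy scores are hypothetical.

```python
import math
import statistics


def block_t_stats(probes, regions, max_dist):
    """
    probes: list of (position, score) on one chromosome, e.g. log2(IP/input).
    regions: {name: position} of features such as transcription start sites.
    For each region, collect probes within max_dist bp and return a one-sample
    t statistic for mean score != 0 (None if < 2 probes or zero variance).
    """
    stats = {}
    for name, pos in regions.items():
        scores = [s for p, s in probes if abs(p - pos) <= max_dist]
        sd = statistics.stdev(scores) if len(scores) >= 2 else 0.0
        stats[name] = (statistics.mean(scores) / (sd / math.sqrt(len(scores)))
                       if sd > 0 else None)
    return stats


def persistent_regions(probes, window, threshold, min_probes):
    """
    Untargeted search: smooth each probe score by the mean of all probes
    within +/- window bp, then report (start, end) of runs of consecutive
    probes whose smoothed score exceeds threshold, keeping runs of at least
    min_probes probes.
    """
    probes = sorted(probes)
    smoothed = [statistics.mean(s for q, s in probes if abs(q - p) <= window)
                for p, _ in probes]
    regions, run = [], []
    for (p, _), value in zip(probes, smoothed):
        if value > threshold:
            run.append(p)
            continue
        if len(run) >= min_probes:
            regions.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:  # a run may extend to the last probe
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical tiling-array scores: alternating +/-0.1 background, with an
# enriched stretch of five probes around position 1000
enriched = {800: 1.8, 900: 2.2, 1000: 2.0, 1100: 1.9, 1200: 2.1}
probes = [(i * 100, enriched.get(i * 100, 0.1 if i % 2 == 0 else -0.1))
          for i in range(20)]

print(block_t_stats(probes, {"tssA": 1000, "tssB": 100}, max_dist=200))
print(persistent_regions(probes, window=150, threshold=1.0, min_probes=3))  # [(800, 1200)]
```

The targeted statistic needs annotation (the region positions). The untargeted search needs only probe positions and scores, which mirrors the distinction drawn above between blocksStats and regionStats.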

4 ACCESSORY TOOLS

The package also contains a number of accessory tools that are useful across epigenomics. For example, in the context of CpG methylation, microarray probe intensities and sequence read counts are often affected by the local CpG density of the regions being interrogated. cpgDensityCalc calculates local CpG density according to a previous definition (Pelizzola et al., 2008). annotationLookup provides a framework for relating annotation information (e.g. transcription start sites) to probe positions on a tiling array. multiHeatmap is a general tool for creating adjacent heatmaps with separate colour scales. Additional tools read Nimblegen array data quickly (e.g. readPairFile), access features of aroma.affymetrix objects (e.g. getProbePositionsDf) and aggregate sequencing reads according to their proximity to annotation (e.g. annotationCounts). We expect further tools to be added and encourage others in the epigenomic community to contribute generally useful procedures.
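Local CpG density is simple to compute from sequence. The sketch below counts CpG dinucleotides in a window around a position and can optionally down-weight CpGs linearly with their distance from that position. The linear weighting is an assumption for illustration. It is not claimed to reproduce the definition of Pelizzola et al. (2008) or the arguments of cpgDensityCalc, and cpg_density is a hypothetical name.

```python
def cpg_density(seq, center, window=100, weighting="linear"):
    """
    CpG density around a 0-based position `center` of `seq`.
    Each CpG dinucleotide starting within +/- window/2 bp of center
    contributes 1 ("none") or 1 - distance/(window/2) ("linear"), so
    CpGs closer to the center count more.
    """
    if weighting not in ("none", "linear"):
        raise ValueError("weighting must be 'none' or 'linear'")
    seq = seq.upper()
    half = window / 2
    total = 0.0
    first = max(0, int(center - half))
    last = min(len(seq) - 1, int(center + half) + 1)  # leave room for seq[i + 1]
    for i in range(first, last):
        if seq[i:i + 2] != "CG":
            continue
        dist = abs(i - center)
        if dist > half:
            continue
        total += 1.0 if weighting == "none" else 1.0 - dist / half
    return total


# Toy sequence with CpGs starting at positions 1, 5 and 9
seq = "ACGTTCGAACGT"
print(cpg_density(seq, center=6, window=12, weighting="none"))              # 3.0
print(round(cpg_density(seq, center=6, window=12, weighting="linear"), 6))  # 1.5
```

A distance-weighted count varies smoothly as the window slides along the genome, whereas a plain count jumps whenever a CpG enters or leaves the window.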

5 DISCUSSION

There are relatively few tools currently available for the analysis of epigenomic data. We have developed Repitools, a software package for the R environment that contains many useful functions for quality assessment, visualization, summarization and statistical analysis of epigenomics experiments. The package uses aroma.affymetrix and several Bioconductor packages for various preprocessing steps (Bengtsson et al., 2008; Gentleman et al., 2004), and some features may require an intermediate understanding of R. A comprehensive user manual is available, and the examples can be run using the supplied data. The analysis of large Affymetrix tiling array datasets is facilitated by the memory efficiency of the aroma.affymetrix package (Bengtsson et al., 2008).

Funding: National Health and Medical Research Council (NH&MRC) project (427614, 481347) (M.D.R., C.S., D.S.) and Fellowship (S.J.C.), Cancer Institute NSW grants (CINSW: S.J.C., M.W.C., A.L.S.), and NBCF Program Grant (S.J.C.).

Conflict of Interest: none declared.
REFERENCES

1. Consolidation of the cancer genome into domains of repressive chromatin by long-range epigenetic silencing (LRES) reduces transcriptional plasticity.
Authors: Marcel W Coolen; Clare Stirzaker; Jenny Z Song; Aaron L Statham; Zena Kassir; Carlos S Moreno; Andrew N Young; Vijay Varma; Terence P Speed; Mark Cowley; Paul Lacaze; Warren Kaplan; Mark D Robinson; Susan J Clark
Journal: Nat Cell Biol. Date: 2010-02-21.

2. Model-based analysis of tiling-arrays for ChIP-chip.
Authors: W Evan Johnson; Wei Li; Clifford A Meyer; Raphael Gottardo; Jason S Carroll; Myles Brown; X Shirley Liu
Journal: Proc Natl Acad Sci U S A. Date: 2006-08-08.

3. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment.
Authors: Mattia Pelizzola; Yasuo Koga; Alexander Eckehart Urban; Michael Krauthammer; Sherman Weissman; Ruth Halaban; Annette M Molinaro
Journal: Genome Res. Date: 2008-09-02.

4. Time for the epigenome.
Journal: Nature (editorial). Date: 2010-02-04.

5. Principles and challenges of genomewide DNA methylation analysis.
Authors: Peter W Laird
Journal: Nat Rev Genet. Date: 2010-03.

6. Moving AHEAD with an international human epigenome project.
Journal: Nature. Date: 2008-08-07.

7. Bioconductor: open software development for computational biology and bioinformatics.
Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol. Date: 2004-09-15.
